program rav(symvec, ravp, xyin, marks, colorkey, output); (* rav: running average information curve Tom Schneider NCI/FCRDC Bldg 469. Room 144 P.O. Box B Frederick, MD 21702-1201 (301) 846-5581 (-5532 for messages) permanent email: toms@alum.mit.edu toms@ncifcrf.gov http://www-lecb.ncifcrf.gov/~toms/ National Cancer Institute Laboratory of Experimental and Computational Biology *) label 1; (* end of program *) const (* begin module version *) version = 1.22; (* of rav.p 1998 January 27 1998 January 26: added color key origin 1998 January 21 *) updateversion = 1.20; (* defines lowest acceptable current parameter file *) (* end module version *) (* begin module describe.rav *) (* name rav: running average information curve synopsis rav(symvec: in, ravp: in, xyin: out, marks: out, colorkey: out output: out) files symvec: output of rsdata or alpro, contains an information curve. ravp: parameters to control the program. The file must contain the following parameters, one per line: parameterversion: The version number of the program. This allows the user to be warned if an old parameter file is used. minwindowsize: (integer, bases) minimum size of windows to average. maxwindowsize: (integer, bases) maximum size of windows to average. stepwindowsize: (integer, bases) how to increment window size. colorX, colorY: (real, cm) x and y coordinates of lower left corner of color key for color to bit conversion. colorWidth, colorHeight: (real, cm) x and y size of color key (respectively) colorThick: (real, cm) thickness of lines in color key strip colorTransform: (integer) Defines how the PostScript colors are to be transformed. PostScript colors run from red through red, so the highest values have the same color as the lowest ones. This parameter determines how to transform the color scale. 0: no transform 1: transform as given by hue := colorAtransform*hue + colorBtransform This gives the user full control as a linear transformation. 2: cutoff: If the hue is less than colorAtransform/2 or larger than colorBtransform/3 then hue is set to 0.3; otherwise it is zero 0 (red). This gives a red line in the region between the two parameters. 3. Transform that gives yellow up to red: hue := 0.84*hue + 0.16 4: Roy G. Biv transform: the spectrum. (red, orange, yellow, green, blue, indigo, violet) where red is highest. The transform is hue := -magic*hue + magic; with magic = 0.85 colorAtransform: multiplicative parameter for colorTransform = 2 colorBtransform: additive parameter for colorTransform = 2 xyin: output of this program for plotting by xyplo. data columns: 1: position of window center 2: window size 3: average of information in window (bits) 4: standard deviation of information (bits) 5: color hue (0 to 1) 6: color saturation (0 to 1) 7: color brightness (0 to 1) 8: xsize 9: ysize marks: output of this program for inclusion with a sequence logo created by makelogo. This is the best way to display the data! colorkey: a PostScript file that creates a key showing the relationship between bits and colors. When one runs xyin through the xyplo program the color shows the degree of (averaged) conservation; this key can be inserted into the xyout graphic by including it in your xyplom file. output: messages to the user description The program rav creates a Running AVerage information curve starting from the symvec format produced by alpro or rseq. This is an improvement over the winfo program because winfo only used the rsdata format from rseq and therefore cannot handle sequences that come from alpro (which are in symvec format). In addition, because results are reported in the marks file, this program supercedes winfo. Windows coordinates are reported at the center of the window. The marks file is set up so that the the average value is shown by a rectangle, ranging one standard deviation above and below the mean. Note that for odd window sizes the boxes are placed surrounding a base in the sequence logo, while odd ones are offset between bases. You can check that the program is working correctly by trying window sizes of 1 (which should match the heights of the logo and have boxes that match the error bars); size 2 (which should be between bases and start to show averaging) etc. examples ravp: 1.20 version of rav that this parameter file is designed for. 1 minimum window size to average. 50 maximum window size to average. 5 increment of window size. 18.3 10 colorX, colorY (cm): coordinates of lower left corner of key 0.5 8 colorWidth, colorHeight (cm): size of key 0.2 colorThick (cm): thickness of lines in color key strip 4 colorTransform 0:none,1:control,2:cutoff,3:.84hue+.16,4:-.85hue+.85 0.84 colorAtransform multiplicative color transform parameter 0.16 colorBtransform additive color transform parameter xyplop: 2 6 zerox zeroy graph coordinate center zx 0 3000 zx min max (character, real, real) if zx='x' then set xaxis zy 0 100 zy min max (character, real, real) if zy='y' then set yaxis 10 10 1 1 xinterval yinterval xsubintervals ysubintervals: axis intervals 6 4 xwidth ywidth width of numbers in characters 0 0 xdecimal ydecimal number of decimal places 16 16 xsize ysize size of axes in cm position window size a zc 'c' crosshairs, axXyYnN n 2 zxl base if zxl='l' then make x axis log to the given base n 2 zyl base if zyl='l' then make y axis log to the given base ********************************************************************* 1 2 xcolumn ycolumn columns of xyin that determine plot location 0 symbol column the xyin column to read symbols from 8 9 xscolumn yscolumn columns of xyin that determine the symbol size 5 6 7 hue saturation brightness columns for color manipulation ********************************************************************* R symbol-to-plot c(circle)bd(dotted box)x+Ifgpr(rectangle) 0 symbol-flag character in xyin that indicates that this symbol -1.0 symbol sizex side in cm on the x axis of the symbol. -1.0 symbol sizey as for the x axis, get size from yscolumn n no connection (example for connection is c- 0.125 for dashed 0.125 cm) n 0.125 linetype size linetype l.-in and size of dashes or dots ********************************************************************* . **** version 8.50 of xyplop: uses cm for distances documentation see also winfo.p, rseq.p, xyplo.p, makelogo.p, struco.p, ravp, ravxyplop author Thomas Dana Schneider bugs technical notes Constant maxwin is the largest window size allowed. When the sequence logo runs over multiple lines, the marks from rav have to be triggered AFTER the letter. If this is not done, the box appears on the previous line, and is missing from the first stack of the next line. The trick is to write the box backwards - right to left. The box is therefore triggered after the stack has been written and is always associated with the stack. This has the consequence that the box is written on top of the stack, but it is completely clean. dedicated to Pete Rogan *) (* end module describe.rav *) (* begin module rav.const *) maxwin = 2000; (* the largest window size allowed. *) infowid = 10; (* width of reported information *) infodec = 5; (* decimals for reported information *) (* end module rav.const *) type (* begin module rav.type *) infopoint = record (* information about a point in an alignment *) position: integer; (* position in the alignment *) samples: integer; (* number of sequences at this position *) information: real; (* information in bits *) variance: real; (* variance of information *) end; infoarray = record (* store some infopoints *) symbols: integer; (* number of symbols in symvec *) data: array[1..maxwin] of infopoint; (* array of infopoints *) minwindowsize: integer; (* minimum size of window requested *) maxwindowsize: integer; (* maximum size of window requested *) stepwindowsize: integer; (* step size of window requested *) windowsize: integer; (* current size of window *) halfwindowsize: real; (* half of current windowsize *) current: integer; (* current window position *) last: integer; (* the last window position *) Rtotal: real; (* current total of information R in the window *) Vtotal: real; (* current total of variance in the window *) (* color key controls *) colorX: real; (* x coordinate of lower left of color key strip *) colorY: real; (* y coordinate of lower left of color key strip *) colorWidth: real; (* horizontal width of color key strip *) colorHeight: real; (* vertical width of color key strip *) colorThick: real; (* thickness of lines in color key strip *) colorTransform: integer; (* kind of color transform to do *) colorAtransform, colorBtransform: real; (* parameters of the transform *) end; (* end module rav.type *) var symvec, (* file used by this program *) ravp, (* file used by this program *) xyin, marks, colorkey: text; (* file used by this program *) (* begin module halt *) procedure halt; (* stop the program. the procedure performs a goto to the end of the program. you must have a label: label 1; declared, and also the end of the program must have this label: 1: end. examples are in the module libraries. this is the only goto in the delila system. *) begin writeln(output,' program halt.'); goto 1 end; (* end module halt version = 'delmod 6.16 84 mar 12 tds/gds'; *) (* begin module docolortransform *) procedure docolortransform(colorTransform: integer; colorAtransform, colorBtransform: real; var hue: real); (* transformations of the hue, which runs from 0 to 1 into another range which gives different colors. 0: no transform 1: transform as given by a hue := colorAtransform*hue + colorBtransform 2: cutoff: hue less than colorAtransform/2 or larger than colorBtransform/3 is 0.3 otherwise 0 (red) 3. Transform that gives yellow up to red: hue := 0.84*hue + 0.16 4: Roy G. Biv transform (red, orange, yellow, green, blue, indigo, violet) where red is highest. hue := -magic*hue + magic; with magic = 0.85 *) const magic = 0.85; (* a nice cutoff for colors *) begin case colorTransform of 0: ; 1: hue := colorAtransform*hue + colorBtransform; 2: begin if hue < colorAtransform/2 then hue := 0.3 else if hue > colorBtransform/2 then hue := 0.3 else hue := 0; end; 3: hue := 0.84*hue + 0.16; 4: hue := -magic*hue + magic; end; end; (* end module docolortransform *) (* begin module makecolorkey *) procedure makecolorkey(var psout: text; colorX, colorY: real; colorWidth, colorHeight: real; colorThick: real; colorTransform: integer; colorAtransform, colorBtransform: real); (* make a color key bar with lower left hand coordinate at (colorX, colorY) of height colorHeight of width colorWidth, and with lines of thickness colorThick. The bit range is from 0 to 2 bits. *) const ticjump = 0.1; (* distance to jump to the right of the color strip to get to the tic mark, cm *) ticlength = 0.2; (* length of tic mark, cm *) var cmtopoints: real; (* conversion factor from cm to points. *) hueinteger: integer; (* used to scan the hues *) hue: real; (* a number between 0 and 1 that defines a color shade *) y: real; (* current y coordinate of color strip in points *) begin rewrite(psout); writeln(psout,'%! colorkey'); cmtopoints := 72/2.54; (* 72 points/inch * 2.54 inch/cm *) (* define a new PostScript function for color: *) writeln(psout,'/Times-Roman findfont 10 scalefont setfont'); writeln(psout,'/c {1 1 sethsbcolor} def'); writeln(psout,'% define the colors'); writeln(psout,'gsave'); writeln(psout,colorThick*cmtopoints:8:5,' setlinewidth'); for hueinteger := 0 to 100 do begin hue := hueinteger/100; (* at this point hue is in cm *) y := (colorHeight*hue + colorY)*cmtopoints; docolortransform(colorTransform,colorAtransform,colorBtransform,hue); write (psout,hue:8:5, ' c'); writeln(psout,' ', colorX*cmtopoints:8:5, ' ',y:8:5, ' moveto', ' ',colorWidth*cmtopoints:8:5,' 0 rlineto stroke'); if (hueinteger mod 5) = 0 then begin (* tic marks *) writeln(psout,'gsave'); writeln(psout,'1 setlinewidth'); writeln(psout,'1 0 0 sethsbcolor'); (* black *) writeln(psout,' ',((colorX + colorWidth + ticjump)*cmtopoints):8:5, ' ',y:8:5, ' moveto'); writeln(psout,'currentpoint translate 0 0 moveto'); writeln(psout,ticlength*cmtopoints:8:5,' 0 rlineto currentpoint stroke'); writeln(psout,'translate 0 0 moveto'); if (hueinteger mod 50) = 0 then begin (* numbers *) writeln(psout,'(',round(hueinteger/50):2); if hueinteger = 100 then writeln(psout,' bits'); writeln(psout,') show'); end; writeln(psout,'grestore'); writeln(psout,colorThick*cmtopoints:8:5,' setlinewidth'); end; end; writeln(psout,'grestore'); end; (* end module makecolorkey *) (* begin module copyaline *) procedure copyaline(var fin, fout: text); (* copy a line from file fin to file fout *) begin (* copyaline *) while not eoln(fin) do begin fout^ := fin^; put(fout); get(fin) end; readln(fin); writeln(fout); end; (* copyaline *) (* end module copyaline *) (* begin module rav.setheaders *) procedure setheaders(var symvec, xyin, marks: text; var store: infoarray); (* read header information from symvec and transfer it to xyin and marks. Obtain number of symbols from symvec and report it too. *) begin reset(symvec); rewrite(xyin); writeln(xyin,'* rav ',version:4:2); while symvec^ = '*' do copyaline(symvec, xyin); reset(symvec); rewrite(marks); writeln(marks,'* rav ',version:4:2); while symvec^ = '*' do copyaline(symvec, marks); readln(symvec, store.symbols); writeln(xyin,'* number of symbols = ',store.symbols:1); writeln(marks,'* number of symbols = ',store.symbols:1); writeln(xyin,'* minimum window size = ',store.minwindowsize:1); writeln(xyin,'* maximum window size = ',store.maxwindowsize:1); writeln(marks,'* minimum window size = ',store.minwindowsize:1); writeln(marks,'* maximum window size = ',store.maxwindowsize:1); writeln(xyin,'*'); writeln(xyin,'* data columns:'); writeln(xyin,'* 1: position of window center'); writeln(xyin,'* 2: window size'); writeln(xyin,'* 3: average of information in window (bits)'); writeln(xyin,'* 4: standard deviation of information (bits)'); writeln(xyin,'* 5: color hue (0 to 1)'); writeln(xyin,'* 6: color saturation (0 to 1)'); writeln(xyin,'* 7: color brightness (0 to 1)'); writeln(xyin,'* 8: xsize'); writeln(xyin,'* 9: ysize'); end; (* end module rav.setheaders *) (* begin module rav.readoneposition *) procedure readoneposition(var symvec: text; var i: infopoint; symbols: integer); (* read the symvec information for one position into i, skip the symbols *) var s: integer; (* index to symbols *) procedure die; (* die at unexpected end of file *) begin writeln(output,'unexpected end of symvec file'); halt end; begin with i do begin if eof(symvec) then die; readln(symvec, position, samples, information, variance); (* skip symbol data *) for s := 1 to symbols do begin if eof(symvec) then die; readln(symvec); end end end; (* end module rav.readoneposition *) (* begin module rav.reportwindow *) procedure reportwindow(var xyin, marks: text; var store: infoarray); (* report the current window data *) var hue: real; (* a number between 0 and 1 that defines a color shade *) Rav: real; (* information average *) Vav: real; (* information variance average *) stdev: real; (* information standard deviation *) windowcenter: real; (* center of the current window *) saturation: real; (* color saturation *) brightness: real; (* color brightness *) xsize: real; (* x axis box size *) ysize: real; (* y axis box size *) begin with store do begin saturation := 1.0; brightness := 1.0; xsize := 1.0; ysize := stepwindowsize; windowcenter := data[current].position - halfwindowsize; windowcenter := windowcenter + 0.5; { writeln(output,'halfwindowsize: ', halfwindowsize:10:5); writeln(output,'data[current].position: ', data[current].position:10:5); writeln(output,'windowcenter: ', windowcenter:10:5); } Rav := Rtotal/windowsize; Vav := Vtotal/windowsize; (* unfortunately Vtotal can become 'negative zero' - protect against this stupidity *) if Vtotal <= 0.0 then stdev := 0.0 else stdev := sqrt(Vav); write(xyin, windowcenter:infowid:infodec); write(xyin, windowsize:infowid); write(xyin, ' ',Rav:infowid:infodec); write(xyin, ' ',stdev:infowid:infodec); hue := Rav/2.0; docolortransform(colorTransform,colorAtransform,colorBtransform,hue); write(xyin, ' ',hue:infowid:infodec); write(xyin, ' ',saturation:infowid:infodec); write(xyin, ' ',brightness:infowid:infodec); write(xyin, ' ',xsize:infowid:infodec); write(xyin, ' ',ysize:infowid:infodec); writeln(xyin); (* This is the point where the mark is put in backwards *) writeln(marks, 'bs ', ' ',(windowcenter+0.5):infowid:infodec, ' ',(Rav - stdev):infowid:infodec, ' ',(windowcenter-0.5):infowid:infodec, ' ',(Rav + stdev):infowid:infodec); end end; (* end module rav.reportwindow *) (* begin module rav.initialread *) procedure initialread(var symvec: text; var store: infoarray); (* read the first windowsize data points *) var w: integer; (* index to window *) begin reset(symvec); while symvec^ = '*' do readln(symvec); (* skip header *) readln(symvec); (* skip digits *) with store do begin halfwindowsize := windowsize/2; current := 0; Rtotal := 0.0; Vtotal := 0.0; for w := 1 to windowsize do begin readoneposition(symvec, data[w], symbols); current := succ(current); Rtotal := Rtotal + data[current].information; Vtotal := Vtotal + data[current].variance; end; last := 1; (* last tracks the hind end of the window *) end end; (* end module rav.initialread *) (* begin module rav.doaposition *) procedure doaposition(var symvec, xyin, marks: text; var store: infoarray); (* read a symvec position and compute the window *) begin with store do begin current := succ(current); if current > windowsize then current := 1; Rtotal := Rtotal - data[current].information; Vtotal := Vtotal - data[current].variance; readoneposition(symvec, data[current], symbols); Rtotal := Rtotal + data[current].information; Vtotal := Vtotal + data[current].variance; reportwindow(xyin, marks, store); end end; (* end module rav.doaposition *) (* begin module rav.readwindowsize *) procedure readwindowsize(var f: text; var windowsize: integer); (* read the window size from file f *) begin readln(f, windowsize); if windowsize > maxwin then begin write (output, 'windowsize requested was ',windowsize:1,'; '); writeln(output, 'windowsize must not exceed ',maxwin:1); halt end; if windowsize < 1 then begin write (output, 'windowsize requested was ',windowsize:1,'; '); writeln(output, 'windowsize must be positive'); halt end; end; (* end module rav.readwindowsize *) (* begin module rav.themain *) procedure themain(var symvec, ravp, xyin, marks, colorkey: text); (* the main procedure of the program *) var parameterversion: real; (* parameter version number *) store: infoarray; (* storage of current window *) w: integer; (* current window size *) begin writeln(output,'rav ',version:4:2); reset(ravp); readln(ravp, parameterversion); if parameterversion < updateversion then begin writeln(output, 'You have an old parameter file!'); halt end; with store do begin readwindowsize(ravp, minwindowsize); readwindowsize(ravp, maxwindowsize); readln(ravp, stepwindowsize); readln(ravp, colorX, colorY); readln(ravp, colorHeight, colorWidth); readln(ravp, colorThick); readln(ravp, colorTransform); readln(ravp, colorAtransform); readln(ravp, colorBtransform); makecolorkey(colorkey, colorX, colorY, colorHeight, colorWidth, colorThick, colorTransform, colorAtransform, colorBtransform); end; setheaders(symvec, xyin, marks, store); w := store.minwindowsize; while w <= store.maxwindowsize do begin writeln(output,'working on window size ',w:1); store.windowsize := w; initialread(symvec, store); reportwindow(xyin, marks, store); while not eof(symvec) do begin doaposition(symvec,xyin, marks, store); end; w := w + store.stepwindowsize; end; end; (* end module rav.themain *) begin themain(symvec, ravp, xyin, marks, colorkey); 1: end.