program evd(all, evdp, display, sites, genomes, evfeatures, output); (* evolution display Dr. Thomas D. Schneider, Ph.D. National Institutes of Health schneidt@mail.nih.gov toms@alum.mit.edu (permanent) http://alum.mit.edu/www/toms (permanent) module libraries required: delman, prgmods, delmods, ev *) label 1; (* end of program *) const (* begin module version *) version = 2.53; (* of evd.p 2010 Sep 24 2010 Sep 24, 2.53: showsites was never called!! 2010 Sep 24, 2.52: upgrade using ev.p modules 2010 Feb 03, 2.51: cleanup 2010 Feb 03, 2.50: upgrade from ev was needed 2010 Feb 03, 2.49: Bus error in showcreatures 2005 Mar 23, 2.45: upgrade for GPC 2.43, 2002 Apr 4: compute thermodynamic information of matrix 2.42, 2002 Apr 4: make gpc compatable: variable x local to dononsites 2.41, 2002 Mar 28: remove namelength const and alpha type which were not used 2.38, 2000 Jul 18: invent ev.run script 2.37, 2000 Jul 17: add Schneider.oxyr ref. 2.35, 2000 Jul 16: upgrade documentation 2.34, 2000 Jul 16: upgrade documentation 2.31, 2000 Jul 9: upgrade documentation for publication release 2.30, 1999 Jul 26: need to evaluate if selecting was off for proper display 2.27, 1999 Jul 23: (+++) now have parens between bases 2.23, 1999 Jul 22: tidy up; fix location of threshold in evfeatures 2.21, 1999 Feb 18: upgrade to gaussian noise 2.20, 1999 Feb 17: upgrade to more modern time modules, solve y2k problem 1998 Mar 7 upgrade to allow random placement of sites 1988 oct 27 previous changes 1984 apr 19 origin *) updateversion = 2.26; (* defines lowest acceptable current parameter file *) (* end module version *) (* begin module describe.evd *) (* name evd: evolution display synopsis evd(all: in, evdp: in, display: out, sites: out, genomes: out, evfeatures: out, evdp: in, output: out); files all: the all file from program ev. It contains all the genomes of the evolving creatures, and other parameters and data. It is created by a binary dump by the ev program for the sake of speed and so is not readable by people and so probably cannot be successfully run on another kind of computer. The evd program therefore is needed to interpret the all file. evdp: parameters. one per line: firstcreature: the number of the first creature to display secondcreature: the number of the last creature to display non-site features: if the first character is 'n' then non-sites that are recognized by the weight matrix are shown as features in evfeatures If evdp is empty, the defaults are: 1/1/- (first creature only, no non-site features) The creature of rank 1 makes the fewest mistakes; number 2 makes more, etc. display: a marked display of the genomes and other data. sites: raw sequences of the sites (and 5 bases around each), the sequences are separated by periods, and different creatures are separated by blank lines. The current method for using this to create a sequence logo is described in the 'see also' section. genomes: raw sequences of the genomes of the creatures, separated by periods. evfeatures: The features in this genome in the form used by the lister program. output: messages to the user. description The purpose of the ev program is to evolve sites; the purpose of evd is to display an intermediate or final result of an evolutionary run. The genomes are displayed with the locations of the recognizer gene and its sites: a-- c-- g-- t-- is the region encoding the weights for one of the recognizer fingers. The numbers underneath are the weights for each base. TTTT marks the threshold for the matrix. The number underneath is the threshold. (-------------) is a site. The number underneath is the evaluation of the site by the weight matrix. If the site is recognized (the evaluation is greater or equal to the threshold) then + signs are used. The sequences around each site are written to the file 'sites', and the entire genomes are written to the file 'genomes'. This allows analysis by other programs. EXPERIMENTAL: Thermodynamic Probabilities and Information of Weight Matrix Thermodynamic probabilities are computed by assuming that the weight matrix values represent energies, which may not be true. However, given this assumption, we can compute the corresponding probabilities in a Boltzmann distribution by first computing the partition function Q = sum_b exp(weight_b) where b is one of the four bases and weight_b is the weight at some position l for base b. Then the probability for base b is exp(weight_b)/Q. The uncertainty and information are then computed in the normal way. As a technical note, weight values are often high integers, such as 400. This will exceed the capability of the exponentiation. To avoid this, the absolute value of the weights at all positions are taken and the largest weight is used to normalize the entire matrix to the range -1 to +1. 2002 April 4: The thermodynamic computation was implemented. Notably, for the standard evp.selection at 1000 generations, where Rf = 4.0 and Rs = 4.71035+/-0.29733, the computed value is 0.98 bits. I do not yet know what this discrepancy means. However, if the normalization is set to be between -0.5 and +0.5, then the computed value increases to 3.08 bits. This may indicate that the computation is meaningless or that something else has to be done ... see also {The "Evolution of Biological Information" paper (with active hypertext links in references):} http://www-lecb.ncifcrf.gov/~toms/paper/ev/ {Example parameter file:} evdp {Program for the evolution of binding sites (this creates the all file):} ev.p {Program to display the genomes marked by the sites:} {* individual information version:} lister.p {* public version:} listerx.p {These programs require the Delila book format rather than the simple sequence in the genomes file. To convert, use the} makebk.p {program: cp genomes sequ # copy the sites file to the sequ file makebk <} makebkp {# make the delila book with} makebkp {and} makebk.p http://www.lecb.ncifcrf.gov/~toms/icons/donor.pure.gif {TO MAKE LOGOS: Briefly, after running ev to evolve binding sites, use this} evd.p {program to get the binding site sequences. Then copy the sites file to the sequ file and use the} makebk.p {program (preferably in automatic mode) to create a Delila book. Then use standard delila system programs to create the logo, as described at} http://www.lecb.ncifcrf.gov/~toms/delila.html{.} {Assuming that you have all the necessary input files, in detail the procedure is: ev # evolve the creatures with} ev.p { evd # make the display files with this program,} evd.p { cp sites sequ # copy the sites file to the sequ file (a unix command) makebk <} makebkp {# make the delila book with} makebkp {and} makebk.p { encode # be sure to use the f mode in} encodep {with} encode.p { rseq # compute information using the} rseq.p {program dalvec # make the symvec using the} dalvec.p {program makelogo # make the logo from the symvec with} makelogo.p{. Descriptions of the required input files are given in the documentation of each program and a general review is given in the paper} http://www-lecb.ncifcrf.gov/~toms/paper/oxyr/{. An example script that performs the steps above and also creates all the necessary input files is} run.ev{. {A movie of binding site evolution created using these steps is at} http://www.lecb.ncifcrf.gov/~toms/paper/ev/movie/ author Thomas Dana Schneider bugs none known (well, actually it's full of them... ;-) Evd cannot handle non-random site placement in the display file because the drawing mechanism was not designed to do so. The lister or listerx program has to be used. A warning is provided to output. technical notes *) (* end module describe.evd *) (* a handy line for indicating that the program is under construction: http://www.lecb.ncifcrf.gov/~toms/icons/construction.gif {Under Construction} *) (* begin module ev.const *) maxgenomesize = 33000; (* maximum genome size 32768 *) (* 5000 up until 2006 Jun 28 *) (* 1048576 gives segmentation fault *) (* was 1040 *) maxgamma = 128; (* maximum number of sites. this is used as the array bound on the locations of the sites *) maxbugs = 128; (* maximum number of creatures *) maxwidth = 20; (* maximum width of a recognizer *) maxbpi = 10; (* one more than the maximum bases per integer. this value determines how fine the fingers of the recognizer are able to feel *) extra = 5; (* the number of extra bases around each site printed into file sequ. *) col = 14; (* column width for the data display *) dec = 5; (* decimal places for the data display *) displayingtypes = 8; (* number of ways to display things in list *) mutatehalf = false; (* if true, only newly replicated bugs are mutated, thereby preserving good strains. If false, all bugs are mutated every generation. *) (* end module ev.const version = 4.10; (@ of ev.p 2010 Sep 24 *) (* begin module evd.const *) deflinewidth = 30; (* bases per line, maximum *) (* end module evd.const *) (* begin module datetime.const *) datetimearraylength = 19; (* length of dataarray for dates, It is just long enough to include the 4 digit year - solving the year 2000 problem: 1980/06/09 18:49:11 123456789 123456789 1 2 *) (* end module datetime.const version = 4.10; (@ of ev.p 2010 Sep 24 *) type (* begin module datetime.type *) (* array for dates *) datetimearray = packed array[1..datetimearraylength] of char; (* end module datetime.type version = 4.10; (@ of ev.p 2010 Sep 24 *) (* begin module ev.type *) base = (a,c,g,t); (* the symbols in the genomes of the bugs *) misc = record (* various precalculated data *) power: array[1..maxbpi, a..t] of integer; (* powers of 4. power[x,a] = 0 power[x,c] = 4 ^ x. power[x,g] = 2*power[x,c] power[x,t] = 3*power[x,c] this forms a lookup-table for quick translation *) negative: integer; (* the smallest value in the range 0..4^bpi that represents a negative value in two s complement notation *) twonegative: integer; (* twice negative. this is the actual amount to be subtracted *) endscang: integer; (* g-1; the end of a scan of the genome *) ehnb: real; (* the sampling bias factor for the number of sites, gamma *) varhnb: real; (* the variance of ehnb *) rstdev: real; (* the total standard deviation of the sampling bias, sqrt(varhnb*width) *) hg: real; (* genomic uncertainty *) eofnforhg: real; (* e(n) for Hg *) ln2: real; (* the natural log of 2 for calculating things in bits *) rfrequency: real; (* the information needed to locate sites *) evversion : real; (* capture the version number of ev *) end; (* the genome of a creature is set up as a record to allow indexing to one bug. this should be faster. *) chromosome = packed array [1..maxgenomesize] of base; genometype = record genome: chromosome; (* current composition of the genome: *) composition: array[a..t] of integer; end; (* the mistakes each creature has made. The first dimension is the rank number. The second dimension has: 1. bug identification number 2. mistakes made. This allows sorting using quicksort. *) position = 0..maxbugs; (* the position type for quicksort *) rankthings = (bugid, mistakes); { ranktype = array[1..maxbugs,bugid..mistakes] of integer; uuu } ranktype = array[1..maxbugs,rankthings] of integer; population = record (* information about the entire population *) bugs: integer; (* number of creatures *) { genome: integer; (* number of potential binding sites *) } Genome: integer; (* number of potential binding sites. 2005 Jun 15. This was formerly called 'genome' but that is not a very good name. 'G' is worse though because it conflicts with 'g' in the genome. So use 'Genome'. *) gamma: integer; (* number of sites *) randomsites: char; (* r: random site placement with overlap n: random site placement, no overlap a: array placement *) width: integer; (* width of sites in bases *) bpi: integer; (* bases per integer in recognizer gene wmatrix *) bpthreshold: integer; (* bases per threshold integer *) genomesize: integer; (* the total length of the genome, calculated as g+width-1+extra *) site: array[1..maxgamma] of integer; (* locations of the sites in the genomes. The first base of a site that starts at base x is at base x+1. See function recognize. *) creature: array[1..maxbugs] of genometype; (* all the bugs *) rank: ranktype; (* the rankings of the bugs by their mistakes *) (* this array records the locations of the sites to make testing faster *) sitelocations: packed array [1..maxgenomesize] of boolean; halfofbugs: integer; (* half of the bugs, (bugs div 2) *) firstmutated: integer; (* the first bug mutated. This is either 1 or halfofbugs+1 *) end; everything = record precalc: misc; (* lots of precalculated stuff *) datetimestamp: datetimearray; (* the date and time used to identify the moment of creation on all output *) mu: integer; (* mutation rate in hits per creature per generation for mutype = g and hits per region per creature for mutype = b. *) mutype: char; (* type of mutation: 'g': mutations per genome per generation 'b': mutations per base per generation *) mudecimal: real; (* additional mutations - the decimal part of mu when mutype = b *) {mmm} seed: real; (* random number generator seed *) initialseed: real; (* the seed read from the evp *) p: population; generation: integer; (* cycle number *) cycles: integer; (* cycles to run *) displayinterval: integer; (* every display-th generation is shown *) (* NB: if more types are added to this list, increase the constant displayingtypes in module ev.const *) displayaverage : boolean; (* display average mistakes *) displaychange : boolean; (* display change of mistakes *) displaygenomich : boolean; (* display genomic uncertainty *) displayindividual : boolean; (* display individual mistakes *) displayorbits : boolean; (* display Rindividual orbits *) displayrsequence : boolean; (* display Rsequence *) displaystatus : boolean; (* display to output *) displayonemistake : boolean; (* display best individuals mistakes *) previousmistakes: integer; (* number of mistakes made in previous display (not in previous generation, it's only updated when the display is activated according to displayinterval *) selecting: boolean; (* If true, then the organisms are sorted by their mistakes. If false, then the organisms are randomly sorted. Normally this should be 'true'. *) storagefrequency: integer; (* frequency (generations) with which to store the everything record. *) sigma: real; (* standard deviation of noise *) noise: boolean; (* true if sigma > 0; used for speeding the code *) haveone: boolean; (* have a gaussian value; used by the gaussian procedure to switch between gaussianX and gaussianY *) gaussianX, gaussianY: real; (* gaussian values for the gaussian procedure *) specialrule: char; (* SPECIAL RULE for ties: r - random survival of one organism b - both survive (original method) s - one survives depending on the sorting algorithm. This is an arbitrary function of how the quicksort routine works and is therefore generally not a good way to make the decision. *) haltoncondition: char; (* halt on condition: - none; r Rs>=Rf; m mistakes 0; b both r and m *) end; (* the allfile allows the program to be stopped and restarted again: *) allfile = file of everything; wmatrix = record matrix: array[a..t, 1..maxwidth] of integer; (* the w matrix *) { zzz consider switching to zero based method: matrix: array[a..t, 0..maxwidth] of integer; (* the w matrix *) } threshold: integer; (* the lower threshold for function *) end; (* an array type for reals, such as f(B,L) and Ri(B,L) *) blarray = array[a..t, 1..maxwidth] of real; (* end module ev.type version = 4.10; (@ of ev.p 2010 Sep 24 *) var (* begin module evd.var *) all: allfile; (* the data read in *) evdp, (* parameters *) display, (* the data redisplayed *) sites, (* the raw sequences of the sites and surround *) genomes, (* the raw sequences of the genomes *) evfeatures: (* features for the lister program *) text; e: everything; (* all the data *) (* end module evd.var *) (* begin module halt *) procedure halt; (* stop the program. the procedure performs a goto to the end of the program. you must have a label: label 1; declared, and also the end of the program must have this label: 1: end. examples are in the module libraries. this is the only goto in the delila system. *) begin writeln(output,' program halt.'); goto 1 end; (* end module halt version = 4.10; (@ of ev.p 2010 Sep 24 *) (* begin module writedatetime *) procedure writedatetime(var thefile: text; adatetime: datetimearray); (* expand the date and time out and print in the file *) var index: integer; (* index of datetime *) begin for index:=1 to datetimearraylength do write(thefile,adatetime[index]) end; (* end module writedatetime version = 4.10; (@ of ev.p 2010 Sep 24 *) (* begin module basetochar *) function basetochar(ba:base):char; (* convert a base into a character *) begin case ba of a: basetochar:='a'; c: basetochar:='c'; g: basetochar:='g'; t: basetochar:='t'; end end; (* end module basetochar version = 4.10; (@ of ev.p 2010 Sep 24 *) function chartobase(ch:char):base; (* convert a character into a base *) begin case ch of 'a': chartobase:=a; 'c': chartobase:=c; 'g': chartobase:=g; 't': chartobase:=t end end; (* ************************************************************************ *) (* ************************************************************************ *) (* begin module ev.decode *) function decode(var e: everything; bug: integer; z: integer; basesperinteger: integer; precalcnegative, precalctwonegative: integer): integer; (* decode the genetic sequence into an integer. Starting at position z, the ; basesperinteger bases are translated into a two's complement integer. precalcnegative and precalctwonegative are precalculated values for forming the two's complement. see procedure makeprecalc. *) var val: integer; (* intermediate values of the decoding *) place: integer; (* place value of a value *) x: integer; (* location on the genome *) begin (* convert one stretch of the genome into an integer. the width of the stretch is basesperinteger. *) val := 0; x := z + basesperinteger; with e.p.creature[bug] do begin for place := 1 to basesperinteger do val := val + e.precalc.power[place,genome[x-place]] end; (* convert to two's complement notation *) if val >= precalcnegative then val := val - precalctwonegative; decode :=val end; (* end module ev.decode version = 4.10; (@ of ev.p 2010 Sep 24 *) (* begin module ev.translate *) procedure translate(var e: everything; bug: integer; z: integer; var w: wmatrix); (* convert the gene of the bug at location z into its recognizer w matrix *) var x: integer; (* location on the genome *) l: integer; (* location on the recognizer *) b: base; (* base touched by a finger of recognizer at l *) begin (* writeln(list,'* bug = ',bug:1,' '); *) x := z ; (* base 1 of the gene is at position z *) with e.p.creature[bug] do begin for l := 1 to e.p.width do begin for b := a to t do begin w.matrix[b,l] := decode(e,bug,x,e.p.bpi, e.precalc.negative,e.precalc.twonegative); x := x + e.p.bpi (*;writeln(list,'* x=',x:2, ' w(',ord(b):1,',',l:2,') = ',val:4); *) end; end; (* decode the threshold from sequence just following the rest of the gene *) (* WARNING: if bpi <> bpthreshold then negative and twonegative are not correct below. See makeprecalc. *) w.threshold := decode(e,bug,x,e.p.bpthreshold, e.precalc.negative,e.precalc.twonegative); (* writeln(output,'threshold for bug ',bug:2,'=',w.threshold:3); *) end; end; (* end module ev.translate version = 4.10; (@ of ev.p 2010 Sep 24 *) (* begin module ev.score *) function score(var w: wmatrix; width: integer; var creature: chromosome; x: integer; var e: everything): real; (* Evaluate the w matrix placed on the genome at position x. Note: x+1 is the first base touched by the w matrix. That is, x is the zero of the w, and the first base of the w is at x+1. Note that the score function does not imply how the recognition occurs. That is done by the recognize function. *) var realval: real; (* the value of the site evaluated by w *) integerval: integer; (* the value of the site evaluated by w *) l: integer; (* index to w matrix *) begin (* score *) {sssqqq} if e.noise then with e do begin realval := 0.0; (* *) end else begin integerval := 0; for l:=1 to width do integerval := integerval + w.matrix[creature[l+x],l]; score := integerval end; end; (* end module ev.score version = 4.10; (@ of ev.p 2010 Sep 24 *) (* the following segment of code, when placed into the score function, will print out the values used, and their sum ;for l:=1 to width do write(output,' ',w[creature[l+x],l]:3); ;writeln(output,' = ',val:5); *) (* begin module ev.recognize *) function recognize(var w: wmatrix; width: integer; var creature: chromosome; x: integer; var e: everything): boolean; (* Evaluate the w matrix placed on the genome at position x. Note: x+1 is the first base touched by the w matrix. That is, x is the zero of the w, and the first base of the w is at x+1. The test for recognition is that score value is greater than the threshold. *) begin (* *) recognize := score(w, width, creature, x, e) >= w.threshold (* *) end; (* end module ev.recognize version = 4.10; (@ of ev.p 2010 Sep 24 *) (* begin module ev.evaluate *) procedure evaluate(var e: everything; bug: integer); (* evaluate the particular bug and put the number of mistakes into the rank array. *) var w: wmatrix; (* the recognizer; translation of the gene *) m: integer; (* the current number of mistakes counted *) x: integer; (* position on the genome *) recognized: boolean; (* a site was recognized at x *) begin {writeln(output,'in evaluate, bug: ',bug:1);} (* translate the recognizer gene into the w matrix *) translate(e,bug,1,w); { writeln(output,'after translation:'); showmatrix(output,w, 5); halt; } (* scan the genome *) m := 0; with e.p.creature[bug] do for x := 1 to e.precalc.endscang do begin recognized := recognize(w,e.p.width,genome,x,e); if e.p.sitelocations[x] <> recognized then m := m + 1; end; (* record the results *) e.p.rank[bug,bugid] := bug; (* identify the bug *) e.p.rank[bug,mistakes] := m {;writeln(output, 'bug ',bug:1, ' makes ',m:1,' mistakes');} end; (* end module ev.evaluate version = 4.10; (@ of ev.p 2010 Sep 24 *) (* ************************************************************************ *) (* begin module ev.listparams *) procedure listparams(var list: text; var e: everything); (* create the header of the file and list the parameters *) procedure truthof(x: boolean); (* print the truth value of x to file list*) begin if x then write(list,'* true ') else write(list,'* false') end; begin with e.p do begin writeln(list,'*'); writeln(list,'* ',bugs:col,' number of creatures'); writeln(list,'* ',Genome:col, ' number of potential binding sites, Genome'); writeln(list,'* ',gamma:col, ' number of sites per creature, gamma'); writeln(list,'* ',width:col, ' width of the recognizer in bases'); writeln(list,'* ',bpi:col, ' bases per integer of the recognizer'); writeln(list,'* ',e.mu:col, ' mutation rate in hits per creature per generation'); writeln(list,'* ',e.initialseed:col:12, ' initial seed for the random number generator (given in evp)'); writeln(list,'* ',e.cycles:col,' cycles'); writeln(list,'* ',e.displayinterval:col,' display interval'); truthof(e.displayaverage ); writeln(list,' : listing average mistakes'); truthof(e.displaychange ); writeln(list,' : listing changes of mistakes'); truthof(e.displayindividual); writeln(list,' : listing individuals'); truthof(e.displaystatus); writeln(list,' : listing status to output'); truthof(e.displayorbits); writeln(list,' : listing Rindividual orbits'); truthof(e.displayrsequence ); writeln(list,' : listing Rsequence'); truthof(e.displaygenomich ); writeln(list,' : listing genomic uncertainty'); truthof(e.displayonemistake); writeln(list,' : listing mistakes of best individual'); writeln(list,'* ',e.seed:col:12, ' current seed for the random number generator'); truthof(e.selecting); writeln(list,' : selecting by mistakes'); writeln(list,'* ',e.storagefrequency:col,' storage frequency'); writeln(list,'* ',e.sigma:col:dec,' sigma,', ' standard deviation of noise'); if e.noise then writeln(list,'* ',' ':col,' Noise is modeled') else writeln(list,'* ',' ':col,' No noise is modeled'); writeln(list,'* ',e.specialrule, ' SPECIAL RULE for ties: r: random, b: both, s: sort'); writeln(list,'* ',e.haltoncondition, ' halt on condition:', ' - none; r Rs>=Rf; m mistakes 0; b both r and m'); end end; (* end module ev.listparams version = 4.10; (@ of ev.p 2010 Sep 24 *) (* begin module ev.geteverything *) procedure geteverything(var e: everything; var all: allfile); (* get everything in the all file. Note that this does not do a reset *) begin e := all^; get(all); end; (* end module ev.geteverything version = 4.10; (@ of ev.p 2010 Sep 24 *) (* begin module evd.getevdparams *) procedure getevdparams(var evdp: text; var first, last: integer; var nonsites: boolean; var e: everything); (* get the parameters: first and last creatures to display and whether to display nonsites to evfeatures *) var parameterversion: real; (* parameter version number *) begin reset(evdp); if eof(evdp) then begin first := 1; last := 1; nonsites := false; end else begin readln(evdp, parameterversion); if round(100*parameterversion) < round(100*updateversion) then begin writeln(output, 'You have an old parameter file!'); writeln(output, 'parameterversion is ', parameterversion:4:2); writeln(output, 'updateversion is ', updateversion :4:2); halt end; readln(evdp,first); readln(evdp,last); if eof(evdp) then nonsites := false else begin if evdp^ = 'n' then nonsites := true else nonsites := false; readln(evdp) end; if first > last then begin writeln(output,' first creature (=',first:1,') must be', ' less than or equal to last (=',last:1,')'); halt end; if first < 1 then begin writeln(output,' first parameter must be positive'); halt end; if last > e.p.bugs then begin writeln(output,' warning: there are only ',e.p.bugs:1,' creatures'); last := e.p.bugs end end; end; (* end module evd.getevdparams *) (* begin module evd.showandtell *) procedure showandtell(var display: text; e: everything); (* show and tell about a bunch of interesting facts *) var area: integer; (* area in bits devoted to the recognizer gene *) accuracy: integer; (* maximum number of bits of accuracy that the recognizer gene could handle *) begin with e do begin (* tell some facts about general things *) writeln(display,' from version ',e.precalc.evversion:4:2,' of ev'); writeln(display); write(display,' date-time stamp: '); writedatetime(display,datetimestamp); writeln(display); writeln(display); writeln(display,' ',generation:col,' generation'); writeln(display); with p, precalc do begin (* tell some facts about the population *) writeln(display,' rfrequency = ', rfrequency:10:5,' bits per site'); writeln(display); area := (8 *width * bpi); accuracy := round( ln(p.width * exp(bpi * ln(4))) /precalc.ln2 ); writeln(display,' bpi = bases per integer'); writeln(display,' recognizer gene area = (4 x width) x (2 x bpi) = ', area:1,' bits / recognizer gene'); writeln(display,' recognizer accuracy = log2 (width x 4^bpi) = ', accuracy:1,' bits'); writeln(display,' recognizer gene effectiveness = accuracy/area = ', (100*accuracy/area):5:1,'%'); writeln(display); writeln(display,' maximum site information = 2 x width = ', (2*width):5,' bits/site'); writeln(display,' probable site density = rfrequency/(2*width) = ', (rfrequency/(2*width)):5:3,' bits/site'); if rfrequency > 2*width then begin writeln(display); writeln(display,' WARNING: requested site density exceeds 1.0'); writeln(output,' WARNING: requested site density exceeds 1.0'); writeln(display); end; writeln(display); end end end; (* end module evd.showandtell *) (* begin module evd.showsites *) procedure showsites(var sites: text; var e: everything; first, last: integer); (* show the sequences of the sites for bugs of rank first to last *) var bug, (* index to bugs *) s, (* index to sites of a bug *) x: (* index to genome of a bug *) integer; begin rewrite(sites); with e.p do begin for bug := first to last do with creature[rank[bug,bugid]] do begin for s := 1 to gamma do begin (* writeln(output,'zero of site[',s:1,'] is at coordinate ',site[s]:1); *) {zzz the first position is site[s]+1!!!} for x := site[s]+1-extra to site[s]+width+extra do begin write(sites,basetochar(genome[x])) end; writeln(sites,'.') end; end; if first <> last then writeln(sites); (* separate sets *) end end; (* end module evd.showsites *) (* begin module evd.showgenomes *) procedure showgenomes(var genomes: text; var e: everything; first, last: integer); (* show the sequences for bugs of rank first to last of the genomes *) var bug, (* index to bugs *) x: (* index to genome of a bug *) integer; begin rewrite(genomes); with e.p do begin for bug := first to last do with creature[rank[bug,bugid]] do begin for x := 1 to genomesize do begin if (x mod (2*deflinewidth)) = 0 then writeln(genomes); write(genomes,basetochar(genome[x])); end; writeln(genomes,'.'); writeln(genomes) end end end; (* end module evd.showgenomes *) (* begin module leftjustify *) procedure leftjustify(var afile: text; number: integer; width: integer); (* print the number into afile in a field of some width, but left justified *) var numberlength: integer; (* number of characters taken up by a number *) begin (* put spaces on the right side *) (* add 1 to abs(number) to assure that ln can't bomb *) numberlength := round( 1.5 + ln(abs(number)+1)/ln(10) ); if width < numberlength (* won't fit *) then begin writeln(output, 'the number (',number:width,') cannot fit in the display', ' properly ... sorry if there is a mess'); write(afile, number: width); (* try it anyway... *) end else begin write(afile, number: numberlength); if width > numberlength then write(afile, ' ': (width-numberlength)) (* put blank padding *) end end; (* end module leftjustify *) (* begin module evd.thermo *) procedure thermo(var display: text; bug: integer; var e: everything); (* compute thermo and information values for the weight matrix of the bug *) const { nrange = 0.5; (* normalization range is to between -nrange to +nrange *) } nrange = 1.0; (* normalization range is to between -nrange to +nrange *) tolerance = 1e-10; (* tolerance for sum of probabilities *) var b: base; (* an index (with l) for the weight matrix *) boundary: real; (* the largest absolute value of the weight matrix at l *) h: real; (* uncertainty computed from p *) p: real; (* computed probablity - approximately a frequency *) psum: real; (* sum of p to check program *) l: integer; (* an index (with b) for the weight matrix *) Q: real; (* the partition function *) Rsl: real; (* information content at position l in bits *) Rs: real; (* total information content in bits *) w: wmatrix; (* the wmatrix from a bug *) begin with e.p do begin with creature[rank[bug,bugid]] do begin (* translate this bug's matrix: *) translate(e,rank[bug,bugid],1,w); writeln(display,' Thermodynamic probabilities', ' and information of weight matrix'); (* scale into a range from -1 to +1 so that exp does not croak *) boundary := 0; for l := 1 to width do begin for b := a to t do begin (* determine min and max *) if abs(w.matrix[b,l]) > boundary then boundary := abs(w.matrix[b,l]); end; end; writeln(display, ' abs(largest weight): ', round(boundary):4); writeln(display, ' normalization range: ', -nrange:4:2, ' to ', +nrange:4:2); (* adjust the normalization range *) boundary := boundary * nrange; writeln(display, ' normalization value: ', round(boundary):4); write(display,'L':4); for b := a to t do write(display,' ',basetochar(b):4); write(display,' | '); for b := a to t do write(display,' p(',basetochar(b),')'); write(display,' | '); write(display,' bits'); writeln(display); Rs := 0.0; for l := 1 to width do begin write(display,l:4); Q := 0.0; for b := a to t do begin write(display, ' ',w.matrix[b,l]:4); Q := Q + exp(w.matrix[b,l]/boundary); end; write(display, ' | '); h := 0.0; psum := 0.0; for b := a to t do begin p := exp(w.matrix[b,l]/boundary)/Q; psum := psum + p; h := h - p*ln(p)/ln(2); write(display, ' ',p:4:2); end; Rsl := 2 - h; Rs := Rs + Rsl; write(display, ' | ', Rsl:5:2); (* make sure that the probability sum is very close to 1.0 *) if (1.0 -psum) > tolerance then begin write(display, ' | probabilty sum differs from 1 by ', (1.0-psum):8); end; writeln(display) end; writeln(display, ' Thermodynamic Matrix Information = ', Rs:5:2, ' bits'); end; end; {zzzQQQ} writeln(display); end; (* end module evd.thermo *) (* begin module evd.showcreatures *) procedure showcreatures(var display: text; var e: everything; first, last: integer); (* show to the display the creatures of rank first to last *) type sitetype = (nonsite, begsite, midsite, endsite); (* the beginning, middle and end of a site are indicated *) var b: base; (* an index (with l) that rotates through the bases for printing the wmatrix values *) bug: integer; (* index to the bugs *) linewidth: integer; (* the number of bases per line *) l: integer; (* an index (with b) that rotates through the bases for printing the wmatrix values *) lsitestate: sitetype; (* like sthisline: the sitestate at the beginning of the line. *) padding: boolean; (* true means that the site finger or the site was split between lines. In this case that some padding blanks have to be put at the beginning of the next line, to keep things straight. *) sitecharacter: char; (* the character to print to mark a site: '+' means the site is recognized, '-' means it is not recognized *) sitestate: sitetype; (* state of printing a site *) s: integer; (* which site we are at *) sthisline: integer; (* the site we are looking at at the start of the current line. this way s can be reset for printing the site evaluations, after being changed for printing the (--) marks *) squared: real; (* the sum of squares of the weight matrix values *) thescore: real; (* current score for a site by using the wmatrix *) twobpi: integer; (* twice the bpi, calculated once for speed *) twowidth: integer; (* twice the width of sites, calculated once for speed *) w: wmatrix; (* the wmatrix from a bug *) wbound: integer; (* the upper boundary of the recognizer gene *) x,y: integer; (* position in a genome *) yupper: integer; (* the upper value to plot (never above g) *) begin writeln(display,' the creatures that rank from ',first:1, ' to ',last: 1,' are displayed'); writeln(display); linewidth := deflinewidth; with e.p do begin if (linewidth mod 4) <> 0 then begin (* adjust linewidth to put sites in evenly *) linewidth := linewidth - (linewidth mod 4) end; twobpi := 2*bpi; twowidth := 2*width; for bug := first to last do with creature[rank[bug,bugid]] do begin writeln(display,' creature of rank ',bug:1,' makes ', e.p.rank[bug,mistakes]:1,' mistakes'); if rank[bug,bugid] = 1 then writeln(display,' this creature makes the fewest', ' mistakes of all'); writeln(display); (* translate this bug's matrix: *) translate(e,rank[bug,bugid],1,w); {qqqyyy} writeln(display,' the w matrix is:'); write(display,'L':4); for b := a to t do write(display,' ',basetochar(b):4); writeln(display); squared := 0.0; for l := 1 to width do begin write(display,l:4); for b := a to t do begin write(display, ' ',w.matrix[b,l]:4); squared := sqr(w.matrix[b,l]) + squared; end; writeln(display) end; writeln(display,' threshold = ',w.threshold:1); writeln(display); writeln(display,' matrix magnitude = ',sqrt(squared):10:5); writeln(display,' sigma = ',e.sigma:10:5); writeln(display,' width = ',width:10); writeln(display,' 4 * width * sigma = ',4*width*e.sigma:10:5); writeln(display); {zzzQQQ} thermo(display, bug, e); x := 1; wbound := width * bpi * 4; sitestate := nonsite; s := 1; b := a; l := 1; while x < genomesize do begin (* the range of the plot is x to yupper *) yupper := x + linewidth - 1; if yupper > genomesize then yupper := genomesize; (* display sequence *) write(display,' '); for y := x to yupper do begin write(display,' ',basetochar(genome[y])) end; writeln(display,' ',yupper:1); (* display recognizer gene and the sites *) write(display,' '); sthisline := s; lsitestate := sitestate; for y := x to yupper do begin if y = site[s]+1 then begin sitestate := begsite; if recognize(w,width,genome,site[s],e) then sitecharacter := '+' else sitecharacter := '-' end else if (y >= site[s]+width) and (sitestate <> nonsite) then begin (* the > makes sure we print even if its a mess... *) sitestate := endsite; if s < gamma then s := s + 1 end; if y <= wbound then begin (* display the recognizer gene *) if ((y-1) mod bpi) = 0 (* if at start of a finger... *) then begin case ((y-1) div bpi) mod 4 of 0: write(display,'|a'); 1: write(display,' c'); 2: write(display,' g'); 3: write(display,' t'); end end else write(display,'--') end else if y <= wbound+bpthreshold then begin (* display threshold *) if y = wbound+1 then write(display,'|T') else write(display,'TT'); end else begin (* display sites *) case sitestate of nonsite: write(display,' '); begsite: begin write(display,' ('); sitestate := midsite end; midsite: write(display,sitecharacter,sitecharacter); endsite: begin write(display,sitecharacter,')'); sitestate := nonsite end; end end end; writeln(display); (* display w matrix components and site evaluations *) write(display,' '); s := sthisline; sitestate := lsitestate; for y := x to yupper do begin if y = site[s]+1 then sitestate := begsite else if (y >= site[s]+width) and (sitestate <> nonsite) then begin (* the > makes sure we print even if its a mess... *) sitestate := endsite; if s < gamma then s := s + 1 (* point to next site *) end; if y <= wbound then begin (* display w matrix components *) if ((y-1) mod bpi) = 0 (* if at start of a finger... *) then begin if b = a then write(display,'|') else write(display,' '); (* see explanation of padding below *) leftjustify(display,w.matrix[b,l],(twobpi-1)); if y > yupper - bpi then padding := true else padding := false; if b <> t then b := succ(b) (* rotate to next base *) else begin l := succ(l); (* rotate to next position *) b := a; end end else if padding then write(display,' '); if y = wbound then padding := false (* reset for sites *) end else if (y = wbound+1) then begin (* display threshold *) write(display,'|'); leftjustify(display,w.threshold,(2*bpthreshold-1)); end else if (y > wbound + bpthreshold) then begin (* display site scores, once beyond the threshold... *) (* The number is always put at the beginning of the site. This requres the spaces to be put on the right. When a site is split between lines the missing spaces are padded into the beginning of the next line, using variable padding *) case sitestate of nonsite: write(display,' '); begsite: begin thescore := score(w,width,genome,site[s],e); write(display,' '); leftjustify(display,round(thescore),(twowidth-1)); if y > yupper - width then padding := true else padding := false; sitestate := midsite end; midsite: begin (* filler stuff at the begin of a line replaces material lost at the end of previous line *) if padding then write(display,' ') end; endsite: begin if padding then begin write(display,' '); (* finish off *) padding := false; (* only pad the first site *) end; sitestate := nonsite end; end end end; writeln(display); writeln(display); x := x + linewidth end end end end; (* end module evd.showcreatures *) (* begin module evd.showfeatures *) procedure showfeatures(var features: text; var e: everything; first, last: integer; nonsites: boolean); (* show the sequences of the sites for bugs of rank first to last *) var ascore: integer; (* a score of a position in the genome *) atsite: boolean; (* we are at a site *) bug: integer; (* index to bugs *) done: boolean; (* done searching for whether x is at a site *) s: integer; (* index to sites of a bug *) w: wmatrix; (* the wmatrix from a bug *) x: integer; (* index to genome of a bug *) wbound: integer; (* the upper boundary of the recognizer gene on the genome *) procedure dononsites; (* do non-sites *) var x: integer; (* index to genome of a bug (gpc demands that index be local) *) s: integer; (* index to genome of a bug (gpc demands that index be local) *) begin with e.p do with creature[rank[bug,bugid]] do begin (* ok, now add sites located that are NOT on the list *) writeln(features,'* Sites not at correct places (nonsite features)'); for x := 0 to e.precalc.endscang do begin (* search through the sites to see if this is one of them *) s := 1; done := false; atsite := false; while not done do begin if site[s]= x then begin atsite := true; done := true; end; if s = gamma then done := true else s := succ(s) end; if not atsite then begin ascore := round(score(w,width,genome,x,e)); if ascore > w.threshold then begin write(features,'@ p',bug:1,' ',x+1:1, ' +1 "EvSite'); write(features,'!'); writeln(features,'" "',x:1,' evaluation: ', ascore:1, '"'); end end end; writeln(features); end; end; (* dononsites *) begin rewrite(features); with e.p do begin writeln(features,'* evd ',version:4:2, ' evdfeatures: features for the lister program'); writeln(features,'define "EvSite+" "+" "()" "()" -0.5 ',(width-1+0.5):1:1); writeln(features,'define "EvSite-" "-" "()" "()" -0.5 ',(width-1+0.5):1:1); writeln(features,'define "EvSite!" "!" "()" "()" -0.5 ',(width-1+0.5):1:1); writeln(features,'define "Threshold" "-" "--" "--" -0 ',(bpi-1):1); writeln(features,'define "A" "-" "--" "--" 0 ',(bpi-1):1); writeln(features,'define "C" "-" "--" "--" 0 ',(bpi-1):1); writeln(features,'define "G" "-" "--" "--" 0 ',(bpi-1):1); writeln(features,'define "T" "-" "--" "--" 0 ',(bpi-1):1); for bug := first to last do with creature[rank[bug,bugid]] do begin (* translate this bug's matrix: *) translate(e,rank[bug,bugid],1,w); (* write weight matrix *) for x := 0 to width-1 do begin writeln(features,'@ p',bug:1,' ',(4*bpi*x+0*bpi+1):1,' +1 "A" "', w.matrix[a,x+1]:1 ,'"'); writeln(features,'@ p',bug:1,' ',(4*bpi*x+1*bpi+1):1,' +1 "C" "', w.matrix[c,x+1]:1 ,'"'); writeln(features,'@ p',bug:1,' ',(4*bpi*x+2*bpi+1):1,' +1 "G" "', w.matrix[g,x+1]:1 ,'"'); writeln(features,'@ p',bug:1,' ',(4*bpi*x+3*bpi+1):1,' +1 "T" "', w.matrix[t,x+1]:1 ,'"'); end; (* write threshold *) wbound := width * bpi * 4; writeln(features,'@ p',bug:1,' ',(wbound+1):1,' +1 "Threshold" "', w.threshold:1,'"'); (* write the sites *) for s := 1 to gamma do begin ascore := round(score(w,width,genome,site[s],e)); {zzz NOTE LOCATION OF SITE IS AT +1 OF WEIGHT MATRIX!! } write(features,'@ p',bug:1,' ',site[s]+1:1, ' +1 "EvSite'); if ascore < w.threshold then write(features,'-') else write(features,'+'); writeln(features,'" "',s:1,' evaluation: ', ascore:1, '"'); end; writeln(features); if nonsites then dononsites; end end end; (* end module evd.showfeatures *) (* begin module evd.themain *) procedure themain(var all: allfile; var evdp, display, evfeatures, sites: text); (* the main procedure of evd *) var first, last: integer; (* what creatures to display *) nonsites: boolean; (* whether to display non-sites in evfeatures *) begin writeln(output,' evd ',version:4:2); rewrite(display); writeln(display,' evd ',version:4:2,' evolution of sites display'); (* obtain the all data *) reset(all); if eof(all) then begin writeln(output,' all file is empty'); halt end; geteverything(e, all); (* grab the data into e *) if e.p.randomsites <> 'a' then begin writeln(output,'***************************************************'); writeln(output,'* WARNING: evd does not know how to display sites *'); writeln(output,'* when they are not in a regular array! *'); writeln(output,'* The display is not reliable. Use lister. *'); writeln(output,'***************************************************'); (* check that the everything read was ok *) if (e.p.randomsites <> 'n') and (e.p.randomsites <> 'r') then begin writeln(output,'** all READ was WRONG ***************************'); writeln(output,'e.p.randomsites =', e.p.randomsites); writeln(output,'** Are modules current from ev? *****************'); halt; end end; listparams(display,e); writeln(display); getevdparams(evdp,first,last,nonsites,e); showandtell(display,e); showgenomes(genomes,e,first,last); showfeatures(evfeatures,e,first,last,nonsites); showsites(sites,e,first,last); end; (* end module evd.themain *) begin (* evd *) themain(all, evdp, display, evfeatures, sites); 1: end. (* evd *)