program rsim(rsimp, cmp, xyin, output); (* rsim: Rsequence simulation Thomas Schneider module libraries required: delman, matmods, prgmods, auxmods *) label 1; (* end of the program *) const (* begin module version *) version = 2.24; (* of rsim.p 1996 April 12 origin from rseq 5.31 1990 March 28 *) (* end module version *) (* begin module describe.rsim *) (* name rsim: Rsequence simulation synopsis rsim(rsimp: in, cmp: in, xyin: out, output: out) files rsimp: paramters to control the program: n: number of sequences to use to generate each fbl(simulated) rangelow, rangehigh: low and high bounds of the range of the matrix Rs: estimated value of Rsequence from the rsgra program SD: Standard Deviation of Rs based on sample size from the rsgra program. This defines the range Rslower = Rs - SD; Rsupper = Rs + SD. seed: a real number between 0 and 1 used to start the random number generator. The date and time is used if this number is greater than 1. If the number is less than 0 use whatever the system random generator initializes with. Some random generators start at the same place. To force a particular starting seed, the program will run through the random numbers until it finds one at the desired seed. Since this can take a while (seconds) the option for using seed < 0 speeds things up. It should make little difference in the results. simulations: number of fbl(true) to make Rtlower: lower limit to Rsequence(true) to work with. This allows one to remove the small ones and get on with the ones of interest. Rtupper: upper limit to Rsequence(true) to work with. selection: if the first character of the line is 's', then only those points which fall in the Rslower to Rsupper range are put into xyin. (Ie, only the 'p' values.) This allows very large crunches to be done which don't create such a large xyin file. cmp: composition file from comp program. If it is empty, the program will assume equiprobable bases. xyin: output of the program, input to the xyplo program column 1: values of R(simulated) that fall within the Rslower and Rsupper range are indicated by a 'p', others by 'n'. column 2: Rsequence(true) column 3: Rsequence(simulated) These are followed by a comment that reports the Rsequence and its standard deviation. output: messages to the user. The same comment at the end of xyin is printed to output. description Rsim stands for Rsequence-simulation. The program generates a set of Rsequence values to determine the variation of Rsequence for small sample sizes. Method. A frequency table is constructed with zero information content, namely it contains 0.25 in all positions (l) and bases (b). This table, fbltrue, is 'evolved' by altering the frequencies until it has an information content Rsequence(true) (=Rtrue) in the range Rtlower to Rtupper (with a flat distribution). A set of n sequences is generated using the fbltrue probabilities, and the information content, Rsimulated, is calculated for the set. We select out those Rsimulated values which fall within the range of the Rs+/-SD. This is repeated many times. The distribution of Rtrue values (which correspond to the selected Rsimulated values) represents the range of possible information contents of frequency tables which could have produced the observed results. In this way, we bootstrap ourselves to get the range. Note that SD is only a measure of small sample size. Use. Run an information analysis of the sites. This analysis determines n, rangelow and rangehigh for the rsimp. From the output of rseq (rsdata file), determine Rs and SD over the same range. Begin with only a few simulations. It is preferable to determine how long each simulation takes using a timing program like the UNIX /usr/5bin/time, so that the time for the final simulations can be predicted. Set Rtlower and Rtupper wide at first to be sure to capture the whole distribution. 10,000 simulations is generally sufficient for the final analysis, assuming that Rtlower and Rtupper are not too big. Graph the results with the xyplo program, using the rsim.xyplop file for parameters. The output looks like: Rsimulated | . . . | . . | . . Rs + SD | o o Rs | oo o Rs - SD | ooo | .. | . | . | . | .. ---------------------------- Rtrue ^ ^ Rtlower Rtupper The program choses a random number between Rtlower and Rtupper, Rtrue. Then it creates the fbltrue matrix with all 0.25 values. This places Rtrue at 0 initially. The matrix is evolved up to the current Rtrue value. Therefore the set of all fbltrue matricies should have a flat information content distribution. YOU MUST CHECK THAT THIS IS TRUE!! Copy the xyin file to the name 'data' and use the genhis program with these parameters: c 2 x n 30 to get a histogram of the distribution of Rtrue, coming from column 2 of the file. The distribution should be reasonably flat over the entire region of the small circles (o) above. If it is not, you must determine what is wrong before continuing. Those small circles represent the range that Rs +/- SD slices horizontally from the distribution of Rtrue versus Rsimulated. Recall that an each Rtrue leads to an fbltrue from which a single simulation of n binding sites is created; the information content of that is Rsimulated. So we want the distribution of Rtrue within the bounds of the slice. To do this, we select that slice for analysis. In UNIX, we pull out all lines from xyin which have 'p' in them (p means: "plot this"). Use: grep p xyin > data Then run genhis with these parameters: c 2 p g x n 30 Notice how well or poorly the plotted gaussian ("p g") fits your distribution. If it is a good fit you are done. You can use the standard deviation which genhis provides. Use the original Rsequence value for the mean. (The mean found on the genhis listing this way will be approximately Rsequence, but it has been created by passage through the simulation, so is not as good as the orginal data.) Alternatively, use the values from the final report at the end of the xyin file (and shown to output). NOTE: do NOT use the SEM value, use the SD value, since this simulation is to determine the standard deviation of Rsequence. examples rsimp: 1800 n: number of sequences to generate each fbl(simulated) -3 rangelow: low bound of the range of the matrix +6 rangehigh: high bound of the range of the matrix 8.036 Rs: estimated value of Rsequence 0.006 SD: variance of Rs based on sample size 0 seed: random number seed. <0,>1 => timeseed used 10000 simulations: number of fbl(true) to make 7 Rtlower: lower limit to Rsequence(true) to work with. 9 Rtupper: upper limit to Rsequence(true) to work with. no selection of values within Rs+/-SD documentation @article{Schneider1986, author = "T. D. Schneider and G. D. Stormo and L. Gold and A. Ehrenfeucht", title = "Information content of binding sites on nucleotide sequences", journal = "J. Mol. Biol.", volume = "188", pages = "415-431", year = "1986"} @article{Stephens.Schneider.Splice, author = "R. M. Stephens and T. D. Schneider", title = "Features of spliceosome evolution and function inferred from an analysis of the information at human splice sites", journal = "J. Mol. Biol.", volume = "228", pages = "1124-1136", year = "1992"} see also rsimp, rseq.p, xyplo.p, genhis.p, rsim.xyplop, genhisp.rsim author Thomas D. Schneider bugs Does not handle di-nucleotides or longer oligos technical notes Constants maxsize (procedure calehnb) and kickover (procedure makehnblist) determine the largest n for which e(hnb) is used. Above this, ae(hnb) is used. Do not set these below 50 without careful analysis. Other constants are in module rsim.const. Although it is possible to create more than one Rsimulated from each Rtrue, this causes vertical streaks on the graph, and so will distort the simulation. It's better to get a completely clean one each time. Originally, a psudo random generator was used to create fbltrue from a random matrix (rather than 0.25) but this causes problems because such a matrix contains information and so low information points are under represented and higher ones over represented. This distorts the statistics! The program contains a portable random number generator. Unfortunately this can be 10 times slower than the non-portable one available on most systems. The procedure rnd allows one to switch between the two. When the system generator is used, one may find that the random numbers repeat exactly from one run to the next. The seed parameter would not affect the results. To avoid this problem, the random number generator is run until the requested seed is produced, within the tolerance given by the constant seedtolerance. The runs are displayed on the output. One could start with a frequency table from an rsdata file instead of equiprobable values. It would still have to be evolved. However, the evolution would have to allow for loss of information in the matrix, otherwise the distribution of Rtrue would not be flat. (It would need to work with a range rather than a bound.) This would have the advantage of making the resulting matricies "like" the original. This might give a better simuation, but it has not been tried. *) (* end module describe.rsim *) (* begin module rsim.const *) negativeinfinity = -1000000; (* the definition of negative infinity *) posfield = 4; (* field size for aligned sequence positions *) infofield = 8; (* field size for bits *) infodecim = 5; (* decimal places for bits *) nfield = 4; (* size of field for printing n, the number of sites *) seedfield = 14; (* field size for random seeds *) seeddecim = 12; (* decimal places for random seeds *) seedtolerance = 0.01; (* range away from the requested seed allowed *) (* end module rsim.const *) (* begin module vector.const *) maxvecpart = 64; (* the number of elements in a 'part' of a vector *) vpagewidth = 64; (* the width (in characters) of the output vector file from procedure writevector *) (* end module vector.const version = 'matmod 1.95 85 apr 18 tds/gds'; *) (* begin module datetime.const *) datetimearraylength = 19; (* length of dataarray for dates, It is just long enough to include the 4 digit year - solving the year 2000 problem: 1980/06/09 18:49:11 123456789 123456789 1 2 *) (* end module datetime.const version = 7.40; {of delmod.p 2000 Feb 18} *) type { (* begin module date.type *) (* this type is the same as the one in module book.type *) alpha = packed array[1..namelength] of char; (* end module date.type version = 2.50; (@ of ev 1988 oct 6 *) } (* begin module datetime.type *) (* array for dates *) datetimearray = packed array[1..datetimearraylength] of char; (* end module datetime.type version = 7.40; {of delmod.p 2000 Feb 18} *) (* begin module encode.type.param *) (* the following types allow the user to specify parameters which will be used to encode the sequences in the book. *) (* spaces are allowed between the encoded bases, and the number of bases to be skipped between each encoded pair of bases are kept in this linked list of integers *) spaceptr = ^spacelist; spacelist = record skips : integer; (* bases skipped to next coded base *) next : spaceptr; (* points to next spacing number *) end; (* the encoding parameters for each region are stored in these records, and the records for each region are connected into a linked list of all the encoding parameters for the entire sequences *) endpoints = (start,stop); (* end points of a coding region *) paramptr = ^parameter; parameter = record (* these are the instructions for doing the coding *) range : array[endpoints] of integer; (* the bases to be coded by these instructions, relative to the alignedbase *) window : integer; (* the number of bases included in a coding vector *) wshift : integer; (* movement of the window to the next site *) coding : integer; (* the 'level' at which the coding is being done, i.e., 1: mononucleotides; 2: dinucleotides; ... *) spaces : spaceptr; (* the spacing between the encoded bases *) cshift : integer; (* shift to the next coding unit in the window *) (* values calculated at read-in time *) wvlength: integer; (* length of a window vector. this is coding raised to the 4th power *) pvlength: integer; (* parameter vector length. the vector made up of a series of window vectors. *) (* note: pvlength/wvlength is the number of windows in the parameter range *) next : paramptr; (* set of instructions for coding another region *) end; (* end module encode.type.param version = 'matmod 1.95 85 apr 18 tds/gds'; *) (* begin module vector.type *) (* vectors are useful for many applications, including the encoding of sequences. this vector type is designed to be flexible enough to be used whenever one needs a vector. it is designed as a linked list so it can contain as many elements as are ever needed. the actual 'vector' is a record containing the length and a pointer to the first 'vectorpart'. that vectorpart is a record containing an array of the first 'maxvecpart' elements (maxvecpart is a constant, from module vector.const, which must be included) and a pointer to the next 'vectorpart'. the elements are of type real so that either integers or reals may be used. *) partptr = ^vectorpart; vectorpart = record numbers : array[1..maxvecpart] of real; next : partptr; end; vector = record length : integer; part : partptr; end; (* end module vector.type version = 'matmod 1.95 85 apr 18 tds/gds'; *) (* begin module base.type *) (* define the four nucleotide bases *) base = (a,c,g,t); (* end module base.type version = 7.40; {of delmod.p 2000 Feb 18} *) (* begin module comp.type *) (* the composition is stored in a tree of these nodes *) compnodeptr = ^compnode; (* points to a node of the tree *) compnode = record (* a node of the composition tree *) count : integer; (* the number of oligos for this node *) son : array[base] of compnodeptr; (* the pointers to 'descendants' of this node of the tree *) end; (* spiders are used to make the composition tree *) spiderptr = ^spider; (* points to a 'spider' *) spider = record (* a spider climbs the composition tree, its path determined by the sequences, and increments 'count' at all the nodes it passes, thereby determining the composition *) depth : integer; (* the level of the node now at *) place : compnodeptr; (* a pointer to the current node *) next : spiderptr; (* the next spider in the collection *) end; (* the total number of composition entries at a given level is stored in the linked list of type comptotal *) compptr = ^comptotal; comptotal = record count: integer; (* the number at a given level *) next: compptr; (* pointer to the next level totals *) end; (* 'path' is used in printing the tree *) pathptr = ^path; (* pointer into a path on the tree *) path = record (* the path of bases to get to a particular node *) bas : base; next : pathptr; end; (* end module comp.type version = 'auxmod 1.37 85 apr 4 gds/tds'; *) (* begin module hnblist.type *) (* a list of e(hnb) values for various n. by calculating these only once and saving them in this list, time can be saved. *) ehnblstptr = ^ehnblist; ehnblist = record n: integer; (* number of examples *) ehnb: real; (* e(hnb) for n *) varhnb: real; (* variance for e(hnb) for n *) aflag: char; (* 'a' if approximations used; 'e' if not *) next: ehnblstptr; (* next item on the list *) end; (* end module hnblist.type *) (* ************************************************************************ *) var (* begin module rsim.var *) rsimp, (* parameters to control the program *) cmp, (* composition *) xyin: (* output of the program, input to xyplop *) text; ln2: real; (* natural log of 2 *) Twoln2: real; (* 2*ln(2) this is 2 bits, expressed as nits *) (* end module rsim.var *) (* begin module halt *) procedure halt; (* stop the program. the procedure performs a goto to the end of the program. you must have a label: label 1; declared, and also the end of the program must have this label: 1: end. examples are in the module libraries. this is the only goto in the delila system. *) begin writeln(output,' program halt.'); goto 1 end; (* end module halt version = 7.40; {of delmod.p 2000 Feb 18} *) (* begin module copyaline *) procedure copyaline(var fin, fout: text); (* copy a line from file fin to file fout *) begin (* copyaline *) while not eoln(fin) do begin fout^ := fin^; put(fout); get(fin) end; readln(fin); writeln(fout); end; (* copyaline *) (* end module copyaline version = 7.40; {of delmod.p 2000 Feb 18} *) (* begin module myrandom *) procedure myrandom(var seed: real); (* random generator 2. version = 1.00; 1986 December 31 Test this routine with the program tstrnd. written by David Masternarde (@ This random number generator is based on a shift register with a single bit of feedback, as described in Electronics for Neurobiologists, by Brown, Maxfield and Moraff, MIT press 1973, referencing Random Process Simulation and Measurement by Korn, McGraw-Hill 1966. The random seed rand, a number between 0 and 1 exclusive, is converted to an integer between 1 and 2**23-1, inclusive. This 23-bit number is shifted right one bit and the output of the last (23rd) bit and the 9th bit are added modulo 2 (exclusive orred) and fed back into the new first bit. This is done between 4 and 11 times, depending on the last 3 bits of the original number. The result is converted back to a real number between 0 and 1 from which the 23 bit integer can be recovered on the next call. The 23-bit shift register goes through all 2**23-1 values before repeating; the repetition frequency of this algorithm could be less or greater depending on the seed, because of the random number of multiple shifts per call. *) (* powers of 2 *) const pow14=16384; pow15=32768; pow22= 4194304; pow23=8388608; var iseed, (* integer shift register *) i, nrep: integer; (* index, number of times to do shift *) begin (* myrandom *) iseed := round(seed*pow23); (* convert to 23 bit number *) if (iseed<1) or (iseed>=pow23) then iseed := 1; nrep := 4 + iseed mod 8; (* do it 4 to 11 times based on last 3 bits *) for i:= 1 to nrep do (* if last bit and 9th bit are equal, feed back a 0, otherwise a 1 *) if( odd(iseed) = ((iseed mod pow15) >= pow14)) then iseed := iseed div 2 else iseed := (iseed div 2) + pow22; seed := iseed/pow23; end; (* myrandom *) (* end module myrandom version = 2.50; (@ of ev 1988 oct 6 *) procedure rnd(var seed: real); (* call to the random number generator. Either the one on the computer can be used, or a portable one can be used. *) begin seed := random(0); (* non-standard random generator *) {myrandom(seed); (* portable random generator *)} end; (* begin module getdatetime *) procedure getdatetime(var adatetime: datetimearray); (* get the date and time into a single array from the system clock. adatetime contains the date: 1980/06/09 18:49:11 ye mo da ho mi se (year, month, day, hour, minute, second). As of 2000 February 18, the Sun Pascal compiler requires a formatting statement. This statement allows the date to be generated in this standard Delila format in a single call. Information about the formatting statement is available on the manual page for date in Unix. If a computer does not have this method, see the 'oldgetdatetime' routine in delmod.p (http://www.lecb.ncifcrf.gov/~toms/delila/delmod.html) for some conversion code. *) begin date(adatetime,'%Y/%m/%d %H:%M:%S'); end; (* end module getdatetime version = 7.40; {of delmod.p 2000 Feb 18} *) (* begin module timeseed *) procedure addtoseed(var seed, power: real; c: char); (* add the digit represented by c to the seed at the power position *) var n: integer; (* the character represented by c *) begin (* addtoseed *) power := power/10; { writeln(output,'addtoseed, c = ',c); writeln(output,'addtoseed, ord(c) = ',ord(c)); } case c of ' ': begin writeln(output,'timeseed: error in datetime'); halt; end; '0': n := 0; '1': n := 1; '2': n := 2; '3': n := 3; '4': n := 4; '5': n := 5; '6': n := 6; '7': n := 7; '8': n := 8; '9': n := 9 end; (*writeln(output,'timeseed number is [',n:1,']'); (@ debug *) seed := seed + power*n end; (* addtoseed *) procedure makeseed(adatetime: datetimearray; var seed: real); (* convert adatetime to a real number in seed, reversed order *) var power: real; (* a digit of the seed such as 0.01 *) begin seed := 0.0; power := 1.0; addtoseed(seed, power, adatetime[19]); addtoseed(seed, power, adatetime[18]); (* : *) addtoseed(seed, power, adatetime[16]); addtoseed(seed, power, adatetime[15]); (* : *) addtoseed(seed, power, adatetime[13]); addtoseed(seed, power, adatetime[12]); (* *) addtoseed(seed, power, adatetime[10]); addtoseed(seed, power, adatetime[9]); (* / *) addtoseed(seed, power, adatetime[7]); addtoseed(seed, power, adatetime[6]); (* / *) addtoseed(seed, power, adatetime[4]); addtoseed(seed, power, adatetime[3]); addtoseed(seed, power, adatetime[2]); addtoseed(seed, power, adatetime[1]); end; procedure orderseed(adatetime: datetimearray; var seed: real); (* convert adatetime to a real number in seed, normal order *) var power: real; (* a digit of the seed such as 0.01 *) begin seed := 0.0; power := 1.0; addtoseed(seed, power, adatetime[1]); addtoseed(seed, power, adatetime[2]); addtoseed(seed, power, adatetime[3]); addtoseed(seed, power, adatetime[4]); (* / *) addtoseed(seed, power, adatetime[6]); addtoseed(seed, power, adatetime[7]); (* / *) addtoseed(seed, power, adatetime[9]); addtoseed(seed, power, adatetime[10]); (* *) addtoseed(seed, power, adatetime[12]); addtoseed(seed, power, adatetime[13]); (* : *) addtoseed(seed, power, adatetime[15]); addtoseed(seed, power, adatetime[16]); (* : *) addtoseed(seed, power, adatetime[18]); addtoseed(seed, power, adatetime[19]); end; procedure timeseed(var seed: real); (* read the computer date and time. reverse the order of the digits and put a decimal point in front. this gives a fraction between zero and one that varies quite quickly, and is always unique (if the computer has sufficient accuracy). it is to be used as a seed to a random number generator. *) var adatetime: datetimearray; (* a date and time *) begin (* timeseed *) getdatetime(adatetime); { writeln(output,'timeseed: adatetime: ',adatetime); } makeseed(adatetime, seed); end; (* timeseed *) (* end module timeseed version = 7.40; {of delmod.p 2000 Feb 18} *) (* ************************************************************************ *) (* could delete: *) (* begin module encode.readencpar *) (* read encode parameters *) function vectorsize(param: paramptr): integer; (* determine the size of the vector generated by a string of parameter records, begin with 'param' *) var size: integer; (* of the vector, so far *) begin size := 0; while (param <> nil) do begin size := size + param^.pvlength; param := param^.next; end; vectorsize := size; end; function paramsize(param: paramptr): integer; (* determine the size of the vector generated by one parameter record *) var rangesize, (* the number of bases in the range *) numwindows: integer; (* the number of windows which start in the range, taking into account the shifting *) begin with param^ do begin rangesize := range[stop] - range[start] + 1; numwindows := trunc((rangesize -1)/wshift + 1); paramsize := numwindows * wvlength end end; (* paramsize *) procedure readencpar(var thefile: text; var param: paramptr; var regions, vectorlength: integer); (* read the parameter list from 'thefile' and put the information into the linked parameter list, beginning with 'param'; the number of parameter records is returned in 'regions'. the length of each vector is vectorlength *) var aparam, (* from the parameter list *) newparam: paramptr; (* the new parameter record from 'thefile' *) firstspaces, (* in the spaces list *) newspaces: spaceptr; (* for the spaces list *) i,j: integer; (* indices *) ch: char; (* to check for proper positioning *) begin reset(thefile); if eof(thefile) then begin writeln(output,' encoded sequence file is empty'); halt end; new(aparam); if param = nil then new(param); aparam := param; (* skip the header *) for i := 1 to 2 do readln(thefile); (* skip a blank line in the header *) readln(thefile); (* read the number of regions *) readln(thefile,regions); for i := 1 to regions do begin with aparam^ do begin (* read the range *) read(thefile,range[start]); repeat read(thefile,ch) until (ch = 'o'); readln(thefile,range[stop]); (* read the window size and shift *) readln(thefile,window); readln(thefile,wshift); (* read the coding level and arrangement *) read(thefile,coding); new(spaces); if (coding > 1) then begin repeat read(thefile,ch) until (ch = ':'); new(firstspaces); spaces := firstspaces; read(thefile,spaces^.skips); for j := 2 to coding -1 do begin new(newspaces); spaces^.next := newspaces; spaces := newspaces; read(thefile,spaces^.skips); end; spaces^.next := nil; spaces := firstspaces end else spaces := nil; readln(thefile); readln(thefile,cshift); (* set up wvlength *) wvlength := round(exp(coding*ln(4))); end; (* set up pvlength *) aparam ^.pvlength := paramsize(aparam); if (i < regions) then begin new(newparam); aparam^.next := newparam; aparam := newparam; end; end; aparam^.next := nil; (* read the vector size *) readln(thefile,vectorlength); if vectorlength <> vectorsize(param) then begin writeln(output,' vector length in the encoded file'); writeln(output,' does not correspond to the parameters'); halt end; end; (* end module encode.readencpar version = 'matmod 1.95 85 apr 18 tds/gds'; *) (* ************************************************************************ *) (* begin module vector.primitives *) (* these functions and procedures do manipulations on vectors. *) function vget(v:vector; pos:integer): real; (* this returns from vector 'v' the value of the element at position 'pos' *) var i: integer; begin if (pos > v.length) or (pos < 1) then begin writeln(output,' error in call to function vget:', ' position ',pos:1,' is beyond the end of the vector'); halt; end; (* move to the correct 'vectorpart' *) for i := 1 to ((pos -1) div maxvecpart) do v.part := v.part^.next; (* get the proper array element from this part *) vget := v.part^.numbers[(pos -1) mod maxvecpart + 1]; end; procedure vput(var v:vector; pos:integer; number: real); (* this puts into vector 'v' the value of 'number' at position 'pos' *) var i: integer; firstpart: partptr; (* of the vector *) begin if (pos > v.length) or (pos < 1) then begin writeln(output,' error in call to function vput:', ' position ',pos:1,' is beyond the end of the vector'); halt; end; (* move to the correct 'vectorpart' *) firstpart := v.part; for i := 1 to ((pos -1) div maxvecpart) do v.part := v.part^.next; (* put the 'number' into the proper array element for this part *) v.part^.numbers[(pos -1) mod maxvecpart + 1] := number; v.part := firstpart; end; procedure makevector(var v: vector; l: integer); (* create the linked list of vector-parts which are needed for a vector of length 'l' *) var numparts, (* number of parts needed for the vector *) i: integer; (* index to the parts of the vector *) firstpart, (* of the vector *) newpart: partptr; (* of the vector *) begin if l < 1 then begin writeln(output,' makevector: positive length required'); halt end; v.length := l; new(v.part); firstpart := v.part; numparts := (v.length - 1) div maxvecpart + 1; for i := 1 to (numparts - 1) do begin new(newpart); v.part^.next := newpart; v.part := newpart; end; v.part^.next := nil; v.part := firstpart; end; (* end module vector.primitives version = 'matmod 1.95 85 apr 18 tds/gds'; *) (* could delete *) (* begin module vector.readvector *) procedure readvector(var thefile:text; var v: vector); (* read the elements of v into v from 'thefile'. v must already be set up (all the parts created and linked together) before calling. 'thefile' must contain, from the cursor point until the end of the vector, only numbers, either integers or reals; otherwise it will bomb. the vector ends with an end of line so that end of file can be tested for. *) var i,j: integer; (* indecies *) numparts, (* number of parts in the vector *) lastpart: integer; (* the number of elements in the last vector part *) firstpart: partptr; (* pointer to the first vector part *) begin firstpart := v.part; numparts := (v.length - 1) div maxvecpart + 1; lastpart := (v.length - 1) mod maxvecpart + 1; for i := 1 to (numparts - 1) do begin for j := 1 to maxvecpart do read(thefile,v.part^.numbers[j]); v.part := v.part^.next; end; for j := 1 to lastpart do read(thefile,v.part^.numbers[j]); readln(thefile); v.part := firstpart; end; (* end module vector.readvector version = 'matmod 1.95 85 apr 18 tds/gds'; *) (* begin module vector.vset *) procedure vset(thevalue: real; var v: vector); (* set the value thevalue into v *) var i: integer; (* an index *) begin for i := 1 to v.length do vput(v,i,thevalue) end; (* end module vector.vset version = 'matmod 1.95 85 apr 18 tds/gds'; *) (* begin module vector.vadd *) procedure vadd(a: vector; var b: vector); (* add the vector a into the vector b *) var i: integer; (* an index *) begin if a.length <> b.length then begin writeln(output,' function vadd: vector lengths must be equal'); halt end; for i := 1 to a.length do vput(b,i,vget(b,i)+vget(a,i)) end; (* end module vector.vadd version = 'matmod 1.95 85 apr 18 tds/gds'; *) (* begin module vector.vmult *) procedure vmult(k: real; var v: vector); (* multiply the vector v by the scaler k *) var i: integer; (* an index *) begin for i := 1 to v.length do vput(v,i,k*vget(v,i)) end; (* end module vector.vmult version = 'matmod 2.05 88 dec 15 tds/gds'; *) (* ************************************************************************ *) (* begin module comp.readcomp *) (* begin module skipoligo *) procedure skipoligo(var thefile: text); (* this procedure is used to skip over an oligonucleotide string. it reads until it comes to a base, and then continues to read until it comes to a blank. no checking is done for non-base characters in between. *) var ch : char; begin repeat read(thefile,ch) until (ch in ['a','c','g','t']); repeat read(thefile,ch) until (ch = ' '); end; (* end module skipoligo version = 'auxmod 1.37 85 apr 4 gds/tds'; *) procedure readcomp(var comp: text; var compmax: integer; readmax: integer; var root: compnodeptr; var monocomptotal: compptr); (* this procedure requires modules: comp.type, skipoligo, halt; this procedure reads from file 'comp' a composition and puts it into the tree pointed to by the 'root' pointer. 'compmax' is the depth of the composition tree which is stored. it is the minimum of 'readmax', the requested depth, and 'detcomp', the depth for which the input file composition was determined. 'monocomptotal' points to the beginning (for the monos) of a linked list which gives the totals for each level of the composition. *) type (* for efficient reading of the information into the tree, each node that has a successor is stored in a queue as well as put into the tree. the variables of the type list are the queue elements. *) listptr = ^list; list = record item: compnodeptr; (* points to the composition tree node *) next: listptr; (* points to the next list item in the queue *) end; var freeitem, (* the list of unused list pointers *) listitem, (* an item in the queue *) first, (* the first item in the queue *) last: listptr; (* the last item in the queue *) comptot, (* the comp total for a given level *) newcomptot: compptr; (* for adding to the string of comptot"s *) detcomp, (* the level to which the input composition was determined *) level, (* of the composition being read, i.e., monos, dis, ... *) number: integer; (* read from the 'comp' file *) ch: char; (* for reading from 'comp' *) ba: base; (* an index *) (* getitem and clearitem provide efficient use of linked list storage by keeping a list of unused pointers that can be allocated instead of always creating 'new' ones *) procedure getitem(var l: listptr); (* obtain a listitem from the free list or by making a new one *) begin if freeitem<>nil then begin l:=freeitem; freeitem:=freeitem^.next end else new(l); l^.next:=nil end; procedure clearitem(var l: listptr); (* return a listitem to the free list *) var lptr: listptr; begin if l<>nil then begin lptr:=l; l:=l^.next; lptr^.next:=freeitem; freeitem:=lptr end end; begin (* readcomp *) reset(comp); if eof(comp) then begin writeln(output,' error: no composition file provided'); halt; end; readln(comp); (* skip the program identification *) readln(comp); (* skip the book identification *) readln(comp,detcomp); (* obtain the determined composition *) readln(comp); (* skip the blank line *) (* determine the level of composition to be stored *) if (readmax < 1) then begin writeln(output); writeln(output,' warning: 0 or negative oligo length requested'); writeln(output,' composition used is depth ',detcomp:1); writeln(output); compmax := detcomp end else if (readmax > detcomp) then begin writeln(output); writeln(output,' warning: requested composition oligo', ' length (',readmax:1,')'); writeln(output,' is larger than the determined composition', ' oligo length (',detcomp:1,').'); writeln(output,' composition used is to depth ',detcomp:1); writeln(output); compmax := detcomp end else compmax := readmax; new(root); new(monocomptotal); new(comptot); new(first); new(last); first^.item := root; first^.next := nil; last := first; new(freeitem); freeitem^.next := nil; (* read in the total number of bases from the composition *) readln(comp); (* skip the * *) readln(comp); (* skip the information line *) readln(comp,number); root^.count := number; repeat read(comp,ch); (* skip the space *) read(comp,ch); (* the ch determines what to do next *) if (ch = '*') then begin (* determine the level about to be read *) readln(comp); readln(comp,level); if (level = 1) then monocomptotal := comptot else begin new(newcomptot); comptot^.next := newcomptot; comptot := newcomptot; end; comptot^.count := 0; comptot^.next := nil end else begin (* read the composition values *) for ba := a to t do begin skipoligo(comp); read(comp,number); if (number <> 0) then begin new(first^.item^.son[ba]); first^.item^.son[ba]^.count := number; comptot^.count := comptot^.count + number; getitem(listitem); last^.next := listitem; last := listitem; last^.next := nil; last^.item := first^.item^.son[ba] end else first^.item^.son[ba] := nil; end; clearitem(first); readln(comp); end; until level = compmax; (* for the last level of compositions to be read we don"t need to store the nodes in the queue because their successors will not be read in. *) repeat for ba := a to t do begin skipoligo(comp); read(comp,number); if (number <> 0) then begin new(first^.item^.son[ba]); first^.item^.son[ba]^.count := number; comptot^.count := comptot^.count + number; end else first^.item^.son[ba] := nil; end; clearitem(first); readln(comp); until (first = nil); end; (* readcomp *) function getcount(root: compnodeptr; start: pathptr): integer; (* this function follows the tree from the node 'root' (which may be the root of a subtree) along the path initiated with 'start', and returns the count of the resulting node. if the path through the tree ever hits a null node in the process the count is returned to be zero. *) var place: compnodeptr; (* a place in the composition tree *) point: pathptr; (* a point in the path through the tree *) begin point:= start; place := root; while ((place <> nil) and (point <> nil)) do begin place := place^.son[point^.bas]; point := point^.next; end; if (place = nil) then getcount := 0 else getcount := place^.count; end; (* end module comp.readcomp version = 'auxmod 1.37 85 apr 4 gds/tds'; *) (* begin module comp.getmonocomposition *) procedure getmonocomposition(var cmp: text; var gna,gnc,gng,gnt: integer; var data: text; var equalmono: boolean); (* get the mononucleotide composition in gna to gnt from the file cmp, messages about what is returned go to data. if the file cmp is empty, equal mononucleotides are assumed and equalmono is true *) var (* variables for composition procedures readcomp and getcount *) compmax: integer; root: compnodeptr; monocomptotal: compptr; start: pathptr; b1: base; (* base for comp *) count: integer; (* one of the base counts *) begin (* getmonocomposition *) reset(cmp); if eof(cmp) then begin { writeln(data,'* the composition file is empty.'); writeln(data,'* (equal mononucleotides assumed)'); } gna := 1; gnc := 1; gng := 1; gnt := 1; equalmono := true end else begin write(data,'*'); copyaline(cmp,data); write(data,'*'); copyaline(cmp,data); readcomp(cmp,compmax,1,root,monocomptotal); (* obtain the mono composition *) new(start); start^.next := nil; for b1 := a to t do begin start^.bas := b1; count := getcount(root,start); case b1 of a: gna := count; c: gnc := count; g: gng := count; t: gnt := count end end; dispose(start); equalmono := false end end; (* getmonocomposition *) (* end module comp.getmonocomposition *) (* ************************************************************************ *) (* begin module calehnb *) procedure calehnb(n: integer; gna,gnc,gng,gnt: integer; var hg, ehnb, varhnb: real (* ; debugging: boolean *)); (* calculate e(hnb) in bits/bp (ehnb) for a number (n) of example sequence sites. gna to gnt are the composition to use for the genome probabilities of a to t. the genomic entropy hg and the variance var(hnb) (=varhnb) are also calculated. if the variable debugging is passed to the procedure then the individual combinations of hnb are displayed. note: this procedure should not be broken into smaller procedures so that it remains efficient. version = 3.02; of procedure calehnb 1983 nov 23 *) const maxsize = 200; (* the largest n allowed *) accuracy = 10000; (* less than (1/accuracy) bits error is demanded for the sum of pnb (see variable 'total') at the end of the procedure *) var log2: real; (* natural log of 2, used to find log base 2 *) logn: real; (* log of n *) nlog2: real; (* n * log2 *) gn: integer; (* sum of gna..gnt *) logpa,logpc,logpg,logpt: real; (* logs of genome probabilities *) (* log of n factorial is the sum of i=1 to n of log(i). the array below represents these logs up to n *) logfact: array[0..maxsize] of real; (* precalculated values of -p*log2(p), where p=nb/n for nb = 0 .. n. m stands for minus *) mplog2p: array[0..maxsize] of real; i: integer; (* index for logfact and mplog2p *) logi: real; (* natural log of i *) na, nc, ng, nt: integer; (* numbers of bases in a site *) done: boolean; (* true when the loop is completed *) pnb: real; (* multinomial probability of a combination of na, nc, ng, nt *) hnb: real; (* entropy for a combination of na..nt *) pnbhnb: real; (* pnb*hnb, an intermediate result *) sshnb: real; (* sum of squares of hnb *) (* variables for testing program correctness: *) total: real; (* sum of pnb over all combinations of na..nt if this is not 1.00, the program is in error *) counter: integer; (* counts the number of times through the loop *) begin (* calehnb *) (* prevent access to outside the arrays: *) if n > maxsize then begin writeln(output,' procedure calehnb: n > maxsize (', n:1,'>',maxsize:1,')'); halt end; counter := 0; total := 0.0; log2 := ln(2); logn := ln(n); nlog2 := n * log2; (* get logs of genome probabilities *) gn := gna + gnc + gng + gnt; logpa := ln(gna/gn); logpc := ln(gnc/gn); logpg := ln(gng/gn); logpt := ln(gnt/gn); (* find genomic entropy *) hg := -(gna*logpa + gnc*logpc + gng*logpg + gnt*logpt)/(gn*log2); ehnb := 0.0; (* start error entropy at zero *) sshnb := 0.0; (* make table of log of n factorial up to n and entropies for nb/n *) logfact[0] := 0.0; (* factorial(0) = 0 *) mplog2p[0] := 0.0; for i := 1 to n do begin logi := ln(i); logfact[i] := logfact[i - 1] + logi; mplog2p[i] := - i * (logi - logn) / nlog2 end; (* begin by looking at the combination with all a: na = n *) na := n; nc := 0; ng := 0; nt := 0; (* the following loop simulates a number of nested loops of the form: for b1=a to t do for b2=b1 to t do for b3=b2 to t do ... for bn=b(n-1) to t do ... the resulting set of variables increase in alphabetic order since no inner loop variable can have a value less than any outer loop. the number of times through the inner-most loop is given by: o = (n + 1)*(n + 2)*(n + 3)/6 in the case where there are four symbols (a,c,g,t) and n is the number of nested loops. a recursive set of loops would be possible, but it would use up too much memory in practical cases (up to n=150 or higher). a second algorithm sequests the loop variables into an array and increments them there. however, the goal is to get all possible combinations for na, nc, ng, nt, where the sum of these is n. the nested loops provide all the combinations in alphabetic order, assuring that there can not be any duplicates. to find nb (one of na..nt) one would look at which of the variables b1 to bn were of value b. this is a wasteful operation. the loop below simulates the array of control variables by changing each nb directly. *) done := false; repeat (* pnb is calculated by taking the log of the expression fact(n) na nc ng nt pnb = ------------------- pa * pc * pg * pt . fact(na).. fact(nt) log(pnb) generates a series of sums, allowing the calculation to proceed by addition and multiplication rather than multiplication and exponentiation. the factorials become tractable in this way *) pnb := exp(logfact[n] (* n factorial *) - logfact[na] - logfact[nc] - logfact[ng] - logfact[nt] + na * logpa + nc * logpc + ng * logpg + nt * logpt); hnb := mplog2p[na] + mplog2p[nc] + mplog2p[ng] + mplog2p[nt]; pnbhnb := pnb * hnb; ehnb := ehnb + pnbhnb; sshnb := sshnb + pnbhnb * hnb; (* sum of squares of hnb *) (* the following section keeps track of the calculation and writes out the current set of nb. *) counter := succ(counter); (* if debugging then begin write(output,' ',counter:2,' '); for i := 1 to na do write(output,'a'); for i := 1 to nc do write(output,'c'); for i := 1 to ng do write(output,'g'); for i := 1 to nt do write(output,'t'); write(output,' ',na:3,nc:3,ng:3,nt:3); writeln(output,' pnb = ',pnb:10:5); end; *) total := total + pnb; (* the remaining portion of this repeat loop generates the values of na, nc, ng and nt. notice that there are 7 possibilities at each loop increment. other than the stop, in each case the sum of na+nc+ng+nt remains constant (=n). *) if nt > 0 then begin (* ending on a t - do outer loops *) if ng > 0 then begin (* turn g into t *) ng := ng - 1; nt := nt + 1 end else if nc > 0 then begin (* turn one c into g, and all t to g (note ng = 0 initially) *) nc := nc - 1; ng := nt + 1; nt := 0 end else if na > 0 then begin (* turn one a into c and all g and t to c. (note ng=nc=0 initially) *) na := na - 1; nc := nt + 1; nt := 0 end else done := true (* since nt = n *) end else begin (* no t - increment innermost loop *) if ng > 0 then begin (* turn g into t *) ng := ng - 1; nt := nt + 1 end else if nc > 0 then begin (* turn c into g *) nc := nc - 1; ng := ng + 1 end else begin (* na > 0; turn a into c *) na := na - 1; nc := nc + 1 end end until done; (* final adjustment: we only have the sum of squares so far *) varhnb := sshnb - ehnb*ehnb; (* if this message appears, there is either a bug in the code or the computer cannot be as accurate as requested *) if accuracy <> round(accuracy * total) then begin writeln(output,' procedure calehnb: the sum of probabilities is'); writeln(output,' not accurate to one part in ',accuracy:1); writeln(output,' the sum of the probabilities is ',total:10:8); end; (* if this message appear, then there is an error in the repeat-until loop: it did not repeat as many times as is expected from the algorithm *) if counter <> round((n+1)*(n+2)*(n+3)/6) then begin writeln(output,' procedure calehnb: program error, the number of'); writeln(output,' calculations is in error'); halt end; (* writeln(output, ' total: ',total:10:5); writeln(output,' count = ',counter:1); writeln(output,' (n+1)*(n+2)*(n+3)/6 = ', round((n+1)*(n+2)*(n+3)/6):1); *) end; (* calehnb *) (* end module calehnb version = 2.19; (@ of calhnb 1986 mar 31 *) (* begin module calaehnb *) procedure calaehnb(n: integer; gna,gnc,gng,gnt: integer; var hg, aehnb, avarhnb: real); (* calculate ae(hnb) in bits/bp (=aehnb) for a number (n) of example sequence sites. gna to gnt are the composition to use for the genome probabilities of a to t. the genomic entropy (=hg) and the variance avar(hnb) (=avarhnb) are also calculated. this procedure gives approximations for e(hnb) and var(hnb). it is based on a formula derived by jeff haemer. see also: g. p. basharin theory probability appl. 4(3): 333-336 (1959) 'on a statistical estimate for the entropy of a sequence of independent random variables' version = 3.01; of procedure calaehnb 1983 nov 23 *) var log2: real; (* natural log of 2 *) gn: integer; (* sum of gna..gnt *) pa, pc, pg, pt: real; (* genomic probabilities *) e: real; (* the approximate sampling error *) begin (* calaehnb *) log2 := ln(2); gn := gna + gnc + gng + gnt; pa := gna/gn; pc := gnc/gn; pg := gng/gn; pt := gnt/gn; hg := -( (pa*ln(pa)) +(pc*ln(pc)) +(pg*ln(pg)) +(pt*ln(pt)))/log2; e := 3/(2*log2*n); aehnb := hg - e; avarhnb := e * e end; (* calaehnb *) (* end module calaehnb version = 2.19; (@ of calhnb 1986 mar 31 *) (* begin module hnblist.makehnblist *) procedure makehnblist(n: vector; gna, gnc, gng, gnt: integer; var l: ehnblstptr; var hg: real); (* make a sorted list l of e(hnb) values based on the vector of samples n and the genomic composition gna to gnt. also, return hg for the genomic information *) const kickover = 50; (* e(hnb) for n values above this are estimated from ae(hnb) = hg - 3/(2 ln(2) n) while those below are calculated exactly *) var nindex: integer; (* index to n *) num: integer; (* hold the value of n at nindex to avoid multiple calls to vget *) lindex: ehnblstptr; (* index to l *) spare: ehnblstptr; (* a spare pointer for construction *) done: boolean; (* have we finished searching l? *) procedure fill(var l: ehnblstptr); (* fill l with the (global) values of n from at nindex *) begin (* fill *) if num = 0 then begin writeln(output,' do not use encodings that reach', ' beyond the sequences'); halt end; l^.n := num; if num <= kickover then begin calehnb(num, gna,gnc,gng,gnt, hg,l^.ehnb,l^.varhnb); l^.aflag := 'e' (* not using approximation *) end else begin calaehnb(num, gna,gnc,gng,gnt, hg,l^.ehnb,l^.varhnb); l^.aflag := 'a' (* using approximation *) end end; (* fill *) begin (* makehnblist *) new(l); (* start the list up *) nindex := 1; (* for the first one *) num := round(vget(n,nindex)); fill(l); l^.next := nil; for nindex := 2 to n.length do begin num := round(vget(n,nindex)); (*write(output,' nindex = ',nindex:1); (@ debug *) (*write(output,' num = ',num:1);(@debug*) if num < l^.n then begin (* insert before, to become the first on the list *) (*writeln(output,' first on list'); (@debug*) new(spare); spare^.next := l; l := spare; fill(l) end else begin (* insert somewhere in l *) lindex := l; done := false; while not done do if lindex^.next = nil then done := true else if lindex^.next^.n <= num then lindex := lindex^.next else done := true; if lindex^.n <> num then begin (*write(output,' accepted'); (@debug*) (* new number to insert, otherwise ignore *) if lindex^.next = nil then begin (* add to end *) (*writeln(output,' add to end'); (@debug*) new(lindex^.next); lindex := lindex^.next; lindex^.next := nil end else begin (* insert to middle of list *) (*writeln(output,' insert to middle'); (@debug*) new(spare); spare^.next := lindex^.next; lindex^.next := spare; lindex := spare end; fill(lindex) end (* else: reject this n since it is already on the list *) (*else writeln(output,' rejected') (@debug*) end end end; (* makehnblist *) (* end module hnblist.makehnblist *) (* begin module hnblist.gethnb *) procedure gethnb(l: ehnblstptr; n: integer; var ehnb, varhnb: real; var aflag: char); (* get the e(hnb) [ehnb] value for n example sites by looking in the precalculated list l. also return the variance of hnb, and whether these are approximations. *) var lindex: ehnblstptr; (* index to l *) begin (* gethnb *) lindex := l; while (lindex^.n <> n) and (lindex^.next <> nil) do lindex := lindex^.next; if lindex^.n <> n then begin (* oh oh, something is wrong *) writeln(output,' gethnb: program error.'); writeln(output,' unable to find n = ',n:1); halt end; ehnb := lindex^.ehnb; varhnb := lindex^.varhnb; aflag := lindex^.aflag end; (* gethnb *) (* end module hnblist.gethnb *) (* ************************************************************************ *) (* begin module rsim.getparameters *) procedure getparameters(var rsimp: text; var n: integer; var rangelow: integer; var rangehigh: integer; var Rs: real; var SD: real; var seedparameter: real; var seed: real; var simulations: integer; var Rtlower: real; var Rtupper: real; var selection: boolean); (* get the parameters from rsimp *) var s: char; (* for reading the selection value *) seeddesired: real; (* value of seed desired *) step: integer; (* step in determining seed *) begin reset(rsimp); readln(rsimp,n); readln(rsimp,rangelow); readln(rsimp,rangehigh); readln(rsimp,Rs); readln(rsimp,SD); readln(rsimp,seedparameter); readln(rsimp,simulations); readln(rsimp,Rtlower); readln(rsimp,Rtupper); readln(rsimp,s); if s = 's' then selection := true else selection := false; if seedparameter >= 0.0 then begin if seedparameter > 1.0 then begin writeln(output,'using time defined seed'); timeseed(seed); end else begin writeln(output,'using user defined seed'); seed := seedparameter; end; seeddesired := seed; (* If we are using the system random generator (which can be much faster than the portable one), then this request for seeds won't work, and (at least on the Sun 4) we get the same series. To avoid this, force the random number generator to produce one number pretty close to the desired seed. This need only be used if the system generator is in use, see procedure rnd. *) writeln(output,'Setting random seed to: ',seeddesired:seedfield:seeddecim, ' +/- ',seedtolerance:seedfield:seeddecim); step := 0; rnd(seed); while (seed < seeddesired - seedtolerance) or (seed > seeddesired + seedtolerance) do begin rnd(seed); (* this is pretty obnoxious here because it spews a lot of stuff out: writeln(output,'random seed is reset to: ',seed:seedfield:seeddecim); *) end; end else begin writeln(output,'using system defined seed'); rnd(seed); end; writeln(output,'random seed is initialized at: ',seed:seedfield:seeddecim); end; (* end module rsim.getparameters *) (* begin module rsim.writeparameters *) procedure writeparameters(var fout: text; n: integer; rangelow: integer; rangehigh: integer; Rs: real; SD: real; seedparameter: real; seed: real; simulations: integer; Rtlower: real; Rtupper: real; selection: boolean); (* write the parameters to fout *) begin rewrite(fout); writeln(fout,'*p rsim ',version:4:2, ' Rsequence SIMulations'); writeln(fout,'*p ',n:infofield,' n'); writeln(fout,'*p ',rangelow:infofield,' rangelow'); writeln(fout,'*p ',rangehigh:infofield,' rangehigh'); writeln(fout,'*p ',Rs:infofield:infodecim,' Rs '); writeln(fout,'*p ',SD:infofield:infodecim,' Rs-Standard Deviation'); writeln(fout,'*p ',seed:seedfield:seeddecim,' seed'); writeln(fout,'*p ',simulations:infofield,' n-true'); writeln(fout,'*p ',Rtlower:infofield:infodecim,' Rs-true lower bound'); writeln(fout,'*p ',Rtupper:infofield:infodecim,' Rs-true upper bound'); write (fout,'*p '); if not selection then write(fout,'no'); writeln(fout,' selection'); if seedparameter = seed then writeln(fout,'*p seed: from input (as given above)') else writeln(fout,'*p seed: from time.', ' seed = ',seed:seedfield:seeddecim); end; (* end module rsim.writeparameters *) (* ************************************************************************ *) (* begin module vector.newvector *) procedure newvector(L: integer; var v: vector); (* generate a vector v of length L *) begin makevector(v,4*L); (* make the vector length L *) vset(0.0,v); (* set the vector to zero *) end; (* end module vector.newvector *) (* begin module showvector *) procedure showvector(var fout: text; v: vector; L: integer; wid, dec: integer); (* for debugging purposes. Show to file fout the vector v, assuming that it is representing a matrix of the form L by 4. Wid is the width and dec is the number of decimal places of the numbers *) var b: integer; (* index to base B *) Lindex: integer; (* index to position L *) total: real; (* total of the base frequencies *) begin for Lindex := 0 to L-1 do begin total := 0; write(fout,' ',Lindex:posfield); for b := 1 to 4 do begin write(fout,' ',vget(v,4*Lindex+b):wid:dec); total := total + vget(v,4*Lindex+b); end; writeln(fout,' ',total:infofield:infodecim); end; end; (* end module showvector *) (* begin module vector.randfbl *) procedure randfbl(var seed: real; L: integer; var fbl: vector); (* fill the f(B,L) vector with random numbers. Use seed for the random number generator. *) var a,c,g,t: real; (* numbers of each base *) Lindex: integer; (* index to the width of the fbl vector *) total: real; (* the sum of a, c, g and t *) begin for Lindex := 0 to L-1 do begin rnd(seed); a := seed; rnd(seed); c := seed; rnd(seed); g := seed; rnd(seed); t := seed; total := a + c + g + t; vput(fbl, 4*Lindex+1, a/total); vput(fbl, 4*Lindex+2, c/total); vput(fbl, 4*Lindex+3, g/total); vput(fbl, 4*Lindex+4, t/total); (* write(output,' ',a/total:10:8); write(output,' ',c/total:10:8); write(output,' ',g/total:10:8); write(output,' ',t/total:10:8); writeln(output); *) end; end; (* end module vector.randfbl *) (* begin module vector.lnsafe *) function lnsafe(a: real): real; (* calculate ln(a). If a <= 0 return 0 so that the form p ln p handles zero cases *) begin if a > 0.0 then lnsafe := ln(a) else lnsafe := 0.0 end; (* end module vector.lnsafe *) (* begin module vector.info *) function info(fbl: vector; Lindex: integer): real; (* calculate the information at position Lindex in fbl, calculate in nits so it is faster *) var a,c,g,t: real; (* numbers of each base *) begin a := vget(fbl,4*Lindex+1); c := vget(fbl,4*Lindex+2); g := vget(fbl,4*Lindex+3); t := vget(fbl,4*Lindex+4); { writeln(xyin,'info:', ' ', a:8:6, ' ', c:8:6, ' ', g:8:6, ' ', t:8:6); } info := Twoln2 + (a*lnsafe(a) + c*lnsafe(c) + g*lnsafe(g) + t*lnsafe(t)) end; (* end module vector.info *) (* begin module vector.evfbl *) procedure evfbl(var seed: real; L: integer; var fbl: vector; Rdesired: real); (* evolve the fbl matrix to have at least Rdesired bits. No sampling error calculations need be made since this is the TRUE matrix. Use seed for the random number generator. *) const enough = 20; (* try up to this many times to improve a position *) epsilon = 1.0; (* number of bits below 2*L which is the highest we can reasonably expect the matrix to evolve up to. *) {debug = true; (* for debuggin dis proc'dure *)} var a,c,g,t: real; (* numbers of each base *) acurrent,ccurrent,gcurrent,tcurrent: real; (* current numbers of each base *) bindex: integer; (* base index: represents position of a bases *) Lindex: integer; (* index to the width of the fbl vector *) HL: real; (* uncertainty at position L in fbl *) Htry: real; (* attempt to make lower uncertainty than HL *) Ractual: real; (* the current information of the fbl table *) RL: real; (* info at position L in fbl *) total: real; (* the sum of a, c, g and t *) tries: integer; (* number of attempts to improve a position *) begin {if debug then writeln(xyin);} {if debug then writeln(xyin,'here in evfbl **********************');} (* make sure that the Rdesired can be obtained with this L! *) if Rdesired > 2*L then Rdesired := 2*L - epsilon; (* convert to natural logs for the calculations to avoid divisions *) Rdesired := Rdesired * ln2; (* precalculate the information in the matrix *) Ractual := 0.0; for Lindex := 0 to L-1 do begin Ractual := info(fbl,Lindex) + Ractual; end; {if debug then writeln(xyin,'Ractual = ',Ractual:10:8,' (nits)');} {if debug then writeln(xyin,'Rdesired = ',Rdesired:10:8,' (nits)');} {if debug then writeln(xyin,'Rdesired = ',Rdesired/ln2:10:8,' (bits)');} {if debug then writeln(xyin,'initial Ractual = ',Ractual/ln2:10:8,' (bits)');} Lindex := -1; while Ractual < Rdesired do begin Lindex := Lindex + 1; (* simple sweep across the matrix saves time *) if Lindex >= L then Lindex := 0; {if debug then writeln(xyin,'L=',L:3);} RL := info(fbl,Lindex); HL := Twoln2 - RL; {if debug then writeln(xyin,'HL =',HL:10:8);} Ractual := Ractual - RL; (* find a lower uncertainty than HL: *) acurrent := vget(fbl,4*Lindex+1); ccurrent := vget(fbl,4*Lindex+2); gcurrent := vget(fbl,4*Lindex+3); tcurrent := vget(fbl,4*Lindex+4); bindex := 1; tries := 0; if HL > 0.0 then repeat (* set bases for next try *) a := acurrent; c := ccurrent; g := gcurrent; t := tcurrent; rnd(seed); case bindex of 1: a := seed; 2: c := seed; 3: g := seed; 4: t := seed; end; total := a + c + g + t; (* renormalize *) a := a/total; c := c/total; g := g/total; t := t/total; Htry := - (a*lnsafe(a) + c*lnsafe(c) + g*lnsafe(g) + t*lnsafe(t)); bindex := succ(bindex); if bindex > 4 then bindex := 1; tries := succ(tries); {if debug then writeln(xyin,'try ',tries:2,' Htry=',Htry:10:8);} until (Htry < HL) or (tries > enough); if Htry < HL then begin (* record the victory *) {if debug then writeln(xyin,'victory!');} vput(fbl, 4*Lindex+1, a); vput(fbl, 4*Lindex+2, c); vput(fbl, 4*Lindex+3, g); vput(fbl, 4*Lindex+4, t); Ractual := Ractual + (Twoln2 - Htry); end else Ractual := Ractual + RL; (* put it back *) {if debug then writeln(xyin);} {if debug then writeln(xyin,'Ractual = ',Ractual/ln2:infofield:infodecim);} {if debug then showvector(xyin,fbl,L,infofield,infofield-2);} end; { if debug then begin writeln(xyin,' Rdesired = ',Rdesired/ln2:10:8,' (bits)'); writeln(xyin,'final Ractual = ',Ractual/ln2:10:8,' (bits)'); (* precalculate the information in the matrix *) Ractual := 0.0; for Lindex := 0 to L-1 do begin Ractual := info(fbl,Lindex) + Ractual; end; writeln(xyin,'REAL Ractual = ',Ractual/ln2:10:8,' (bits)'); writeln(xyin) end; } {if debug then writeln(xyin,'here out evfbl');} end; (* end module vector.evfbl *) (* begin module vector.crosssum *) procedure crosssum(L: integer; var v: vector); (* sum across the vector to create the ranges for choosing numbers *) var a,c,g,t: real; (* numbers of each base *) Lindex: integer; (* index to the width of the v vector *) begin for Lindex := 0 to L-1 do begin a := vget(v, 4*Lindex+1); c := vget(v, 4*Lindex+2); g := vget(v, 4*Lindex+3); t := vget(v, 4*Lindex+4); vput(v, 4*Lindex+1, a); vput(v, 4*Lindex+2, a+c); vput(v, 4*Lindex+3, a+c+g); vput(v, 4*Lindex+4, a+c+g+t); end; end; (* end module vector.crosssum *) (* begin module vector.randnbl *) procedure randnbl( L: integer; n: integer; var seed: real; fbltrue: vector; var nblsimu: vector); (* create the simulated nbl matrix nblsimu, using the fbltrue matrix as the true probabilities of bases. The width of both matricies is L. The random number seed to use is seed. *) var a,c,g,t: real; (* frequencies of each base in fbltrue *) na,nc,ng,nt: integer; (* numbers of each base for nblsimu *) realna,realnc,realng,realnt: real; (* numbers of each base for nblsimu *) Lindex: integer; (* index to the width of the fbl vector *) nindex: integer; (* index to the number of sequences inserted *) begin for Lindex := 0 to L-1 do begin (* create L positions in a sequence *) a := vget(fbltrue, 4*Lindex+1); c := vget(fbltrue, 4*Lindex+2); g := vget(fbltrue, 4*Lindex+3); t := vget(fbltrue, 4*Lindex+4); na := 0; nc := 0; ng := 0; nt := 0; for nindex := 1 to n do begin (* create n sequences *) rnd(seed); (* random function *) if (seed < a) then na := na + 1 else if (seed < c) then nc := nc + 1 else if (seed < g) then ng := ng + 1 else if (seed < t) then nt := nt + 1 end; realna := na; realnc := nc; realng := ng; realnt := nt; vput(nblsimu, Lindex* 4 + 1, realna); vput(nblsimu, Lindex* 4 + 2, realnc); vput(nblsimu, Lindex* 4 + 3, realng); vput(nblsimu, Lindex* 4 + 4, realnt); end; end; (* end module vector.randnbl *) (* this procedure is replaced by the proceeding two procedures *) (* begin module vector.sumvectors *) procedure sumvectors(var encseq: text; var nbl: vector; var firstparam: paramptr); (* scan the encoded sequences in encseq and sum them up in nbl *) var regions: integer; (* for readencpar *) vsize: integer; (* size of the vectors *) v: vector; (* a vector read in *) begin (* sumvectors *) firstparam := nil; readencpar(encseq,firstparam,regions,vsize); makevector(nbl,vsize); makevector(v,vsize); vset(0.0,nbl); while not eof(encseq) do begin readvector(encseq,v); vadd(v,nbl); end end; (* sumvectors *) (* end module vector.sumvectors *) (* begin module settheparameters *) procedure settheparameters(L: integer; var theparameters: paramptr); (* set up the encoding parameters for this particular problem *) begin new(theparameters); with theparameters^ do begin range[start] := 0; range[stop] := L-1; window := L; wshift := 1; coding := 1; spaces := nil; cshift := 1; wvlength := 4; pvlength := 1; next := nil end; end; (* end module settheparameters *) (* the entire procedure following is not needed because n is fixed *) (* begin module vector.makenl *) procedure makenl(nbl: vector; var nl: vector; var firstparam: paramptr); (* sum nbl (the number of bases (b) of each type at a position l) over the bases to create nl, the number of bases at position l *) var aparam: paramptr; (* one of the parameters *) l: integer; (* a position in the sequence *) v: integer; (* position in the vectors *) w: integer; (* position in a window of the encoding *) total: real; (* sum of encoding at l *) begin makevector(nl,nbl.length); vset(0.0,nl); aparam := firstparam; v := 0; (* handle one parameter at a time: *) while aparam <> nil do begin l := aparam^.range[start]; repeat (* add up bases at position l *) total := 0; for w := 1 to aparam^.wvlength do total := total + vget(nbl,w+v); (* put the total into the same positions of nl *) for w := 1 to aparam^.wvlength do vput(nl,w+v,total); (* move to the next slots in the vector *) v := v + aparam^.wvlength; l := l + aparam^.wshift; until l > aparam^.range[stop]; (* move to the next parameter *) aparam := aparam^.next; end; end; (* end module vector.makenl *) (* begin module rsim.prepareehnb *) procedure prepareehnb(var cmp, data: text; nl: vector; var ehnb: ehnblstptr); (* prepare the list of e(hnb) values keyed on the number of sequences, n(l). the procedure reads the composition from cmp, and announces what it did to data. *) const cmpnmin = 1000; (* minimum number of bases allowed for the composition *) var (* mononucleotide genomic composition *) gmono: array[0..3] of integer; (* a to t *) genomicn: integer; (* the number of bases of the composition *) equalmono: boolean; (* true if equal mononucleotides were assumed by getmonocomposition *) hg: real; (* the genomic entropy *) begin (* prepareehnb *) (* writeln(data,'* genomic probabilities from:'); *) getmonocomposition(cmp,gmono[0],gmono[1],gmono[2],gmono[3], data, equalmono); genomicn := gmono[0]+gmono[1]+gmono[2]+gmono[3]; if not equalmono then if genomicn < cmpnmin then begin writeln(output,' ************ WARNING ***********'); writeln(output,' there should be at least'); writeln(output,' ',cmpnmin:4,' bases of composition information'); writeln(output,' to avoid needing to adjust for sampling'); writeln(output,' error in the genomic hg.'); writeln(output,' the error with ',genomicn:5,' bases is ', 3/(2*genomicn*ln(2)):infofield:infodecim,' bits'); end; (* writeln(data,'* the genomic composition used is: ', ' a = ',gmono[0]:1, ', c = ',gmono[1]:1, ', g = ',gmono[2]:1, ', t = ',gmono[3]:1); writeln(data,'*'); *) makehnblist(nl, gmono[0],gmono[1],gmono[2],gmono[3], ehnb, hg); (* writeln(data,'* genomic information, hg: ',hg:infofield:infodecim, ' bits/base'); writeln(data,'*') *) end; (* prepareehnb *) (* end module rsim.prepareehnb *) (* begin module rsim.colinfo *) procedure colinfo(var data: text); (* send the final header column information to data *) begin (* colinfo *) writeln(data,'* l position in aligned set'); writeln(data,'* a..t number of each base at the position l'); writeln(data,'* rs(l) Rsequence(l)'); writeln(data,'* rs Rsequence, running sum of rs(l)'); writeln(data,'* var(hnb) variance of hnb'); writeln(data,'* sum-var running sum of var(hnb)'); writeln(data,'* n(l) number of sequence examples at l'); writeln(data,'* e(hnb) expectation of hnb'); writeln(data,'* f flag for calculation of e(hnb) and var(hnb):'); writeln(data,'* e = exact, a = approximation'); end; (* colinfo *) (* end module rsim.colinfo *) (* begin module rsim.colid *) procedure colid(var data: text); (* identify each column of the output *) begin (* colid *) writeln(data,'*', ' l':(posfield), (* no +1 since the * is there *) ' a':(nfield+1), ' c':(nfield+1), ' g':(nfield+1), ' t':(nfield+1), ' rs(l)':(infofield+1), ' rs':(infofield+1), ' var(hnb)':(infofield+1), ' sum-var':(infofield+1), ' n(l)':(posfield+1), ' e(hnb)':(infofield+1), ' f'); end; (* colid *) (* end module rsim.colid *) (* begin module rseq.finalcomments *) procedure finalcomments(var data: text; rs, sumvarhnb: real); (* write rsequence and its standard deviation data to data *) begin (* finalcomments *) writeln(data,'* rsequence = ',rs:infofield:infodecim, ' +/- ',sqrt(sumvarhnb):infofield:infodecim,' bits/site'); end; (* finalcomments *) (* end module rseq.finalcomments *) (* begin module rsim.makers *) procedure makers(var data: text; (* display rs(l) and other results *) { wmatrix: text; (* weight matrix for searching *) } nbl, (* number of each kind of base at each position *) nl: vector; (* number of sequences at each position *) ehnblist: ehnblstptr; (* e(hnb) for various n *) theparameters: paramptr; (* the parameters that describe the vectors *) print: boolean; var rs: real; (* the running sum of rsl *) var sumvarhnb: real); (* running sum of varhnb *) (* make rsequence. the core of the program. the vectors n(bl) [nbl] and n(l) [nl] are scanned to generate frequencies of bases. these are used to generate the w matrix w(b,l) [wbl], the average rs(l) [rsl] and then rsequence. sampling error is also recorded. printing controls whether anything is printed to the data file *) var aflag: char; (* 'a' if the approximations were used for ehnb and varhnb, 'e' if they were calculated exactly *) b: integer; (* the position in either vector corresponding to l. the b variables correspond to the base index *) bstart: integer; (* the first position in the vector for some l, used to get the number of sites *) bstop: integer; (* the last position in the vector *) ehnb: real; (* e(hnb) *) freq: real; (* the frequency of a base at a position *) l: integer; (* the aligned position in the range *) { use the global for this program --- for speed ln2: real; (* the natural log of 2. dividing the natural log of a number by ln2 gives the log to the base 2 of the number *) } rangestart: integer; (* the start of the range *) rangestop: integer; (* the stop of the range *) rsl: real; (* rsequence at a position l *) { rs: real; (* the running sum of rsl *) sumvarhnb: real; (* running sum of varhnb *) } symbols: integer; (* the number of bases *) varhnb: real; (* var(hnb) at a position l *) wbl: real; (* the weight w(b,l) *) begin (* makers *) { ln2 := ln(2); } rangestart := theparameters^.range[start]; rangestop := theparameters^.range[stop]; symbols := theparameters^.wvlength; rs := 0; sumvarhnb := 0; (* the following information is provided to allow another program to know how many lines will follow *) if print then begin writeln(data,'* ',rangestart:posfield,' ',rangestop:posfield, ' is the range'); writeln(data,'*'); colinfo(data); writeln(data,'*'); colid(data); end; for l := rangestart to rangestop do begin (* begin the display *) if print then write(data,' ',l:posfield); bstart := symbols*(l - rangestart) + 1; bstop := symbols*(l - rangestart) + symbols; gethnb(ehnblist,round(vget(nl,bstart)),ehnb,varhnb,aflag); rsl := 0; for b := bstart to bstop do begin if print then write(data,' ',round(vget(nbl,b)):nfield); freq := vget(nbl,b)/vget(nl,b); if freq > 0.0 then wbl := ehnb + ln(freq) /ln2 else wbl := negativeinfinity; { wmiddle(wmatrix,l,wbl,(((b-1) mod 4) = 0),((b mod 4) = 0)); } rsl := rsl + freq * wbl end; rs := rs + rsl; sumvarhnb := sumvarhnb + varhnb; (* display the results *) if print then writeln(data,' ',rsl:infofield:infodecim, ' ',rs:infofield:infodecim, ' ',varhnb:infofield:infodecim, ' ',sumvarhnb:infofield:infodecim, ' ',round(vget(nl,bstart)):nfield, (* the extra space looks nicer *) ' ',ehnb:infofield:infodecim, ' ',aflag); end; { finalcomments(output,rs,sumvarhnb); finalcomments(data,rs,sumvarhnb) } end; (* makers *) (* end module rsim.makers *) (* begin module rsim.themain *) procedure themain(var rsimp, cmp, xyin: text); (* the main procedure *) var (* parameters of the program *) seedparameter: real; (* a real number between 0 and 1 used to start the random number generator. The date and time is used if this number is outside 0 to 1. *) simulations: integer; (* number of Rsequence(true) values to make *) L: integer; (* width of the binding site table *) n: integer; (* number of bases in nbl table *) (* variables *) ehnbtrue: ehnblstptr; (* list of e(hnb) values *) ehnbsimu: ehnblstptr; (* list of e(hnb) values *) fbltrue: vector; (* frequencies of each kind of base at each position *) fblsimu: vector; (* like fbltrue, but based on 'sequences' *) nbltrue: vector; (* numbers of each kind of base at each position *) simindex: integer; (* index for simulations *) nblsimu: vector; (* number of each kind of base at each position *) nltrue: vector; (* total number of bases at each position, true *) nlsimu: vector; (* total number of bases at each position, simulated *) notselection: boolean; (* complement of selection variable *) rangelow: integer; (* low bound of the range of the matrix *) rangehigh: integer; (* high bound of the range of the matrix *) rstrue: real; (* the "true" value of rs *) rstn: integer; (* number of rstrue values that are selected *) rstsum: real; (* running sum of rstrue *) rstmean: real; (* mean rstrue *) rstsumsq: real; (* running sum of rstrue squared *) rststd: real; (* standard deviation of rstrue *) rstvar: real; (* variance of rstrue *) rssimu: real; (* the "simulated" value of rs *) Rs: real; (* estimated value of Rsequence from the rsgra program *) SD: real; (* Standard Deviation of Rs based on sample size *) Rslower: real; (* lower limit to Rsequence(simu) to plot. *) Rsupper: real; (* upper limit to Rsequence(simu) to plot. *) Rtlower: real; (* lower limit to Rsequence(true) to work with. This allows one to remove the small ones and get on with the ones of interest. *) Rtupper: real; (* upper limit to Rsequence(true) to work with. *) Rtrange: real; (* difference between Rtupper and Rtlower. *) Rtdemanded: real; (* Demanded value of Rtrue. *) selection: boolean; (* if 's' then only output Rstrue whose Rssimuin is in the range Rs+/-SD. *) seed: real; (* the current random number *) svhtrue: real; (* sumvarhnb: running sum of varhnb, for true *) svhsimu: real; (* sumvarhnb: running sum of varhnb, for simulated *) theparameters: paramptr; (* how the vectors are structured *) procedure blurb; (* the blurb to write out to xyin *) begin writeln(xyin,' ',rstrue:infofield:infodecim, ' ',rssimu:infofield:infodecim); end; procedure finalreport(var f: text); (* make a final report of the results to the user *) var wid: integer; (* used to increase number field *) begin wid := infofield + 3; if rstn > 2 then begin writeln(f,'*'); writeln(f,'* FINAL REPORT'); writeln(f,'*'); writeln(f,'* Rstrue values:'); writeln(f,'* n: ', rstn:wid); writeln(f,'* sum: ', rstsum:wid:infodecim); writeln(f,'* mean: ', rstmean:wid:infodecim); writeln(f,'* sum of squares: ',rstsumsq:wid:infodecim); writeln(f,'* variance: ', rstvar:wid:infodecim); writeln(f,'* standard deviation: ', rststd:wid:infodecim); writeln(f,'*'); writeln(f,'**************************************************'); writeln(f,'* original Rsequence and its standard deviation: *'); writeln(f,'* Rsequence = ', Rs:infofield:infodecim, ' +/- ', rststd:infofield:infodecim, ' bits', ' ':( 50 (* width of line *) - 14 (* '* Rsequence = ' *) - infofield (* Rsequence *) - 5 (* ' +/- ' *) - infofield (* rststd *) - 6 (* ' bits' *) ), '*'); writeln(f,'**************************************************'); end else begin writeln(f, '* WARNING: there are not enough samples (',rstn:1, ') to give a final report'); end end; begin (* themain *) writeln(output, 'rsim ',version:4:2); ln2 := ln(2); (* precalculate to make taking bits faster *) Twoln2 := 2 * ln2; (* 2 bits expressed in nits *) rstsum := 0.0; rstsumsq := 0.0; rststd := 0.0; rstvar := 0.0; getparameters(rsimp, n, rangelow, rangehigh, Rs, SD, seedparameter, seed, simulations, Rtlower, Rtupper, selection ); writeparameters(xyin, n, rangelow, rangehigh, Rs, SD, seedparameter, seed, simulations, Rtlower, Rtupper, selection ); notselection := not selection; (* for speed later *) L := rangehigh - rangelow + 1; Rslower := Rs - SD; Rsupper := Rs + SD; Rtrange := Rtupper-Rtlower; newvector(L,fbltrue); newvector(L,fblsimu); newvector(L,nbltrue); makevector(nltrue,4*L); (* make the vector length L *) newvector(L,nblsimu); makevector(nlsimu,4*L); (* make the vector length L *) (* instead of reading in the parameters, we set them ourselves: *) settheparameters(L,theparameters); (* the true frequency table has a correction based on the a large number of sequences *) vset(1e6,nltrue); (* set the vector to the number of made up sequences *) prepareehnb(cmp,output,nltrue,ehnbtrue); (* simulated vectors have a correction based on the number of sequences to be simulated, n *) vset(n,nlsimu); (* set the vector to the number of made up sequences *) prepareehnb(cmp,output,nlsimu,ehnbsimu); writeln(xyin,'* columns: (p=plot,n=no plot), Rs-true, Rs-simulated'); writeln(xyin,'*'); simindex := 0; while simindex < simulations do begin simindex := simindex + 1; (* start evolution from zero information matrix. The program started from a random matrix originally (by "randfbl(seed,L,fbltrue);" , but this causes problems because such a matrix contains information and so low information points are under represented and higher ones over represented. This distorts the statistics! *) vset(0.25,fbltrue); (* evolve the fbl matrix up to the standard *) rnd(seed); (* random function *) Rtdemanded := Rtlower + Rtrange*seed; evfbl(seed, L, fbltrue, Rtdemanded); (* create the nbltrue table from the fbltrue table *) vset(0,nbltrue); vadd(fbltrue,nbltrue); (* set nbltrue := fbltrue *) vmult(1e6,nbltrue); (* multiply up nbltrue by a nice factor *) makers(xyin,nbltrue,nltrue,ehnbtrue, theparameters,false,rstrue,svhtrue); (* make nblsimu from fbltrue *) crosssum(L,fbltrue); randnbl(L, n, seed, fbltrue, nblsimu); (* make rsequence *) makers(xyin,nblsimu,nlsimu,ehnbsimu,theparameters,false, rssimu,svhsimu); (* output to xyin *) if (rssimu>=Rslower) and (rssimu <= Rsupper) then begin write(xyin,'p'); blurb; (* now record this to compute the standard deviation *) rstn := rstn + 1; rstsum := rstsum + rstrue; rstsumsq := rstsumsq + rstrue * rstrue; end else if notselection then begin write(xyin,'n'); blurb end; (* otherwise the point is selected against *) end; (* show one of the Rs matricies *) { makers(output,nblsimu,nlsimu,ehnbsimu,theparameters,true,rssimu,svhsimu); } (* compute the rstrue standard deviation and print the results so it is easy for a user to do the right thing!! *) if rstn > 2 then begin rstmean := rstsum/rstn; rstvar := (rstsumsq/rstn) - sqr(rstmean); rststd := sqrt((rstn/(rstn-1))*rstvar); end; finalreport(xyin); finalreport(output); end; (* themain *) (* end module rsim.themain *) begin (* rsim *) themain(rsimp,cmp,xyin); 1: end. (* rsim *)