program diana (book, inst, protseq, dianap, da, output); (* diana: dinucleotide analysis of an aligned book R. Michael Stephens modified by: Dr. Thomas D. Schneider National Institutes of Health National Cancer Institute Center for Cancer Research Nanobiology Program Molecular Information Theory Group Frederick, Maryland 21702-1201 schneidt@mail.nih.gov permanent email: toms@alum.mit.edu http://www.ccrnp.ncifcrf.gov/~toms/ *) label 1; (* end of program *) const (* begin module version *) version = 2.13; (* of diana.p 2009 Apr 09 2009 Apr 09, 2.13: Rcorrelation/4 is Rnormalized, remove 17th column. 2009 Apr 09, 2.12: extend backspace 2009 Apr 09, 2.11: continued raw Rcorrelation (no error correction) 2009 Apr 09, 2.10: Give raw Rcorrelation (no error correction) 2009 Apr 09, 2.09: ignore inst file if reading book alignment!! 2009 Apr 09, 2.08: increase trimax from 20000 to 25000 2009 Apr 09, 2.07: bug: reading protseq when book exists! 2008 Jul 04, 2.06: compute with small sample correction for Rcorrelationsum 2008 Jul 03, 2.05: parse '-'; book or protseq; read fasta format 2.04, 2003 Jan 17: find and solve NaN error: addpair never adds the last row! Bug not located but division by zero protections installed. 2.03, 2003 Jan 15: find and solve NaN error 2.02, 2003 Jan 15: report 1-Rc instead of 1-Rnormalized in column 13 to properly account for small sample error!! in writeprism 2.01, 2002 Sep 5: finish upgrade 2.00, 2002 Aug 21: upgrade delmod modules: alignment 1.98, 2000 Jul 16: upgrade documentation 1.97, 1996 Oct 4: previous changes origin 1990 February 26 *) updateversion = 2.01; (* defines lowest acceptable current parameter file *) previousupdateversion = 1.88; (* previous updateversion *) versionupperbound = 10.00; (* highest version that will be accepted *) (* end module version *) (* begin module describe.diana *) (* name diana: dinucleotide analysis of an aligned book synopsis diana(book: in, inst: in, protseq: in, dianap: in, da: out, output: out); files book: the standard delila book that is to be analyzed inst: the instruction set that was used to make the book protseq: aligned sequences in the protseq format (see alpro.p). The sequences can only contain bases a, c, g, t. This file may be empty, in which case the book and inst are used. Likewise, if the book is empty, the protseq will be used. One of the two must be empty. dianap: Parameters to control the program, one per line: 0. The version of diana that this parameter file is designed for. If the program finds an old version, it will *upgrade* the dianap file. 1. fromparam and toparam (two integers), determine the from-to range over which to do the analysis. If dianap is empty, the range from book and inst is used. 2. fromprotseq and toprotseq (two integers) define the range of the protseq. They are only used if the protseq file is used. 3. dicontrol: The first character of the third line controls printing to da: 'd': just sets of 16 correlation data are printed (ie, only d in column 1 of the da file). 'i': just information about the correlation is printed (ie, only i in column 1 of the da file). 'b': both data and information printed 4. colorslope, colorconstant: two real numbers on the fourth line control the reported value of the normalized information content (column 13 below). This allows one to adjust the range of colors produced in a color plot. For PostScript colors, 0.84 and 0.16 respecitvely gives yellow ranging up to red. Once the Rnormalized has been calculated, it is replaced by the value: (Rnormalized times colorslope) added to colorconstant. 5. alignmenttype: the first character on the first line, defines how to align the pieces. See the alist program for the detailed logic. There are three choices, as in the alist program: 'f' (for 'first') then the sequences are always aligned by their first base. 'i' then the sequences are aligned by the delila instructions. If the inst file is empty, alignment is forced to the 'b' mode. 'b' (for 'internal') then the alignment is on the internal zero of the book's sequence. This option is to be used when "default coordinate zero" is used in the Delila instructions. da: the di-analysis file that contains the output triangular array. column 1: Tells what that row of data is. There are three choices: n : the column is a normal data element in the triangle. d : the column is an element on the diagonal of the triangle, where the two coordinates are equal. i : the column contains the information content correlating the two positions defined by columns 3 and 4. column 2: Tells what the dinucleotide is. There are 16 dinucleotides aa, ac, ag, at, ... , tt as well as `in' or `id', which denote columns that contain information content. `id' means that the information is for the diagonal, while `in' are off-diagonal. Combining "in" with "id" gives the entire information triangle. column 3: The position on the sequence that corresponds to the x coordinate on a Cartesian graph. column 4: The position on the sequence that corresponds to the y coordinate on a Cartesian graph. column 5: A column of constant 1's usable by xyplo (usually xscolumn). column 6: A column of constant 1's usable by xyplo (usually xscolumn). column 7: The number of data points at a position column 8: The frequency of the dinucleotide in column 2 at position (column 3, column 4). If the column is an information column, then this is the information at that position. When the position is off diagonal, this is the correlation information Rcorrelation with small sample correction (ssc16). When the position is on diagonal it is Rsequence. (2008 Jul 4: "without small sample correction" was here, but it appears to be in the code as ssc4.) column 9: One minus the frequency (or 1 - column 8). If this is an information column, then this is the chi-square value. column 10: A column of constants usable by xyplo. If this is in an information row, then this column is the number of degrees of freedom at that position column 11: A column of constant 1's usable by xyplo (usually hue). column 12: A column of constant 0's usable by xyplo (usually saturation). column 13: Information from column 8 normalized by dividing by the maximum possible information (4 for non-diagonal, 2 for diagonal). This gives a number between 0 and 1. This number is then multiplied by the parameter colorslope and then added to colorconstant. column 14: 1 minus column 13 column 15: correlation coefficient for x to y on information (in or id) rows column 16: A column of constant 1's usable by xyplo (saturation alternative). output: error messages from the program description Diana goes through a book and looks at relationships of dinucleotide pairs within the sequences. The output of the program is in the form of a triangular array rather than the entire square of x to y relationships. This saves half of the calculations. For every pair of coordinates, the frequency of the dinucleotide pair is tabulated. The program also calculates the chi-square for the dinucleotide given the expected mono-nucleotide frequencies. The correlation information is calculated as the information in the dinucleotide frequencies less the information in each of the two mononucleotides. The output of the program may be sent into xyplo for graphics. Note the distinction between the `in' and `id' information columns. Information in a binding site is usually going to appear on the diagonal, so by making this distinction, one can eliminate the diagonal information peaks for statstical analysis with genhis. The program now has an error correction for small sample size. See Schneider1986 p. 427, eqn A6. s = 4 or s = 16. examples See xyplop.diana and xyplop.diana.color for example xyplop files. To create a xyin file when the di control is 'b', extract all lines from the da file that contain ' in '. To do this in Unix use: grep ' in ' da > xyin That will not include the diagonal, which contains Rsequence and so normally should not be compared to the correlations. To include the diagonal use: grep i da > xyin To sort the output files to find the most significant correlations, one can use the Unix commands: grep in da | sort +8 -9 -n -r | cat > da.sorted The -r puts the most significant ones first. documentation @article{Stephens.Schneider.Splice, author = "R. M. Stephens and T. D. Schneider", title = "Features of spliceosome evolution and function inferred from an analysis of the information at human splice sites", journal = "J. Mol. Biol.", volume = "228", pages = "1124-1136", year = "1992"} see also {Example dianap file:} dianap {Example xyplop files used in conjunction with this program:} xyplop.diana, xyplop.diana.color {The program to plot the output:} xyplo.p {Program to list aligned sequences:} alist.p {Program to graph histograms:} genhis.p {The alpro program defines the protseq format:} alpro.p {Related programs for plotting the data in three dimensions:} da3d.p, da3drotate.p author R. Michael Stephens Modified by Tom Schneider to have user requested range and: faster calculation in main loop [1995 Jan 9], new input routine that reads protseq format [1995 Jan 9]. bugs It would be nice to allow a restricted list of pairs to be done. protseq has no definition of coordinates, and this forces the use of the fromprotseq and toprotseq. A definition for multiply aligned sequences with coordinates should be created. technical notes *) (* end module describe.diana *) (* ************************************************************************ *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module diana.const *) trimax = 25000; (* the maximum number of data points *) infofield = 6; (* real number field width *) infodecim = 5; (* number of decimal points *) (* end module diana.const *) (* begin module book.const *) (* constants needed for book manipulations *) dnamax = 1024; (* length of dna arrays *) namelength = 100; (* maximum key name length *) linelength = 80; (* maximum line readable in book *) (* end module book.const version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module string.const *) maxstring = 2000; (* the maximum string *) (* end module string.const version = 4.57; (@ of prgmod.p 2002 Sep 5 *) (* begin module filler.const *) fillermax = 50; (* the size of the filler array for a string *) (* end module filler.const version = 4.57; (@ of prgmod.p 2002 Sep 5 *) type (* begin module string.type *) stringptr = ^string; (* pointer to a string *) string = record (* a string of characters *) letters: array[1..maxstring] of char; (* the letters in the string *) length: integer; (* the number of characters in the string *) current: integer; (* the letter we are working on *) next: stringptr; (* the next string in a series *) end; (* end module string.type version = 4.57; (@ of prgmod.p 2002 Sep 5 *) (* begin module filler.type *) (* the following is an array used to fill a string. it is convenient to have it much shorter than the maxstring, so that it is easy to fill the string using procedure fillstring. the user must declare the value of constant fillermax. *) filler = packed array[1..fillermax] of char; (* end module filler.type version = 4.57; (@ of prgmod.p 2002 Sep 5 *) (* begin module trigger.type *) trigger = record (* an object to be searched for *) seek: string; (* the characters looked for *) state: integer; (* how close to triggering we are *) skip: boolean; (* trigger not found- skip the line *) found: boolean (* the trigger was found *) end; (* end module trigger.type version = 4.57; (@ of prgmod.p 2002 Sep 5 *) (* begin module book.type *) (* types needed for book manipulations *) chset = set of 'a'..'z'; (* types defined in book definition *) alpha = packed array[1..namelength] of char; (* this is not alfa *) (* name is a left justified string with blanks following the characters *) name = record letters: alpha; length: 0..namelength (* zero means an unspecified structure *) end; lineptr = ^line; line = record (* a line of characters *) letters: packed array [1..linelength] of char; length: 0..linelength; next: lineptr end; direction = (plus, minus, dircomplement, dirhomologous); configuration = (linear, circular); state = (on, off); header = record (* header of key *) keynam: name; (* key name of structure *) fulnam: lineptr; (* full name of structure *) note: lineptr (* note key *) end; (* begin module base.type *) (* define the four nucleotide bases *) base = (a,c,g,t); (* end module base.type version = 7.59; {of delmod.p 2002 Sep 5} *) (* sequence types *) dnaptr = ^dnastring; dnarange = 0..dnamax; seq = packed array[1..dnamax] of base; dnastring = record part: seq; length: dnarange; next: dnaptr end; orgkey = record (* organism key *) hea: header; mapunit: lineptr (* genetic map units *) end; chrkey = record (* chromosome key *) hea: header; mapbeg: real; (* number of genetic map beginning *) mapend: real (* number of genetic map ending *) end; pieceptr = ^piece; piekey = record (* piece key *) hea: header; mapbeg: real; (* genetic map beginning *) coocon: configuration; (* configruation (circular/linear) *) coodir: direction; (* direction (+/-) relative to genetic map *) coobeg: integer; (* beginning nucleotide *) cooend: integer; (* ending nucleotide *) piecon: configuration; (* configruation (circular/linear) *) piedir: direction; (* direction (+/-) relative to coordinates *) piebeg: integer; (* beginning nucleotide *) pieend: integer; (* ending nucleotide *) end; piece = record key: piekey; dna: dnaptr end; reference = record pienam : name; (* name of piece referred to *) mapbeg : real; (* genetic map beginning *) refdir : direction; (* direction relative to coordinates *) refbeg : integer; (* beginning nucleotide *) refend : integer; (* ending nucleotide *) end; genkey = record (* gene key *) hea : header; ref : reference; end; trakey = record (* transcript key *) hea : header; ref : reference; end; markerptr = ^marker; markey = record (* marker key *) hea : header; ref : reference; sta : state; phenotype : lineptr; next : markerptr; end; marker = record key : markey; dna : dnaptr; end; (* end module book.type version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module triangle.type *) (* define a triangular array as a linear one *) trianglearray = record data: array[0..trimax] of integer; lower: integer; (* lower bound on the triangle *) upper: integer; (* upper bound on the triangle *) side: integer; (* length of side of the triangle *) area: integer; (* area of the triangle *) end; (* end module triangle.type *) (* begin module diana.type *) (* stores the number of dinucleotides appearing at positions on a sequence *) trisquare = array[a..t,a..t] of trianglearray; (* define a data cube to store frequencies of a trisquare in *) realprism = array[a..t,a..t] of real; (* an array to store data for computing single base information values *) sumbase = array[a..t] of integer; (* an array to store the frequencies of the sumbase values *) basefreq = array[a..t] of real; (* end module diana.type *) var (* begin module book.var *) (* ************************************************************************ *) (* global variables needed for book manipulations *) (* free storage: *) freeline: lineptr; (* unused lines *) freedna: dnaptr; (* unused dnas *) readnumber: boolean; (* whether to read a number from the notes, or to read in the notes *) number: integer; (* the number of the item just read *) numbered: boolean; (* true when the item just read is numbered *) skipunnum: boolean; (* a control variable to allow skipping of un-numbered items in the book *) (* ************************************************************************ *) (* end module book.var version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module diana.var *) book, (* the delila book being analyzed *) inst, (* the instructions used to create the delila book *) protseq, (* sequences in protseq format *) dianap, (* parameters to control the program *) da: text; (* the output triangular analysis *) (* end module diana.var *) (* ************************************************************************ *) (* begin module halt *) procedure halt; (* stop the program. the procedure performs a goto to the end of the program. you must have a label: label 1; declared, and also the end of the program must have this label: 1: end. examples are in the module libraries. this is the only goto in the delila system. *) begin writeln(output,' program halt.'); goto 1 end; (* end module halt version = 4.57; (@ of prgmod.p 2002 Sep 5 *) (* begin module copyaline *) procedure copyaline(var fin, fout: text); (* copy a line from file fin to file fout *) begin (* copyaline *) while not eoln(fin) do begin fout^ := fin^; put(fout); get(fin) end; readln(fin); writeln(fout); end; (* copyaline *) (* end module copyaline version = 4.57; (@ of prgmod.p 2002 Sep 5 *) (* begin module copylines *) function copylines(var fin, fout: text; n: integer): integer; (* copy n lines of file fin to file fout. the actual number of lines copied is returned. *) var index: integer; (* the current line number *) begin (* copylines *) index := 0; while (not eof(fin)) and (index < n) do begin copyaline(fin, fout); index := succ(index) end; copylines := index end; (* copylines *) (* end module copylines version = 4.57; (@ of prgmod.p 2002 Sep 5 *) (* begin module package.trigger *) (* ************************************************************************ *) (* begin module clearstring *) procedure clearstring(var ribbon: string); (* empty the string *) var index: integer; (* to the ribbon *) begin (* clearstring *) with ribbon do begin for index := 1 to maxstring do letters[index] := ' '; length := 0; current := 0; end end; (* clearstring *) procedure initializestring(var ribbon: string); (* start the string with a nil pointer. This routine should be called before doing linked list work. This allows the standard string routines to clear the string without killing the pointer. *) begin (* initializestring *) clearstring(ribbon); ribbon.next := nil; end; (* initializestring *) (* end module clearstring version = 4.57; (@ of prgmod.p 2002 Sep 5 *) (* begin module writestring *) procedure writestring(var tofile: text; var s: string); (* write the string s to file tofile, no writeln *) var i: integer; (* index to s *) begin (* writestring *) with s do for i := 1 to length do write(tofile, letters[i]) end; (* writestring *) (* end module writestring version = 4.57; (@ of prgmod.p 2002 Sep 5 *) (* begin module filler.fillstring *) procedure fillstring(var s: string; a: filler); (* this procedure makes it reasonably easy to fill the string s with characters. one calls the procedure as: *) (* 1 2 3 4 5 *) (* 12345678901234567890123456789012345678901234567890 *) (* fillstring(s, 'this-is-the-string '); the two comments make it easy to line the characters up. also, for this example, it was assumed that the length of filler as defined by the constant fillermax was 50. *) var length: integer; (* of the string without trailing blanks *) index: integer; (* of s *) begin (* fillstring *) clearstring(s); length := fillermax; while (length > 1) and (a[length] = ' ') do length := pred(length); if (length = 1) and (a[length] = ' ') then begin writeln(output, 'fillstring: the string is empty'); halt end; for index := 1 to length do s.letters[index] := a[index]; s.length := length; s.current := 1 end; (* fillstring *) (* end module filler.fillstring version = 4.57; (@ of prgmod.p 2002 Sep 5 *) (* begin module filler.filltrigger *) procedure filltrigger(var t: trigger; a: filler); (* fill the trigger t *) begin (* filltrigger *) fillstring(t.seek,a) end; (* fillstring *) (* end module filler.filltrigger version = 4.57; (@ of prgmod.p 2002 Sep 5 *) (* begin module trigger.proc *) (* this module allows one to scan a series of characters, as from an array or a file, and to "trigger" or detect a simple string in the series. the advantage of the trigger is that several triggers can "observe" a stream of characters at once, each looking for a different thing. some other modules required: interact.const, interact.type *) procedure resettrigger(var t: trigger); (* reset the trigger to ground state *) begin (* resettrigger *) with t do begin state := 0; skip := false; found := false end end; (* resettrigger *) procedure testfortrigger(ch: char; var t: trigger); (* look at the character ch. if it is part of the trigger (at the current trigger state), then the trigger state goes higher. if it is not part of the trigger then the trigger state is reset, skip is true and one should skip onward to find the trigger. if the trigger is found, found is true. 1996 Sep 12: Bug found! In the case of a trigger "ab", the program used to miss it for situations like "aab". This was because at the first a it would step up. Then it would see the second a and recognize that was not part of ab. It would fail to realize that it could be the start of a new one. The code now accounts for that possibility. *) begin (* testfortrigger *) with t do begin state := succ(state); { writestring(list,seek); writeln(list,'testfortrigger seek.letters[',state:1,']:', seek.letters[state],' ch:',ch); } if seek.letters[state] = ch then begin skip := false; if state = seek.length then found := true else found := false end else begin (* it failed. But wait! It could be the beginning of a NEW trigger string! *) if seek.letters[1] = ch then begin state := 1; skip := false; found := false end else begin (* reset trigger *) state := 0; skip := true; found := false end end end end; (* testfortrigger *) (* end module trigger.proc version = 4.57; (@ of prgmod.p 2002 Sep 5 *) (* begin module skipblanks *) procedure skipblanks(var thefile: text); (* skip over blanks until a non-blank, or end of line, is found *) begin while (thefile^ = ' ') and not eoln(thefile) do get(thefile); end; procedure skipnonblanks(var thefile: text); (* skip over nonblanks until a blank, or end of line, is found *) begin while (thefile^ <> ' ') and not eoln(thefile) do get(thefile); end; procedure skipcolumn(var thefile: text); (* skip over a data column *) begin skipblanks(thefile); skipnonblanks(thefile) end; (* end module skipblanks version = 4.57; (@ of prgmod.p 2002 Sep 5 *) (* ************************************************************************ *) (* end module package.trigger version = 4.57; (@ of prgmod.p 2002 Sep 5 *) (* begin module package.align *) (* ************************************************************************ *) (* begin module package.getpiece *) (* ************************************************************************ *) (* begin module package.brpiece *) (* ************************************************************************ *) (* begin module book.basis *) (* procedures needed for book manipulations *) (* get procedures should be used for all linked lists of records *) procedure getline(var l: lineptr); (* obtain a line from the free line list or by making a new one *) begin if freeline<>nil then begin l:=freeline; freeline:=freeline^.next end else new(l); l^.length:=0; l^.next:=nil end; procedure getdna(var l: dnaptr); begin if freedna<>nil then begin l:=freedna; freedna:=freedna^.next end else new(l); l^.length:=0; l^.next:=nil end; (* clear procedures should be called each time the records are no longer needed failure to do this may result in a stack overflow. *) procedure clearline(var l: lineptr); (* return a line to the free line list *) var lptr: lineptr; begin if l<>nil then begin lptr:=l; l:=l^.next; lptr^.next:=freeline; freeline:=lptr end end; procedure writeline(var afile: text; l: lineptr; carriagereturn: boolean); (* write a line to a file, with carriage return if carriagereturn is true. *) var index: integer; (* index to characters in l *) begin with l^ do begin for index := 1 to length do write(afile, letters[index]); end; if carriagereturn then writeln(afile); end; procedure showfreedna; (* show the freedna list *) var counter: integer; (* count of freedna list *) l: dnaptr; (* pointer into freedna list *) begin l := freedna; counter := 0; while l <> nil do begin counter := succ(counter); write(output,counter:1); write(output, ', length = ',l^.length:1); { This is illegal according to gpc because one cannot write a pointer to a text file. It can be unearthed for debugging. write(output, ', pointer id: ',l:1); } writeln(output); l := l^.next end; end; procedure cleardna(var l: dnaptr); var lptr: dnaptr; begin if l<>nil then begin lptr:=l; l:=l^.next; lptr^.next:=freedna; freedna:=lptr end end; procedure clearheader(var h: header); (* clear the header h (remove lines to free storage) *) begin with h do begin clearline(fulnam); while note<>nil do clearline(note) end end; procedure clearpiece(var p: pieceptr); (* clear the dna of the piece *) begin while p^.dna<>nil do cleardna(p^.dna); clearheader(p^.key.hea) end; function chartobase(ch:char):base; (* convert a character into a base *) begin case ch of 'a': chartobase:=a; 'c': chartobase:=c; 'g': chartobase:=g; 't': chartobase:=t end end; function basetochar(ba:base):char; (* convert a base into a character *) begin case ba of a: basetochar:='a'; c: basetochar:='c'; g: basetochar:='g'; t: basetochar:='t'; end end; function complement(ba:base):base; (* take the complement of ba *) begin case ba of a: complement:=t; c: complement:=g; g: complement:=c; t: complement:=a; end end; function chomplement(b: char): char; (* create the character complement of base b. I must be getting hungry! *) begin chomplement := basetochar(complement(chartobase(b))); end; function pietoint(p: integer; pie: pieceptr): integer; (* p is a coordinate on the piece. we want to transform p into a number from 1 to n: an internal coordinate system for easy manipulation of piece coordinates *) (* Note: the dirhomologous and dircomplement are treated as plus and minus directions, which MIGHT NOT BE RIGHT! *) var i: integer; (* an intermediate value *) begin with pie^.key do begin case piedir of dirhomologous, plus: if p>=piebeg then i:=p-piebeg+1 else i:=(p-coobeg)+(cooend-piebeg)+2; dircomplement, minus: if p<=piebeg then i:=piebeg-p+1 else i:=(cooend-p)+(piebeg-coobeg)+2 end; pietoint:=i end end; function inttopie(i: integer; pie: pieceptr):integer; (* i is in the range 1 to some maximum. it is an internal coordinate system for the program. we want to do a coordinate transformation to obtain a value in the range of the piece called pie: i=1 corresponds to piebeg and i=its maximum corresponds to pieend *) (* Note: the dirhomologous and dircomplement are treated as plus and minus directions, which MIGHT NOT BE RIGHT! *) var p: integer; (* an intermediate value *) begin with pie^.key do begin case piedir of dirhomologous, plus: begin p:=piebeg+(i-1); if p>cooend then if coocon=circular then p:=p-(cooend-coobeg+1) end; dircomplement, minus: begin p:=piebeg-(i-1); if p '*' then begin writeln(output,' procedure skipstar: bad book'); writeln(output,' "*" expected as first character on the line, but "', thefile^,'" was found'); halt end; get(thefile); (* skip the star *) if thefile^ <> ' ' then begin writeln(output,' procedure skipstar: bad book'); writeln(output,' "* " expected on a line but "*', thefile^,'" was found'); halt end; get(thefile) (* skip the blank *) end end; (* skipstar *) (* end module book.skipstar version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module book.brreanum *) procedure brreanum(var thefile: text; var theline: integer; var reanum: real); (* read a real number from the file *) begin skipstar(thefile); readln(thefile,reanum); theline := succ(theline) end; (* end module book.brreanum version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module book.brnumber *) procedure brnumber(var thefile: text; var theline: integer; var num: integer); (* read a number from the file *) begin skipstar(thefile); readln(thefile,num); theline := succ(theline) end; (* end module book.brnumber version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module book.brname *) procedure brname(var thefile: text; var theline: integer; var nam: name); (* read a name from the file *) var i: integer; (* an index to the name *) c: char; (* a character read *) begin (* brname *) skipstar(thefile); with nam do begin length:=0; repeat length:=succ(length); read(thefile,c); letters[length] := c until (eoln(thefile)) or (length>=namelength) or (letters[length]=' '); if letters[length]=' ' then length:=length-1; if length 'n' then begin skipstar(thefile); if not eoln(thefile) then begin if thefile^ = '#' then begin numbered := true; get(thefile); (* move past the number symbol *) read(thefile,number); end end; repeat readln(thefile); theline := succ(theline) until thefile^ = 'n'; readln(thefile); theline := succ(theline) end else begin readln(thefile); theline := succ(theline) end end end; (* brnotenumber *) (* end module book.brnotenumber version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module book.brnote *) procedure brnote(var thefile: text; var theline: integer; var note: lineptr); (* read note key *) var newnote: lineptr; (* the new note *) previousnote: lineptr; (* the last line of the notes *) begin (* brnote *) note:=nil; if thefile^ = 'n' then begin (* enter note *) readln(thefile); theline := succ(theline); if thefile^ <> 'n' then begin (* abort null note (n/n) *) getline(note); newnote:=note; while thefile^ <> 'n' do begin (* wait until end of note *) brline(thefile,theline,newnote); previousnote:=newnote; (* get next note *) getline(newnote^.next); newnote:=newnote^.next; end; (* last note was not used, so: *) clearline(newnote); previousnote^.next:=nil; readln(thefile); theline := succ(theline); end else begin readln(thefile); theline := succ(theline); end; end end; (* brnote *) (* end module book.brnote version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module book.brheader *) procedure brheader(var thefile: text; var theline: integer; var hea: header); (* read the header of a key. *) begin with hea do begin readln(thefile); (* move past the object name - new definition 1999 Mar 13 *) theline := succ(theline); {bbb} (* read key name *) brname(thefile,theline,keynam); (* read full name *) getline(fulnam); brline(thefile,theline,fulnam); (* read note key *) if readnumber then brnotenumber(thefile,theline,note) else brnote(thefile,theline,note) end end; (* end module book.brheader version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module book.copyheader *) procedure copyheader(fromhea: header; var tohea: header); (* copy the header fromhea into tohea. Note that the linked objects are NOT copied, but merely pointed to. *) begin tohea.keynam.letters := fromhea.keynam.letters; tohea.keynam.length := fromhea.keynam.length; tohea.note := fromhea.note; tohea.fulnam := fromhea.fulnam; end; (* end module book.copyheader version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module book.brpiekey *) procedure brpiekey(var thefile: text; var theline: integer; var pie: piekey); (* read piece key, track the line number *) begin with pie do begin brheader(thefile,theline,hea); brreanum(thefile,theline,mapbeg); brconfig(thefile,theline,coocon); brdirect(thefile,theline,coodir); brnumber(thefile,theline,coobeg); brnumber(thefile,theline,cooend); brconfig(thefile,theline,piecon); brdirect(thefile,theline,piedir); brnumber(thefile,theline,piebeg); brnumber(thefile,theline,pieend); end end; (* end module book.brpiekey version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module book.brdna *) procedure brdna(var thefile: text; var theline: integer; var dna: dnaptr); (* read in dna from thefile, track the line *) (* note: if the dna were circularized, by linking the last dnastring to the first, then the cleardna routine could not clear properly, and would loop forever... there is no reason to do that, since a simple mod function will allow one to access the circle. *) var ch: char; workdna: dnaptr; begin getdna(dna); workdna:=dna; ch:=getto(thefile,theline,['d']); readln(thefile); theline := succ(theline); read(thefile,ch); (* skipstar *) while (ch = '*') do begin read(thefile,ch); (* skip blank *) repeat read(thefile,ch); if ch in ['a','c','g','t'] then begin if workdna^.length=dnamax then begin getdna(workdna^.next); workdna:=workdna^.next end; workdna^.length:=succ(workdna^.length); workdna^.part[workdna^.length]:=chartobase(ch) end until eoln(thefile); readln(thefile); (* go to next line *) theline := succ(theline); read(thefile,ch); (* ch is either '*' or 'd' *) end; readln(thefile); (* read past the d *) theline := succ(theline); end; (* end module book.brdna version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module book.brpiece *) procedure brpiece(var thefile: text; var theline: integer; var pie: pieceptr); (* read in a piece, change theline to reflect the lines traversed *) begin { readln(thefile); (* move past the word 'piece' - new definition 1999 Mar 13 *) theline := succ(theline); (* BUG: was below! *) bbb} brpiekey(thefile,theline,pie^.key); if numbered or (not skipunnum) then brdna(thefile,theline,pie^.dna); readln(thefile); (* move past the word 'piece' - new definition 1999 Mar 13 *) theline := succ(theline); end; (* end module book.brpiece version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module book.brinit *) procedure brinit(var book: text; var theline: integer); (* check that the book is ok to read, and set up the global variables for br routines *) begin (* brinit *) (* halt if the book is bad (first word is 'halt') or the first character is not * *) reset(book); if not eof(book) then begin (* check for the date line *) if book^ <> '*' then begin if book^ <> 'h' then writeln(output, ' this is not the first line of a book:') else writeln(output, ' bad book:'); write(output, ' '); while not (eoln(book) or eof(book)) do begin write(output, book^); get(book) end; writeln(output); halt end end else begin writeln(output, ' book is empty'); halt end; (* initialize free storage *) freeline:=nil; freedna:=nil; readnumber:=true; (* usually we read in numbers for items *) number:=0; (* arbitrary value *) numbered:=false; (* the piece has no number (none yet read in) *) skipunnum:=false; theline := 1; end; (* brinit *) (* end module book.brinit version = 7.59; {of delmod.p 2002 Sep 5} *) (* ************************************************************************ *) (* end module package.brpiece version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module book.getpiece *) procedure getpiece(var thefile: text; var theline: integer; var pie: pieceptr); (* move to and read in the next piece in the book *) var ch: char; begin ch:=getto(thefile,theline,['p']); (* get to the next p(iece) in the book *) if ch<>' ' then begin brpiece(thefile,theline,pie); { 1999 june 2: removed this: ch:=getto(thefile,theline,['p']); (* read to end of p *) } { bbb - now done in brpiece readln(thefile); (* read past piece *) theline := succ(theline); } end else clearpiece(pie); end; (* end module book.getpiece version = 7.59; {of delmod.p 2002 Sep 5} *) (* ************************************************************************ *) (* end module package.getpiece version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module findblank *) procedure findblank(var afile: text); (* read a file to find the next blank character *) var ch: char; begin repeat read(afile,ch) until ch = ' ' end; (* end module findblank version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module findnonblank *) procedure findnonblank(var afile: text; var ch: char); (* find the next non blank character in a file, return it in ch. *) begin ch:=' '; while (not eof(afile)) and (ch = ' ') do begin read(afile,ch); if eoln(afile) then readln(afile) end end; (* end module findnonblank version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module align.align *) procedure align(var inst, book: text; var theline: integer; var pie: pieceptr; var length, alignedbase: integer); (* documentation on align is in module info.align and delman.use.aligned.books. 1996 Sep 12: The routine now uses the trigger functions found in prgmod. The bug in the oldalign routine (that it misses the end of comments that end in a series of astrisks) has been fixed. It now checks that the piece corresponds to the book. *) const maximumrange = 10000; (* if the alignment point is more than this distance from the piece ends, the program halts in an attempt to catch the alignment bug... 1991 Jan 11 It appears that the rewrite of the code has removed the bug, but the check will be kept. *) semicolon = ';'; (* end of delila instruction *) var ch: char; (* a character in inst *) p: integer; (* index to a piece name *) p1: integer; (* another index to a piece name *) done: boolean; (* done finding an aligning get *) thebase: integer; (* the base read in *) indefault: boolean; (* true when within a default statement. These can contain the word 'piece', which must be ignored. *) gettrigger: trigger; (* trigger to find 'get' *) defaulttrigger: trigger; (* trigger to find 'default' *) nametrigger: trigger; (* trigger to find 'name' *) piecetrigger: trigger; (* trigger to find 'piece' *) settrigger: trigger; (* trigger to find 'set' *) begincomment: trigger; (* trigger to find '(-*' (ignore the dash!) *) endcomment: trigger; (* trigger to find '*-)' (ignore the dash!) *) begincurly: trigger; (* trigger to find comments: '{' *) endcurly: trigger; (* trigger to find comments: '}' *) quote1trigger: trigger; (* trigger to find single quote ' *) quote2trigger: trigger; (* trigger to find double quote " *) { procedure rd(var f: text; var ch: char); (* read ch from f allowing inspection of the result *) begin read(f,ch); write(output,ch); write(list,ch); write(output,'<',ch,'>'); end; procedure rdln(var f: text); (* readln f allowing inspection of the result *) begin readln(f); writeln(output); writeln(list); end; } procedure skipcomment(var f: text); (* skip an entire comment *) var comment: boolean; (* true means we are inside a comment *) begin (* skip to end of comment *) resettrigger(endcomment); comment := true; while comment do begin if eof(f) then begin writeln(output,'A comment does not end!'); halt end; if eoln(f) then readln(f) { rdln(f) } else begin {write(output,'<'); rd(f,ch); write(output,'>');} read(f,ch); testfortrigger(ch, endcomment); if endcomment.found then comment := false; end end end; procedure skipcurly(var f: text); (* skip an entire comment made by {}*) var comment: boolean; (* true means we are inside a comment *) begin (* skip to end of comment *) resettrigger(endcurly); comment := true; while comment do begin if eof(f) then begin writeln(output,'A comment does not end!'); halt end; if eoln(f) then readln(f) { rdln(f) } else begin {write(output,'<'); rd(f,ch); write(output,'>');} read(f,ch); testfortrigger(ch, endcurly); if endcurly.found then comment := false; end end end; procedure skipquote(quote: trigger); (* skip an entire quote of either the ' or " persuasion *) var kind: char; (* the kind of quote, ' or " *) begin kind := quote.seek.letters[1]; {writeln(output,'skipquote ',kind);} repeat findnonblank(inst,ch); (* get to the quote *) until (ch = kind) or eof(inst); if ch <> kind then begin writeln(output,'end of quote starting with ',kind,' not found'); halt; end; end; begin filltrigger(defaulttrigger,'default'); filltrigger(gettrigger,'get '); filltrigger(nametrigger,'name '); filltrigger(piecetrigger,'piece '); filltrigger(settrigger,'set '); filltrigger(begincomment,'(* '); filltrigger(endcomment,'*) '); filltrigger(begincurly,'{ '); filltrigger(endcurly,'} '); filltrigger(quote1trigger,''' '); filltrigger(quote2trigger,'" '); resettrigger(defaulttrigger); resettrigger(gettrigger); resettrigger(nametrigger); resettrigger(piecetrigger); resettrigger(settrigger); resettrigger(begincomment); resettrigger(begincurly); resettrigger(quote1trigger); resettrigger(quote2trigger); indefault := false; if not eof(book) then begin (* if there is still more to the book ... *) getpiece(book,theline,pie); (* read in the piece *) if not eof(book) then begin (* if we found a piece ... *) length:=pietoint(pie^.key.pieend,pie); (* calculate piece length *) (* now find in inst the next occurance of 'get' *) done := false; while not done do begin if eof(inst) then begin (* no instructions? *) alignedbase := 1; (* simply align by the first base *) done := true end else begin if eoln(inst) then readln(inst) {then rdln(inst)} else begin {rd(inst,ch);} read(inst,ch); testfortrigger(ch, begincomment); testfortrigger(ch, begincurly); if begincomment.found or begincurly.found then begin if ch = '*' then begin skipcomment(inst); resettrigger(begincomment); end else begin resettrigger(begincurly); skipcurly(inst); end end else begin (* we are not inside a comment *) testfortrigger(ch, gettrigger); if gettrigger.found then begin findnonblank(inst,ch); (* get to "from" *) findblank(inst); (* get past "from" *) read(inst,thebase); (* read in the alignedbase *) {writeln(output);writeln(output,'thebase = ',thebase:1);} alignedbase:=pietoint(thebase,pie); {writeln(output,'alignedbase=',alignedbase:1);} done := true end; testfortrigger(ch, quote1trigger); if quote1trigger.found then begin skipquote(quote1trigger); end; testfortrigger(ch, quote2trigger); if quote2trigger.found then begin skipquote(quote2trigger); end; testfortrigger(ch, defaulttrigger); if defaulttrigger.found then begin indefault := true; resettrigger(defaulttrigger) end; if ch = semicolon then indefault := false; testfortrigger(ch, settrigger); if settrigger.found then begin indefault := true; resettrigger(settrigger) end; if ch = semicolon then indefault := false; (* check that piece names are correct *) testfortrigger(ch, piecetrigger); if not indefault then if piecetrigger.found then begin skipblanks(inst); (* get to name *) with pie^.key.hea.keynam do begin for p := 1 to length do begin read(inst,ch); if letters[p] <> ch then begin writeln(output,'The piece name in the book: '); writeln(output,letters:length); writeln(output,'does not match', ' the inst file name:'); (* write the letters that matched: *) for p1 := 1 to p-1 do write(output,letters[p]); (* write the offending letter: *) write(output, ch); (* get the rest of the name and show it: *) done := eoln(inst); while not done do begin done := eoln(inst); if not done then begin read(inst,ch); if (ch = ' ') or (ch = ';') then done := true; if not done then write(output,ch); end; end; writeln(output); (* mark the first letter that does not match: *) for p1 := 1 to p-1 do write(output,' '); write(output,'^'); writeln(output); halt end; end end; end; end end end end; if (alignedbase <= -maximumrange) or (alignedbase > length + maximumrange) then begin writeln(output,' In procedure align:'); writeln(output,' read in base was ',thebase:1); writeln(output,' in internal coordinates: ',alignedbase:1); writeln(output,' maximum range was ',maximumrange:1); writeln(output,' piece length was ',length:1); with pie^.key.hea.keynam do writeln(output,' piece name: ',letters:length); writeln(output,' piece number: ',number:1); writeln(output,' aligned base is too far away... see the code'); halt end end end end; (* end module align.align version = 7.59; {of delmod.p 2002 Sep 5} *) (* ************************************************************************ *) (* end module package.align version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module writepiename *) procedure writepiename(var thefile: text; pie: pieceptr; namespace: integer); (* write the piece name to the file, left justified in a field of namespace *) var index: integer; (* index to the field *) begin with pie^ do for index := 1 to namespace do if index <= key.hea.keynam.length then write(thefile,key.hea.keynam.letters[index]) else write(thefile,' '); end; (* end module writepiename version = 5.01; (@ of cluster.p 1990 February 13 *) (* begin module package.numbar *) (* ************************************************************************ *) (* begin module numberdigit *) function numberdigit(number, logplace:integer): char; (* return the digit at the place value ('logplace') position of number. example: numberdigit(13625, 3) = 3 numberdigit(13625, 4) = 1 2000 July 30 'myabsolute' replaced 'absolute', which is apparently a keyword for GPC. The name is kept to keep the code looking similar to its origin. *) var place: integer; (* the exponent of logplace *) count: integer; (* used to make place *) myabsolute: integer; (* the absolute value of number *) acharacter: char; (* the character to be returned *) procedure digit; (* extract a digit at the place position *) var tenplace: integer; (* ten times place *) z: integer; (* an intermediate value *) d: integer; (* the digit extracted *) begin (* digit *) tenplace:=10*place; z:=myabsolute-((myabsolute div tenplace)*tenplace); if place = 1 then d:=z else d:= z div place; case d of 0: acharacter:='0'; 1: acharacter:='1'; 2: acharacter:='2'; 3: acharacter:='3'; 4: acharacter:='4'; 5: acharacter:='5'; 6: acharacter:='6'; 7: acharacter:='7'; 8: acharacter:='8'; 9: acharacter:='9'; end end; (* digit *) procedure sign; (* put a negative sign out or a positive sign *) begin (* sign *) if number <0 then acharacter:='-' else acharacter:='+' end; (* sign *) begin (* numberdigit *) place:=1; for count:=1 to logplace do place:=10*place; if number=0 then begin if place=1 then acharacter:='0' else acharacter:=' ' end else begin myabsolute:=abs(number); if myabsolute < (place div 10) then acharacter:=' ' else if myabsolute >= place then digit else sign end; numberdigit:=acharacter end; (* numberdigit *) (* end module numberdigit version = 4.57; (@ of prgmod.p 2002 Sep 5 *) (* begin module numbersize *) function numbersize(n: integer):integer; (* calculate amount of space to be reserved for the integer n *) const ln10 = 2.30259; (* natural log of 10 - for conversion to log base 10 *) epsilon = 0.00001; (* a small number to correct log base 10 errors *) var size: integer; (* intermediate result *) begin (* numbersize *) if n = 0 then numbersize:=1 else begin size:=trunc(ln(abs(n))/ln10 + epsilon) + 1; (* the 1 is for the last digit *) (* the epsilon assures that we do not lose a place due to roundoff. eg, sometimes log base 10 of 10 would be 0.9999 instead of 1, and we would not do it right... note: this will fail for very large numbers on the order of 1/epsilon. *) if n < 0 then size := succ(size); (* account for minus sign *) numbersize := size; end end; (* numbersize *) (* end module numbersize version = 4.57; (@ of prgmod.p 2002 Sep 5 *) (* begin module numberbar *) function firstlastmax(firstnumber, lastnumber: integer): integer; (* compute the sizes of firstnumber and lastnumber (including + or - sign) and then determine which number is larger *) var firstlines: integer; (* number of lines needed for firstumber *) lastlines: integer; (* number of lines needed for lastnumber *) begin firstlines := numbersize(firstnumber); if firstnumber > 0 then firstlines := firstlines + 1; (* add one more for + sign *) lastlines := numbersize(lastnumber); if lastnumber > 0 then lastlines := lastlines + 1; (* add one more for + sign *) if firstlines > lastlines then firstlastmax := firstlines else firstlastmax := lastlines; end; procedure numberbar(var afile: text; spaces, firstnumber, lastnumber: integer; var linesused: integer); (* write a bar of numbers to a file, with several spaces before. the number of lines used is returned *) var logplace: integer; (* the log of the digit being looked at *) number: integer; (* the current number being written *) spacecount: integer; (* count of spaces *) begin { 2000 June 24: This code was not sufficient to deal with the sign correctly. The numbersize routine now does *not* give the + sign (which makes it useful for other purposes) so we have to account for it here now. if abs(firstnumber) > abs(lastnumber) then linesused:= numbersize(firstnumber) else linesused:= numbersize(lastnumber); } linesused := firstlastmax(firstnumber, lastnumber); { writeln(output,'numberbar says linesused = ',linesused:1); for logplace:=linesused-1 downto 0 do begin 1999 July 15: this is changed to linesused since numbersize now accounts for the sign: for logplace := linesused downto 0 do begin 2000 June 24: back to the old code ... } for logplace := linesused-1 downto 0 do begin for spacecount:=1 to spaces do write(afile,' '); for number:=firstnumber to lastnumber do write(afile,numberdigit(number,logplace)); writeln(afile) end end; (* end module numberbar version = 4.57; (@ of prgmod.p 2002 Sep 5 *) (* ************************************************************************ *) (* end module package.numbar version = 4.57; (@ of prgmod.p 2002 Sep 5 *) (* begin module book.getbase *) function getbase(position: integer; pie: pieceptr):base; (* Get a base from the position (internal coordinates) of the piece. Protection is made against positions outside the piece. In the case of circles it would be convenient to wrap around when requests are off the end. So the routine will do a modular wrap for positions outside the range 1 to the length. This is a new feature as of 2000 March 22. *) var workdna: dnaptr; (* pointer to the dna part of pie *) p: integer; (* current count of bases into the workdna *) spot: integer; (* the last base of the dna part *) thelength: integer; (* the length of the piece *) begin { writeln(output,'NEW getbase: position=',position:1,'^^^^^^^^^^^^^^^^^^^^'); } (* handle cases of position out of range by circular wrapping *) thelength := piecelength(pie); while position < 1 do position := position + thelength; while position > thelength do position := position - thelength; workdna:=pie^.dna; p:=workdna^.length; while position > p do begin { writeln(output,' workdna^.length=',workdna^.length:1); } workdna := workdna^.next; if workdna = nil then begin writeln(output,'error in function getbase!'); halt end; p := p + workdna^.length; end; { writeln(output,'p=',p:1); } if workdna = nil then begin writeln(output,'error in getbase: request off end of piece'); halt end else begin spot := workdna^.length - (p-position); { writeln(output,'spot=',spot:1); showdnasegment(output,workdna, spot); } if (spot <= 0) then begin writeln(output,'error in getbase, spot (= ',spot:1, ') must be positive'); halt end; if (spot > workdna^.length) then begin writeln(output,'error in getbase, spot (=',spot:1, ') must be less than length (=',workdna^.length:1,')'); halt end; { writeln(output,'base = ', workdna^.part[spot]); } getbase:=workdna^.part[spot] end end; (* end module book.getbase version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module align.withinalignment *) function withinalignment(alignedposition, alignedbase, length: integer) :boolean; (* this function tells one if an aligned position, relative to an aligned base in a piece of some length is within the piece. *) var p: integer;(* the position on the piece *) begin p := alignedposition + alignedbase; withinalignment := (p>0) and (p<=length) end; (* end module align.withinalignment version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module align.maxminalignment *) procedure maxminalignment(var inst, book: text; var theline: integer; var fromparam, toparam: integer; alignmenttype: char); (* prescan the book to find the range over which the pieces of the book are spread, relative to the aligned base. the procedure uses the same variables that align does (so it can call align itself), and it returns the range in fromparam and toparam. alignmenttype: 'f' means alignment by First internal coordinate base, 'b' means alignment by Book, 'i' means alignment by Instructions. *) const maximumrange = 500; (* the maximum size aligned piece; this will presumably catch the alignment bug *) var distance: integer; (* a distance to the aligned base *) pie: pieceptr; length, alignedbase: integer; begin new(pie); (* set an initial range for the two bounds *) fromparam:=+maxint; toparam:=-maxint; reset(book); reset(inst); while not eof(book) do begin case alignmenttype of 'i': align(inst,book,theline,pie,length,alignedbase); 'b','f': begin getpiece(book,theline,pie); (* read in the piece *) length := piecelength(pie); end; end; if not eof(book) then begin case alignmenttype of 'f': begin (* force alignment on first base *) alignedbase := 0; fromparam := 1; distance:=length-alignedbase; if toparam < distance then toparam:=distance; end; 'i': begin (* use the alignedbase from the book *) distance:=1-alignedbase; if fromparam > distance then fromparam:=distance; distance:=length-alignedbase; if toparam < distance then toparam:=distance; end; 'b': begin (* use the internal book *) alignedbase := pietoint(0, pie); distance:=1-alignedbase; if fromparam > distance then fromparam:=distance; distance:=length-alignedbase; if toparam < distance then toparam:=distance; end; end; clearpiece(pie) end end; if toparam - fromparam > maximumrange then begin writeln(output,' in procedure maxminalignment:'); writeln(output,' alignedbase = ',alignedbase:1); writeln(output,' fromparameter = ',fromparam:1); writeln(output,' toparameter = ',toparam:1); writeln(output,' this exceeds the maximum range allowed (', maximumrange:1,')'); writeln(output,' see notes in the procedure. '); halt (* notes: if you desired this range, increase 'maximumrange'. otherwise, this may indicate a bug - either: 1) locate the bug (and tell tom schneider, please...) 2) reduce the size of the fragments, from one or the other end until the bombing is stopped. *) end; (* make the book readable again *) reset(book); reset(inst); dispose(pie) end; (* end module align.maxminalignment version = 7.59; {of delmod.p 2002 Sep 5} *) (* begin module package.triangle.arrays *) (* ************************************************************************ *) function tripos(tri: trianglearray; x,y: integer): integer; (* return the linear position in the triangle array given by (x,y) *) var hold: integer; (* for reversing the order of x and y *) begin (* switch the two values *) if x < y then begin hold := x; x := y; y := hold end; with tri do begin if (x < lower) or (y > upper) then begin write(output,'array lower: ',lower:5,' upper: ',upper:5,' exceeded'); writeln(output,' out of bounds at (',x:1,',',y:1,')'); halt end; (* reset the coordinates so that they range from (0,0) up *) y := y - lower; x := x - lower; (* the distance out in a linear array is the rectangular distance side*y, minus the area of a triangle of side y-1, plus the x distance: *) tripos := side*y - (y * (y + 1) div 2) + x; end; end; procedure cleartriangle(var triangle: trianglearray; fromspot, tospot: integer); (* clear the triangular array triangle and assign the lower and upper bounds to fromspot and tospot *) var loop: integer; (* a loop control variable *) begin with triangle do begin lower := fromspot; upper := tospot; side := upper - lower + 1; if side <= 0 then begin write(output,'illegal triangle size; lower: ',lower:5); writeln(output,' upper: ',upper:5); halt; end; area := tripos(triangle,upper,upper); if area > trimax then begin writeln(output,'requested triangle size too large for trimax:'); writeln(output,'trimax is: ',trimax:1, ' request was ',area:1); halt; end; for loop := 0 to area do triangle.data[loop] := 0; end; end; procedure puttriangle(var tri: trianglearray; value, x, y: integer); (* place the value at coordinates (x,y) in tri *) begin tri.data[tripos(tri,x,y)] := value; end; procedure gettriangle(var tri: trianglearray; var value: integer; x,y: integer); (* get the value at coordinates (x,y) in tri *) begin value := tri.data[tripos(tri,x,y)]; end; procedure writetriangle(var fout: text; tri: trianglearray); (* write out the triangle array tri to fout *) var x: integer; (* loop control for the horizontal *) y: integer; (* loop control for the vertical *) value: integer; (* the value of the array element (x,y) in tri *) begin with tri do begin for y := upper downto lower do begin for x := lower to upper do begin if x >= y then begin gettriangle(tri,value,x,y); write(fout,' ',value:5); end else write(fout,' ','-----'); end; writeln(fout); end; end; end; (* ************************************************************************ *) (* end module package.triangle.arrays *) (* begin module diana.protseqrange *) procedure protseqrange(var protseq: text; var fromparam, toparam: integer); (* determine the range of the protseq data. *) begin fromparam := 1; (* unfortunately *) while (protseq^ = '*') OR (protseq^ = '>') do readln(protseq); (* skip header *) toparam := 0; repeat get(protseq); toparam := succ(toparam) until (protseq^ = '.') or eoln(protseq); {writeln(output,'toparam: ',toparam:1);} end; (* end module diana.protseqrange *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module diana.filehead *) procedure filehead (var fin, fout: text; fromrange, torange: integer); (* place the heading lines on the fout file *) var copynumber: integer; (* the number of lines being copied from the text *) pairs: integer; (* number of pairs to be analzyed *) side: integer; (* length of triangle side *) procedure showdata(var tofile: text; c: char); (* show the data to the file, start lines with character c *) begin writeln(tofile,c,' ', fromrange:1,' ', torange:1,' is the from-to range used'); writeln(tofile,c,' ', side:1,' bases is the side of the data triangle'); writeln(tofile,c,' ', pairs:1,' pairs will be analyzed [=side*(side+1)/2]'); end; begin rewrite(fout); (* write the name and version of the program *) writeln(fout,'* diana ',version:4:2); (* copy the identification lines to the fout *) reset(fin); copynumber := 1; write(fout,'* '); if copylines(fin,fout,copynumber) <> copynumber then begin writeln(fout,'inst does not have enough header lines'); halt end; side := torange - fromrange + 1; pairs := round(side*(side+1)/2); showdata(fout,'*'); showdata(output,' '); (* add on description lines about graphing *) (* old format: writeln(fout,'*information(i)| type of| | | |', ' freq. | 1 - freq. |3 blank|norm- |1 -'); writeln(fout,'*normal(n) | dinucl-| x | y |3 blank|', ' (information if | |alized|norm.'); writeln(fout,'*diagonal(d) | eotide |coord|coord|columns|', 'column is marked i)|columns|info. |info.'); *) writeln(fout,'*'); writeln(fout,'* Definition of columns:'); writeln(fout,'* 1: n (normal), d (diagonal), i (information)'); writeln(fout,'* 2: dinucleotide,', ' in (info, normal), id (info, diagonal)'); writeln(fout,'* 3: x coordinate'); writeln(fout,'* 4: y coordinate'); writeln(fout,'* 5: 1, use for xscolumn for xyplop'); writeln(fout,'* 6: 1, use for yscolumn for xyplop'); writeln(fout,'* 7: sum.data[position]'); writeln(fout,'* 8: Rcorrelation (with small sample correction)'); writeln(fout,'* 9: chisquare'); writeln(fout,'* 10: degrees of freedom'); writeln(fout,'* 11: 1, use for hue for xyplop'); writeln(fout,'* 12: 0, use for saturation for xyplop'); writeln(fout,'* 13: Rnormalized = Rcorrelation/4 or 1-(Rsequence/2)'); writeln(fout,'* 14: 1-Rnormalized'); writeln(fout,'* 15: correlation coefficient for x to y'); writeln(fout,'* 16: 1, use for saturation for xyplop'); { column 17: raw Rcorrelation/4 (no small sample correction) writeln(fout,'* 17: Rcorrelation/4 (with NO small sample correction)'); } end; (* end module diana.filehead *) (* ********************************************************************* *) (* ********************************************************************* *) (* begin module diana.addpair *) procedure addpair(xbase, ybase: base; xcoord, ycoord: integer; var dataprism: trisquare); (* add to the total at the triangle position at the correct dataprism spot *) var distance: integer; (* linear value corresponding to xcoord and ycoord *) begin distance := tripos(dataprism[xbase,ybase],xcoord,ycoord); with dataprism[xbase,ybase] do data[distance] := data[distance]+1; {zzzaaa} { writeln(output,'addpair (',xcoord:1,',',ycoord:1,') ', basetochar(xbase), basetochar(ybase)); } end; (* end module diana.addpair *) (* begin module diana.readbook *) procedure readbook(var apiece: pieceptr; var book: text; theline: integer; var dataprism: trisquare; fromparam, toparam: integer; fromrequest, torequest: integer; alignmenttype: char); (* go through the book and create the dataprism. Pull out pieces (apiece) and put the data from the requested region (fromrequest to torequest) into the dataprism. The book minimum-maximum range must be fromparam to toparam. When there is only a partial sequence, nothing is recorded. If alignmenttype is book, give an empty file to the align routine. *) var alignedbase: integer; (* the aligned base value of each piece in the book *) length: integer; (* the length of the sequence in the piece *) xbase: base; (* the base corresponding to the x coordinate *) x: integer; (* the actual x coordinate of the base to get *) xcoord: integer; (* the x coordinate loop control variable *) y: integer; (* the actual y coordinate of the base to get *) ybase: base; (* the base corresponding to the y coordinate *) ycoord: integer; (* the y coordinate loop control variable *) empty: text; (* empty file for alignment *) count: integer; (* count of sequences *) bk: char; (* backspace character *) begin (* initialize the variables *) xcoord := 0; ycoord := 0; (* clear the dataprism *) for xbase := a to t do for ybase := a to t do cleartriangle(dataprism[xbase,ybase],fromrequest,torequest); (* go through the book and analyze the pieces *) {writeln(output,'alignmenttype: ', alignmenttype);} rewrite(empty); reset(empty); write(output, ' inserting sequence: '); count := 0; bk := chr(8); (* backspace character *) while not eof(book) do begin {yyy} case alignmenttype of 'i': align(inst, book, theline, apiece,length,alignedbase); 'b': align(empty,book, theline, apiece,length,alignedbase); end; if not eof (book) then begin count := succ(count); { writeln(output,count:4, ' ', number:4); generally the same } write(output,' ',count:7,bk,bk,bk,bk,bk,bk,bk,bk); { if count = 10 then halt; (* use for testing backup counter *) writeln(output,bk,bk,bk,count:4); writeln(output,bk,bk,bk,bk,count:4); writeln(output,'readbook: alignedbase =',alignedbase:1); writeln(output,'readbook: fromrequest =',fromrequest:1); writeln(output,'readbook: torequest =', torequest:1); } for xcoord := fromrequest to torequest do begin {zzzrrr} x := xcoord+alignedbase; if (x >= 1) and (x <= length) then begin xbase := getbase(x,apiece); { write(output,'(1) x=',x:2, ' xbase = ',basetochar(xbase),' '); } addpair(xbase,xbase, xcoord,xcoord, dataprism); for ycoord := xcoord + 1 to torequest do begin y := ycoord+alignedbase; if (y >= 1) and (y <= length) then begin ybase := getbase(y,apiece); { write(output,'(2) y=',y:2, ' ybase = ',basetochar(ybase),' '); } addpair(xbase,ybase, xcoord,ycoord, dataprism); end; end; end; end; end; clearpiece(apiece); end; end; (* end module diana.readbook *) (* begin module diana.readprotseq *) procedure readprotseq(var protseq: text; var dataprism: trisquare; fromparam, toparam: integer; fromrequest, torequest: integer); (* go through the protseq and create the dataprism. Put the data from the requested region (fromrequest to torequest) into the dataprism. The sequence minimum-maximum range must be fromparam to toparam. *) const debugging = false; (* turn on sequence printing *) var letter: char; (* a letter from the sequence *) alignedbase: integer; (* the aligned base value of each piece in the book *) length: integer; (* the length of the sequence in the piece *) sequence: array[1..trimax] of char; (* the sequence *) xbase: base; (* the base corresponding to the x coordinate *) xcoord: integer; (* the x coordinate loop control variable *) ybase: base; (* the base corresponding to the y coordinate *) ycoord: integer; (* the y coordinate loop control variable *) done: boolean; (* done reading a sequence *) begin {writeln(output,'BUBBA 0 readprotseq');} (* initialize the variables *) xcoord := 0; ycoord := 0; alignedbase := fromrequest - fromparam + 1; (* clear the dataprism *) for xbase := a to t do for ybase := a to t do cleartriangle(dataprism[xbase,ybase],fromrequest,torequest); (* go through the book and analyze the pieces *) reset(protseq); while not eof(protseq) do begin (* read the next sequence *) while protseq^ = '*' do readln(protseq); (* skip header *) while protseq^ = '>' do readln(protseq); (* skip header *) length := 0; done := false; repeat if eof(protseq) then begin if debugging then writeln(output,' ',length:1); (* BUBBA 0.5 *) done := true; end else begin read(protseq, letter); {write(output,'BUBBA 0.1 ',length:1,' ',letter);} if (protseq^ in ['*','>','.']) then begin done := true; if debugging then writeln(output,' ',length:1); (* BUBBA 0.5 *) end else begin if letter = 'A' then letter := 'a'; if letter = 'C' then letter := 'c'; if letter = 'G' then letter := 'g'; if letter = 'T' then letter := 't'; if letter = 'U' then letter := 't'; if letter in ['a','c','g','t','-'] then begin sequence[length] := letter; length := succ(length); {writeln(output,'BUBBA 0.5 ',length:1,' ',letter); (* BUBBA *)} if debugging then write(output,letter); (* BUBBA *) end; end; if eoln(protseq) then readln(protseq); end; until done; { until (protseq^ = '.') or eoln(protseq); } {writeln(output,'BUBBA 3 readprotseq');} if not eof(protseq) then for xcoord := fromrequest to torequest do begin letter := sequence[xcoord+alignedbase]; {lll} if letter <> '-' then begin { writeln(output,'BUBBA 4 readprotseq'); writeln(output,'BUBBA 4.5 ',sequence[xcoord+alignedbase]); writeln(output,'BUBBA 4.5 ',letter); writeln(output,'BUBBA 4.5, xcoord+alignedbase:', xcoord+alignedbase: 5); writeln(output,'BUBBA 4.5, xcoord:', xcoord: 5); writeln(output,'BUBBA 4.5, alignedbase:', alignedbase: 5); writeln(output,'BUBBA 5 readprotseq'); } xbase := chartobase(letter); addpair(xbase,xbase,xcoord, xcoord,dataprism); for ycoord := xcoord + 1 to torequest do begin if sequence[ycoord+alignedbase] <> '-' then begin ybase := chartobase(sequence[ycoord+alignedbase]); addpair(xbase,ybase,xcoord, ycoord,dataprism); end; end; end; end; end; {writeln(output,'BUBBA 6 readprotseq');} end; (* end module diana.readprotseq *) (* begin module diana.totalprism *) procedure totalprism(dataprism: trisquare; var totalsum: trianglearray; fromparam, toparam: integer); (* count the total number of data elements at each position on the triangle *) var position: integer; (* index variable for the triangle data array *) xindex: base; (* x index variable for the dataprism array *) yindex: base; (* y index variable for the dataprism array *) begin (* clear the totalsum array *) cleartriangle(totalsum,1,(toparam - fromparam + 1)); (* go through the dataprism and sum each position *) for position := 0 to dataprism[a,a].area do for xindex := a to t do for yindex := a to t do with dataprism[xindex,yindex] do totalsum.data[position] := totalsum.data[position] + data[position]; end; (* end module diana.totalprism *) (* begin module diana.writeprism *) procedure writeprism(var dianalysis: text; dataprism: trisquare; fromparam, toparam: integer; sum: trianglearray; dicontrol: char; colorslope, colorconstant: real); (* write dataprism into file dianalysis in a form readable by xyplo.p. If dicontrol is 'b' do both info and data, if 'd' just data, and if 'i' just info. *) const bigoutput = false; (* give a big output for testing *) var baseindex: base; (* a loop variable for computing Rx and Ry values *) chisquare: real; (* the correlation coefficient at a position *) correlation: real; (* correlation coefficient for x to y *) dodata: boolean; (* print data *) doinfo: boolean; (* print info *) expected: real; (* calculated number of dinucleotides at a position *) freedom: integer; (* total number of degrees of freedom at a position*) Hx: real; (* the uncertainty of the single xbase at a positon *) Hy: real; (* the uncertainty of the single ybase at a positon *) Hxy: real; (* the uncertainty of the dinucleotides at a position *) freq: realprism; (* a variable to hold the frequencies of a triangle *) ln2: real; (* ln(2) precaulated for speed *) observed: real; (* actual number of dinucleotides at a position *) position: integer; (* the linear equivalent to triangular coordinates *) Rcorrelation: real; (* the correlation information at (Rx,Ry) *) Rcorrelationsum: real;(* sum of Rcorrelation *) Rc: real; (* Rcorrelation with small sample correction *) Rnormalized: real; (* the normalized information *) Rx: real; (* the information content of the single xbase *) Ry: real; (* the information content of the single ybase *) Rxy: real; (* the information content of the dinucleotides *) ssc4: real; (* small sample correction factor, precomputed. See Schneider1986 p. 427, eqn A6. s = 4. *) ssc16: real; (* small sample correction factor, precomputed. See Schneider1986 p. 427, eqn A6. s = 16. *) xbase: base; (* the x square loop control variable *) xfreedom: integer; (* number of degrees of freedom of the x position *) xfreq: basefreq; (* holds the frequency of a certain xbase *) xindex: integer; (* the x triangle loop control variable *) xsum: sumbase; (* an array of the sums of elements on the x-axis *) ybase: base; (* the y square loop control variable *) yfreedom: integer; (* number of degrees of freedom of the y position *) yfreq: basefreq; (* holds the frequency of a certain ybase *) yindex: integer; (* the y triangle loop control variable *) ysum: sumbase; (* an array of the sums of elements on the y-axis *) begin ssc4 := ( 4 - 1) / (2 * ln(2)); ssc16 := (16 - 1) / (2 * ln(2)); if dicontrol = 'b' then begin dodata := true; doinfo := true; end else if dicontrol = 'd' then begin dodata := true; doinfo := false; end else if dicontrol = 'i' then begin dodata := false; doinfo := true; end; Rcorrelationsum := 0.0; ln2 := ln(2); for xindex := fromparam to toparam do begin for yindex := xindex to toparam do begin (* set variables to zero *) Hxy := 0.0; Hx := 0.0; Hy := 0.0; Rxy := 0.0; Rx := 0.0; Ry := 0.0; for baseindex := a to t do begin xsum[baseindex] := 0; ysum[baseindex] := 0; end; (* transverse the dinucleotide matrix at the given position *) for xbase := a to t do begin for ybase := a to t do begin (* find the proper position in the triarray *) position := tripos(dataprism[xbase,ybase],xindex,yindex); (* calculate sums and frequencies to write out everything *) with dataprism[xbase,ybase] do begin xsum[xbase] := xsum[xbase] + data[position]; ysum[ybase] := ysum[ybase] + data[position]; { write (output,'xindex=',xindex:3,', '); write (output,'yindex=',yindex:3,', '); writeln(output,'sum.data[',position:3,']=',sum.data[position]:1); } { if sum.data[position] = 0 then begin write (output,'-0-'); write (output,'xindex=',xindex:3,' '); write (output,'yindex=',yindex:3,' '); write (output,'sum.data[',position:3,']=0'); writeln(output,' ',basetochar(xbase), basetochar(ybase)); halt; end; } {zzzqqq} (* if statement check 2003 Jan 17 *) if sum.data[position] > 0 then begin freq[xbase,ybase] := data[position] / sum.data[position]; if (freq[xbase,ybase] <> 0.0) then Hxy := Hxy - (freq[xbase,ybase] * ln(freq[xbase,ybase])); end end; (* print the values to the output file *) if dodata then begin if (xindex <> yindex) then write(dianalysis,'n') else write(dianalysis,'d'); write(dianalysis,' ',basetochar(xbase),basetochar(ybase)); write(dianalysis,' ',xindex:1); write(dianalysis,' ',yindex:1); write(dianalysis,' 1'); (* xscolumn *) write(dianalysis,' 1'); (* yscolumn *) write(dianalysis,' ',dataprism[xbase,ybase].data[position]:1); write(dianalysis,' ',freq[xbase,ybase]:12:10); write(dianalysis,' ',1 - freq[xbase,ybase]:12:10); write(dianalysis,' 1'); (* degrees of freedom *) write(dianalysis,' 1'); (* hue *) write(dianalysis,' 0'); (* saturation *) write(dianalysis,' 0'); (* normalized information *) write(dianalysis,' 0'); (* 1 - normalized information *) write(dianalysis,' 0'); (* correlation *) write(dianalysis,' 1'); (* saturation alternative *) writeln(dianalysis); end; end; end; (* set some more variables to zero *) xfreedom := 0; yfreedom := 0; freedom := 0; {How could Rcorrelation be a GOOD value here but bad later} { if sum.data[position] = 0 then begin write (output,'-1-'); write (output,'Rcorrelation ',Rcorrelation:3,' '); write (output,'xindex ',xindex:3,' '); write (output,'yindex ',yindex:3,' '); writeln(output,'sum.data[',position:3,'] = 0'); end; } {zzzqqq} (* find the frequencies and uncertainties of the mononucleotides *) for baseindex := a to t do begin (* if statement check 2003 Jan 17 *) if sum.data[position] > 0 then begin (* first coordinate *) xfreq[baseindex] := xsum[baseindex] / sum.data[position]; if (xfreq[baseindex] > 0) then begin Hx := Hx - (xfreq[baseindex]*ln(xfreq[baseindex])); xfreedom := xfreedom + 1; end; (* second coordinate *) yfreq[baseindex] := ysum[baseindex] / sum.data[position]; if (yfreq[baseindex] > 0) then begin Hy := Hy - (yfreq[baseindex]*ln(yfreq[baseindex])); yfreedom := yfreedom + 1; end; end; end; (* calulate the number of degrees of freedom at a position *) freedom := (xfreedom - 1) * (yfreedom - 1); (* calculate the information contents of mono- and di-nucleotides *) (* The calculation should be: Rx := 2 - (Hx / ln(2)); Ry := 2 - (Hy / ln(2)); Rxy := 4 - (Hxy / ln(2)); Rcorrelation := Rxy - Rx - Ry; but this reduces to: *) Rcorrelation := (Hx + Hy - Hxy)/ln2; { if sum.data[position] = 0 then begin write (output,'-2-'); write (output,'Rcorrelation=',Rcorrelation:3,' '); write (output,'xindex=',xindex:3,' '); write (output,'yindex=',yindex:3,' '); write (output,'Hx=',Hx:5:3,' '); write (output,'Hy=',Hy:5:3,' '); write (output,'Hxy=',Hxy:5:3,' '); writeln(output,'sum.data[',position:3,'] = 0'); end; } {zzzqqq} (* calculate the chi-square and correlation coefficient *) chisquare := 0.0; for xbase := a to t do begin for ybase := a to t do begin expected := xfreq[xbase] * yfreq[ybase] * sum.data[position]; observed := dataprism[xbase,ybase].data[position]; if (expected <> 0) then chisquare := chisquare + (sqr(observed - expected)/expected); end; end; correlation := 0.0; (* for now *) if doinfo then begin (* write out the information content for the current position *) write(dianalysis,'i '); write(dianalysis,'i'); if bigoutput then begin write(output,'* n= ',sum.data[position]:1); write(output,'* Rcorrelation = ', Rcorrelation:infofield:infodecim); end; if xindex = yindex then begin (* on the diagonal, Rcorrelation is equal to Hx, so normalize to give the information content (without small sample correction) *) Rnormalized := 1.0 - (Rcorrelation/2.0); write(dianalysis,'d'); (* 2003 Jan 17 check added *) if sum.data[position] > 0 then begin Rc := Rcorrelation - (ssc4/sum.data[position]); Rcorrelationsum := Rcorrelationsum + Rc; end else begin; Rc := 0.0; end; if bigoutput then begin write(output,'* ssc4/n = ', (ssc4/sum.data[position]) :infofield:infodecim); write(output,'* Rcorrelation - ssc4/n = ', Rc :infofield:infodecim); end; end else begin Rnormalized := Rcorrelation/4.0; write(dianalysis,'n'); (* 2003 Jan 17 check added *) if sum.data[position] > 0 then begin Rc := Rcorrelation - (ssc16/sum.data[position]); Rcorrelationsum := Rcorrelationsum + Rc; end else begin; Rc := 0.0; end; { if sum.data[position] = 0 then begin write (output,'-3-'); write (output,'Rcorrelation ',Rcorrelation:3,' '); write (output,'xindex ',xindex:3,' '); write (output,'yindex ',yindex:3,' '); writeln(output,'sum.data[',position:3,'] = 0'); end; } {zzzqqq} if bigoutput then begin write(output,'* ssc16/n = ', (ssc16/sum.data[position]) :infofield:infodecim); write(output,'* Rcorrelation - ssc16/n = ', Rc :infofield:infodecim); end; end; if bigoutput then begin writeln(output); end; Rnormalized := colorslope * Rnormalized + colorconstant; write(dianalysis,' ',xindex:1); (* 3 *) write(dianalysis,' ',yindex:1); (* 4 *) write(dianalysis,' 1'); (* 5 *) write(dianalysis,' 1'); (* 6 *) write(dianalysis,' ',sum.data[position]:1); (* 7 *) write(dianalysis,' ', Rc:infofield:infodecim); (* 8 *) write(dianalysis,' ',chisquare:8:5); (* 9 *) write(dianalysis,' ',freedom:2); (* 10 *) write(dianalysis,' 1'); (* 11 *) write(dianalysis,' 0'); (* 12 *) write(dianalysis,' ',Rnormalized:infofield:infodecim); (* 13 *) write(dianalysis,' ',(1-Rc):infofield:infodecim); (* 14 *) write(dianalysis,' ',correlation:infofield:infodecim); (* 15 *) write(dianalysis,' 1'); (* saturation alternative *) (* 16 *) { write(dianalysis,' ', Rcorrelation/4:infofield:infodecim); (* 17 *) } writeln(dianalysis); end; end; end; writeln(output, 'sum of Rcorrelation ',Rcorrelationsum:10:8, ' bits'); writeln(dianalysis,'* sum of Rcorrelation ',Rcorrelationsum:10:8, ' bits'); end; (* end module diana.writeprism *) (* ********************************************************************* *) (* ********************************************************************* *) (* begin module checknumber *) function checknumber(var afile: text): boolean; (* check that there is a number next in the file. If not, return false. This is useful for protection when reading a parameter file. *) var ok: boolean; (* result of this check *) procedure conclude; begin writeln(output,'Including this character, the rest of the data line is:'); copyaline(afile,output); ok := false; end; begin ok := true; (* be optimistic *) if eof(afile) then begin ok := false; write (output,'A number was expected on a data line, but'); writeln(output,' the end of the file was found instead.'); end else begin skipblanks(afile); if eoln(afile) then begin write (output,'A number was expected on a data line, but'); writeln(output,' the end of the line was found instead.'); conclude; end; if not (afile^ in ['0','1','2','3','4','5','6','7','8','9','.','-','+']) then begin write (output,'A number was expected on a data line, but'); writeln(output,' the character "',afile^,'" was found instead.'); conclude; end; end; checknumber := ok end; (* end module checknumber version = 4.57; (@ of prgmod.p 2002 Sep 5 *) (* begin module copyfile *) procedure copyfile(var fin, fout: text); (* copy the rest of file fin to fout *) begin while not eof(fin) do copyaline(fin, fout); end; (* end module copyfile version = 4.57; (@ of prgmod.p 2002 Sep 5 *) (* begin module diana.upgradeto201 *) procedure upgradeto201(var dianap: text); (* upgrade the dianap file to version 2.01: specify alignmenttype: i f: first base, i: inst, b: book alignment The parameters when the upgrade feature was installed (2002 Sep 5) were: ------ -10 10 fromdesired, todesired -10 10 fromparam, toparam (only for use with protseq) i dicontrol: 'd' just data, 'i' just info, 'b' both 0.84 0.16 colorslope and colorconstant gives yellow ranging up to red. diana version 1.88 and higher ------ *) const copylines = 4; (* the number of copied lines *) var internal: text; (* a place to hold the old dianap *) line: integer; (* a line to be worked with *) parameterversion: real; (* parameter version number *) begin parameterversion := 2.01; writeln(output, 'upgrading to version ',parameterversion:4:2,' ...'); (* copy alist to internal *) reset(dianap); { Don't do this! There IS no previous parameter version line! readln(dianap); (* skip old parameter version line *) } rewrite(internal); for line := 1 to copylines do copyaline(dianap, internal); (* write the NEW PARAMETER LINE *) writeln(internal, 'i f: first base, i: inst, b: book alignment'); (* finish the copy of alist to internal *) { Dump the previous stuff about copyfile(dianap, internal); } (* copy internal to alist *) reset(internal); rewrite(dianap); writeln(dianap,parameterversion:4:2,' version of', ' dianap that this parameter file is designed for.'); copyfile(internal, dianap); (* add the new material at the end: *) (* none to add *) reset(dianap); (* ready to start reading again *) end; (* end module diana.upgradeto201 *) (* begin module dianap.upgradeparameters *) procedure upgradeparameters(var dianap: text); (* make sure that the parameters are the latest spiffy version *) var parameterversion: real; (* parameter version number *) begin readln(dianap, parameterversion); if (round(100*parameterversion) < round(100*updateversion)) or (round(100*parameterversion) > round(100*versionupperbound)) then begin parameterversion := 1.88; (* assume this version *) writeln(output, '^GYou have an old parameter file!, version ', parameterversion:4:2,'!'); writeln(output, ' version = ', version:4:2); writeln(output, ' updateversion = ', updateversion:4:2); writeln(output, ' parameterversion = ', parameterversion:4:2); writeln(output, 'versionupperbound = ',versionupperbound:4:2); if parameterversion = 1.88 then upgradeto201(dianap); reset(dianap); readln(dianap, parameterversion); if parameterversion < updateversion then begin writeln(output, 'Sorry! I am unable to fully upgrade', ' your parameter file'); writeln(output, 'from version ', parameterversion:4:2, ' to version ', updateversion:4:2,'!'); writeln(output, 'Start from a fresh copy or edit this one.'); halt; end else writeln(output, '... upgrade successful!'); writeln(output, 'See this page for the new documentation:'); writeln(output, 'http://www.lecb.ncifcrf.gov/~toms/delila/diana.html'); writeln(output,'----------'); end; {qqq} end; (* end module dianap.upgradeparameters *) (* begin module diana.themain *) procedure themain(var book, inst, protseq, dianap, da: text); (* the main procedure of diana *) var apiece: pieceptr; (* the piece pointer used throughout the program *) alignmenttype: char; (* type of book alignment *) bookorprotseq: char; (* 'b' means sequence data comes from the book 'p' means sequence data comes from the protseq *) colorslope: real; (* value to multiply normalized information *) colorconstant: real; (* add to normalized information after colorslope *) dataprism: trisquare; (* the data structure for diana *) dicontrol: char; (* 'd': just data, 'i': just info, 'b': both printed *) fromparam: integer; (* the farthest aligned position back from 0 *) fromrequest: integer; (* requested coordinate to analyze from *) fromprotseq: integer; (* protseq first base *) theline: integer; (* current line in book *) toparam: integer; (* the farthest aligned position forward from 0 *) torequest: integer; (* requested coordinate to analyze to *) toprotseq: integer; (* protseq last base *) totalsum: trianglearray;(* the total number of elements at each position *) procedure readparameters; (* read in the parameters. 2002 Sep 5: install auto upgrade, lister.p is example *) var checkout: boolean; (* if true, all variable values are ok *) procedure cn; (* short version of call to check number *) begin checkout := checknumber(dianap); if not checkout then halt; (* avoid snowballing *) end; begin checkout := true; (* be optimistic *) reset(dianap); if not eof(dianap) then begin cn; upgradeparameters(dianap); cn; readln(dianap,fromrequest,torequest); cn; readln(dianap,fromprotseq,toprotseq); readln(dianap,dicontrol); cn; readln(dianap,colorslope, colorconstant); readln(dianap, alignmenttype); end else begin (* initialize the variables *) fromrequest := fromparam; torequest := toparam; fromprotseq := fromparam; toprotseq := toparam; dicontrol := 'b'; colorslope := 0.84; colorconstant := 0.16; (* gives yellow ranging up to red *) alignmenttype := 'i'; (* read from inst to get alignment *) end; if fromrequest > torequest then begin writeln(output,'fromrequest must be <= torequest'); halt end; if bookorprotseq = 'p' then begin fromparam := fromprotseq; toparam := toprotseq; end; end; (* readparameters *) begin (* announce the program *) writeln(output,' diana ',version:4:2); readparameters; (* determine range of sequences available *) reset(book); reset(protseq); (* without initial readln, eof is false for an empty file for the GPC compiler! *) readln(book); readln(protseq); theline := 0; if eof(book) and eof(protseq) then begin writeln(output,'either book or protseq must contain sequence data'); halt end; if (not eof(book)) and (not eof(protseq)) then begin writeln(output,'One of book or protseq must be empty'); writeln(output,'eof(book)', eof(book)); writeln(output,'eof(protseq)', eof(protseq)); halt end; if not eof(book) then begin (* reset the files and pointers *) brinit(book, theline); reset(inst); new(apiece); (* find the minimum and maximum values needed to analyze the book *) writeln(output, ' Using book'); writeln(output, ' getting the range from the book.'); maxminalignment(inst, book, theline, fromparam, toparam, alignmenttype); writeln(output, ' Book is from: ',fromparam:1,' to: ',toparam:1); bookorprotseq := 'b'; end; reset(book); reset(protseq); readln(protseq); (* required to test protseq eof by GPC! 2009 Apr 09 *) if not eof(protseq) then begin writeln(output, ' Using protseq'); writeln(output, ' Getting the range from the protseq.'); protseqrange(protseq,fromparam,toparam); (* writeln(output,' protseq is from: ',fromparam:1,' to: ',toparam:1); *) bookorprotseq := 'p'; end; (* force the requested range into that available *) if fromrequest < fromparam then begin writeln(output,'************** WARNING **************'); writeln(output,'Requested FROM parameter (',fromrequest:1,')'); writeln(output,'cannot be lower than the sequence FROM limit (', fromparam:1,'). Using sequence limit: ', fromparam:1); fromrequest := fromparam; end; if torequest > toparam then begin writeln(output,'requested TO parameter (',torequest:1,')'); writeln(output,'cannot be lower than the sequence TO limit (', toparam:1,'). Using sequence limit: ', toparam:1 ); torequest := toparam; end; (* halt; {zzz} *) (* head the output file *) rewrite(da); case bookorprotseq of 'b': filehead(book, da, fromrequest, torequest); 'p': filehead(protseq, da, fromrequest, torequest); end; writeln(output,' creating the dataprism'); (* go through the book and place each sequence on the prism *) case bookorprotseq of 'b': readbook(apiece, book, theline, dataprism,fromparam,toparam,fromrequest,torequest, alignmenttype); 'p': readprotseq(protseq,dataprism,fromparam,toparam,fromrequest,torequest); end; (* count the number of data points in each space of the dataprism *) writeln(output,' totaling the dataprism'); totalprism(dataprism,totalsum,fromrequest,torequest); (* print the dataprism into da in a form readable to xyplo.p *) writeln(output,' calculating and writing the dataprism to da'); writeprism(da,dataprism,fromrequest,torequest,totalsum, dicontrol, colorslope, colorconstant); end; (* end module diana.themain *) begin themain(book,inst,protseq,dianap,da); 1: end.