program cluster (clusterp, subind, inst, book, pairs, clumps, output); (* cluster: filter indana subindexes to find duplicate entries and then print the duplicate pair coordinates, lengths, and sequences into the output files Mike Stephens, 1992 *) label 1; (* end of program *) const (* begin module version *) version = 5.06; (* of cluster.p 1992 September 18 origin 1989 September 1 *) (* end module version *) (* begin module describe.cluster *) (* name cluster: cluster indana subindexes into groups of duplicate entries synopsis cluster(clusterp: in, subind: in, inst: in, book: in, pairs: out, clumps: out, output: out) files clusterp: The cluster parameter file that consists of the following: FIRST LINE 'y' turns the flag on, 'n' turns it off (debugging) allows one to look at raw data in the bags. The debugging flag controls the printing of the raw data above the regular output of the cluster program, which is created solely by procedure showRAWbag. This can then be compared with the data in the chart for correctness. Raw data consists of the series of coordinate pairs in the bag and the sides they are matched on. printed above the standard output structure. example: - ( 630, 69) R L ( 649, 88) - {20} {20} ************************************* | 630 663 HUMUK | ---------- | 34 HUMUPA | ---------- | 69 102 ************************************* It is important to note that the raw data will only appear in the pairs output file, and will not be written in clumps at all. This means that parameter 3, writepairs, must also be turned on for this flag to be effective. SECOND LINE 'y' turns the flag on, 'n' turns it off (showfragments) allows one to see pairs that are fragmented. The showfragments toggle controls printing the outputs of pairs with "imperfect" matches. That is, in some cases a repeating sequence will match in several frames, causing repeated sequence matching and producing a large list of coordinate pairs. This list can be shown if the parameter is turned on, but the statement "WARNING: sequence pairs are overmatched" will appear if it is turned off. The actual sequences will be shown in either case, so the comparison can always be done by hand by the user. The output is excessively long, but the sequences will be shown, so the comparison can be done by the user. example: 1 acggatcgtgtgtgtgtgtgtgtgtacgatcggatcgat 2 acggatcgtgtgtgtgtgtgtgtgtacgatcggatcgat These sequences will have matches between all of the 'gt' base pairs, resulting in an overwhelming number of matches. The maximum number of possible matches is found by taking the length of the sequences and dividing it by the value in the overmatched parameter (FIFTH LINE) times the number of instructions that match between any two pieces in the dbinst. This results in a maximum number of matches between any two pieces. Any pieces above this limit will can have their output completely shown or can generate a warning message (see showfragments, SECOND LINE). In addition to preventing the example case, showfragments will also prevent the display of any other case that may cause an excessive number of matches. THIRD LINE 'y' turns the flag on, 'n' turns it off (writepairs) controls the printing of the pairs output file. If writepairs is on, the original clustering pairlist will be printed into the output file pairs. If it is off, this file will not be printed. This parameter must be turned on to effectively use the debugging parameter (see FIRST LINE). FOURTH LINE 'y' turns the flag on, 'n' turns it off (writeclumps) controls printing of the clumps output file. If writeclumps is on, the original clustering pairlist will be sent through the clumping procedures. The output file clumps will contain the sequences involved in the matches on the pair in addition to the clumped version of the pairlist. The clumping process takes an excessive amount of time for very large files, since the program must traverse the entire pairlist to find all related pairs, then put the pairs on to the clumplist, then go through the book and find sequences to match every instruction in every pair of every clump. Although it is much easier to determine which pieces are true repeats through use of the clumps file, it is certainly possible to do so by simply using the pairs output file. FIFTH LINE any integer (matchparameter) is the number of matches to be allowed between two instructions. This can be determined by dividing the sequence length from the book by the minimum window size from the subindex, or a maximum number of matches between instructions can be set. An integer less than or equal to 0 will calculate maximum matches using the above method. Any number greater than 0 will be used as the new maximum matches. example: if the instructions call for the sequences piece1: get from 100 -50 to 100 +50; piece2: get from 200 -50 to 200 +50; The sequence length is 101. If the windowsize read from the subindex = 15, then 6 possible matches can occur between these two instructions (101 div 15 = 6). The TOTAL number of matches between two pieces is found by multiplying matchparameter by the number of instructions in a given pair. If a piece has more matches than this, it is considered to be overmatched, the bag will not be printed, and the statment 'WARNING: sequence pairs have too many matches.' will appear. Overmatched pairs can be printed using the showmatches parameter (see SECOND LINE). subind: a subindex from the indana program matching the inst and the book inst: a set of delila instructions that correspond to the book book: a delila book that contains the sequences being clumped pairs: the output list of paired sequences clumps: the output list of clumped sequences output: When errors occur, the program halts and produces an error message description Duplicate entries in the subind subindex are clustered into a unified list of pairs and copied to output files as sequence numbers, lengths, and sequence base pairs. Pairs are determined by the indana program, which delegates sequence similarities with an '*'. Cluster takes the subindex and shows the coordinate range and length of the similarity by pairs. The pairs file is a list of relationships between two sequences, the clumps file takes this list of pairs and groups related ones together. The seqalign modules of the program then access the book and get the corresponding sequences to print out with the instruction number and piece name. documentation none see also index.p, indana.p author R. Michael Stephens bugs None currently known. technical notes The read for the indana window size is based on the '[' character before the number in the subind heading. Any changes to indana that alter this format must be reflected in the getwindowsize procedure. *) (* end module describe.cluster *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module cluster.book.const *) (* constants needed for book manipulations *) dnamax = 101; (* length of dna arrays of the piece*) namelength = 20; (* maximum key name length *) linelength = 80; (* maximum line readable in book *) (* end module cluster.book.const version = 'delmod 6.58 90 Jan 2 tds/gds' *) type (* begin module book.type *) (* types needed for book manipulations *) chset = set of 'a'..'z'; (* types defined in book definition *) alpha = packed array[1..namelength] of char; (* this is not alfa *) (* name is a left justified string with blanks following the characters *) name = record letters: alpha; length: 0..namelength (* zero means an unspecified structure *) end; lineptr = ^line; line = record (* a line of characters *) letters: packed array [1..linelength] of char; length: 0..linelength; next: lineptr end; direction = (plus, minus, dircomplement, dirhomologous); configuration = (linear, circular); state = (on, off); header = record (* header of key *) keynam: name; (* key name of structure *) fulnam: lineptr; (* full name of structure *) note: lineptr (* note key *) end; (* base types *) base = (a,c,g,t); dnaptr = ^dnastring; dnarange = 0..dnamax; seq = packed array[1..dnamax] of base; dnastring = record part: seq; length: dnarange; next: dnaptr end; orgkey = record (* organism key *) hea: header; mapunit: lineptr (* genetic map units *) end; chrkey = record (* chromosome key *) hea: header; mapbeg: real; (* number of genetic map beginning *) mapend: real (* number of genetic map ending *) end; pieceptr = ^piece; piekey = record (* piece key *) hea: header; mapbeg: real; (* genetic map beginning *) coocon: configuration; (* configruation (circular/linear) *) coodir: direction; (* direction (+/-) relative to genetic map *) coobeg: integer; (* beginning nucleotide *) cooend: integer; (* ending nucleotide *) piecon: configuration; (* configruation (circular/linear) *) piedir: direction; (* direction (+/-) relative to coordinates *) piebeg: integer; (* beginning nucleotide *) pieend: integer; (* ending nucleotide *) end; piece = record key: piekey; dna: dnaptr end; reference = record pienam : name; (* name of piece referred to *) mapbeg : real; (* genetic map beginning *) refdir : direction; (* direction relative to coordinates *) refbeg : integer; (* beginning nucleotide *) refend : integer; (* ending nucleotide *) end; genkey = record (* gene key *) hea : header; ref : reference; end; trakey = record (* transcript key *) hea : header; ref : reference; end; markerptr = ^marker; markey = record (* marker key *) hea : header; ref : reference; sta : state; phenotype : lineptr; next : markerptr; end; marker = record key : markey; dna : dnaptr; end; (* end module book.type version = 'delmod 6.65 94 sep 5 tds/gds' *) (* begin module cluster.type *) (* a bag is that part of a pair that contains the sequence coordinates *) bagptr = ^bag; (* a pointer to a bag *) bag = record leftcoord, (* coordinate of the first number in numberpair *) rightcoord: integer; (* coordinate of the second number in numberpair *) leftmatch, (* is there a matching coordinate on the left? *) rightmatch: boolean; (* is there a matching coordinate on the right? *) nextbag: bagptr; (* points to the next bag in the linked list *) end; (* an entry is a working copy of a pair used to find duplicate lists*) entryptr = ^entries; (* a pointer to an entry *) entries = record piece: integer; (* the piece number of the entry *) inst: integer; (* the sequence number in the entry *) coordinate: integer; (* the coordinate number of an entry *) nextentry: entryptr; (* pointer to the next entry on the list *) end; (* the numberptr is a multiple-use list of single integers *) numberptr = ^numberlist; (* a pointer to a numberlist *) numberlist = record number: integer; (* the number in the record on the list *) nextlist: numberptr; (* a pointer to the next element on the list *) end; (* pairs are complete doubles of data sets for duplicate sequences *) pairptr = ^numberpair; (* a pointer to a pair *) numberpair = record leftpiece, (* the first piece number in the pair *) rightpiece: integer; (* the second piece number in the pair *) leftseq, (* the pair instruction list of leftpiece *) rightseq: numberptr; (* the pair instruction list of rightpiece *) seqbag: bagptr; (* points to the bag for that pair *) nextpair: pairptr; (* points to the next numberpair *) end; (* ranges show the relationships between intruction and piece numbers *) rangeptr = ^rangerecord; (* a pointer to a range *) rangerecord = record instnumber: integer; (* the highest instruction number in the range *) singlescount: integer;(* the number of singels preceding the range *) nextrange: rangeptr; (* the pointer to the next range *) end; (* the clump is the final stage of clustering, consisting of a data *) (* cube with a list of clumps on the y-axis, the pairs within the clump*) (* on the x-axis, and the bags of the pairs in the clump on the z-axis *) clumpptr = ^clump; (* a pointer to a clump *) clump = record piecelist: numberptr; (* the piece numbers in the clump *) pairlist: pairptr; (* pointer to the related pairlist on the clump *) nextclump: clumpptr; (* pointer to the next clump in the list *) end; (* end module cluster.type *) var (* begin module book.var *) (* ************************************************************************ *) (* global variables needed for book manipulations *) (* free storage: *) freeline: lineptr; (* unused lines *) freedna: dnaptr; (* unused dnas *) readnumber: boolean; (* whether to read a number from the notes, or to read in the notes *) number: integer; (* the number of the item just read *) numbered: boolean; (* true when the item just read is numbered *) skipunnum: boolean; (* a control variable to allow skipping of un-numbered items in the book *) (* ************************************************************************ *) (* end module book.var version = 'delmod 6.65 94 sep 5 tds/gds' *) (* begin module cluster.var *) book: text; (* the book to be aligned *) clumps: text; (* the clumped list output file *) clusterp: text; (* the parameter file *) debugging: boolean; (* debugging parameter control variable *) pairs: text; (* the paired list output file *) subind: text; (* the indana subindex used as input *) freebag: bagptr; (* root for the list of unused bag pointers *) freeclump: clumpptr; (* root for the list of unused clump pointers *) freeentry: entryptr; (* root for the list of unused entry pointers *) freepair: pairptr; (* root for the list of unused pair pointers *) freesequence: numberptr; (* root for the list of unused sequence pointers *) inst: text; (* delila instructions to align procedures *) matchparameter: integer; (* maximum number of matches allowed per inst *) showfragments: boolean; (* showfragments parameter control variable *) writepairs: boolean; (* writepairs parameter control variable *) writeclumps: boolean; (* wrteinsts parameter control variable *) (* end module cluster.var *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module halt *) procedure halt; (* stop the program. the procedure performs a goto to the end of the program. you must have a label: label 1; declared, and also the end of the program must have this label: 1: end. examples are in the module libraries. this is the only goto in the delila system. *) begin writeln(output,' program halt.'); goto 1 end; (* end module halt version = 'delmod 6.65 94 sep 5 tds/gds' *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module copyaline *) procedure copyaline(var fin, fout: text); (* copy a line from file fin to file fout *) begin (* copyaline *) while not eoln(fin) do begin fout^ := fin^; put(fout); get(fin) end; readln(fin); writeln(fout); end; (* copyaline *) (* end module copyaline version = 'delmod 6.65 94 sep 5 tds/gds' *) (* begin module copylines *) function copylines(var fin, fout: text; n: integer): integer; (* copy n lines of file fin to file fout. the actual number of lines copied is returned. *) var index: integer; (* the current line number *) begin (* copylines *) index := 0; while (not eof(fin)) and (index < n) do begin copyaline(fin, fout); index := succ(index) end; copylines := index end; (* copylines *) (* end module copylines version = 'delmod 6.65 94 sep 5 tds/gds' *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module skipblanks *) procedure skipblanks(var thefile: text); (* skip over blanks until a non-blank, or end of line, is found *) begin while (thefile^ = ' ') and not eoln(thefile) do get(thefile); end; procedure skipnonblanks(var thefile: text); (* skip over nonblanks until a blank, or end of line, is found *) begin while (thefile^ <> ' ') and not eoln(thefile) do get(thefile); end; (* end module skipblanks *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module package.align *) (* ************************************************************************ *) (* begin module package.getpiece *) (* ************************************************************************ *) (* begin module package.brpiece *) (* ************************************************************************ *) (* begin module book.basis *) (* procedures needed for book manipulations *) (* get procedures should be used for all linked lists of records *) procedure getline(var l: lineptr); (* obtain a line from the free line list or by making a new one *) begin if freeline<>nil then begin l:=freeline; freeline:=freeline^.next end else new(l); l^.length:=0; l^.next:=nil end; procedure getdna(var l: dnaptr); begin if freedna<>nil then begin l:=freedna; freedna:=freedna^.next end else new(l); l^.length:=0; l^.next:=nil end; (* clear procedures should be called each time the records are no longer needed failure to do this may result in a stack overflow. *) procedure clearline(var l: lineptr); (* return a line to the free line list *) var lptr: lineptr; begin if l<>nil then begin lptr:=l; l:=l^.next; lptr^.next:=freeline; freeline:=lptr end end; procedure cleardna(var l: dnaptr); var lptr: dnaptr; begin if l<>nil then begin lptr:=l; l:=l^.next; lptr^.next:=freedna; freedna:=lptr end end; procedure clearheader(var h: header); (* clear the header h (remove lines to free storage) *) begin with h do begin clearline(fulnam); while note<>nil do clearline(note) end end; procedure clearpiece(var p: pieceptr); (* clear the dna of the piece *) begin while p^.dna<>nil do cleardna(p^.dna); clearheader(p^.key.hea) end; function chartobase(ch:char):base; (* convert a character into a base *) begin case ch of 'a': chartobase:=a; 'c': chartobase:=c; 'g': chartobase:=g; 't': chartobase:=t end end; function basetochar(ba:base):char; (* convert a base into a character *) begin case ba of a: basetochar:='a'; c: basetochar:='c'; g: basetochar:='g'; t: basetochar:='t'; end end; function complement(ba:base):base; (* take the complement of ba *) begin case ba of a: complement:=t; c: complement:=g; g: complement:=c; t: complement:=a; end end; function pietoint(p: integer; pie: pieceptr): integer; (* p is a coordinate on the piece. we want to transform p into a number from 1 to n: an internal coordinate system for easy manipulation of piece coordinates *) var i: integer; (* an intermediate value *) begin with pie^.key do begin case piedir of plus: if p>=piebeg then i:=p-piebeg+1 else i:=(p-coobeg)+(cooend-piebeg)+2; minus: if p<=piebeg then i:=piebeg-p+1 else i:=(cooend-p)+(piebeg-coobeg)+2 end; pietoint:=i end end; function inttopie(i: integer; pie: pieceptr):integer; (* i is in the range 1 to some maximum. it is an internal coordinate system for the program. we want to do a coordinate transformation to obtain a value in the range of the piece called pie: i=1 corresponds to piebeg and i=its maximum corresponds to pieend *) var p: integer; (* an intermediate value *) begin with pie^.key do begin case piedir of plus: begin p:=piebeg+(i-1); if p>cooend then if coocon=circular then p:=p-(cooend-coobeg+1) end; minus: begin p:=piebeg-(i-1); if p '*' then begin writeln(output,' procedure skipstar: bad book'); writeln(output,' "*" expected as first character on the line, but "', thefile^,'" was found'); halt end; get(thefile); (* skip the star *) if thefile^ <> ' ' then begin writeln(output,' procedure skipstar: bad book'); writeln(output,' "* " expected on a line but "*', thefile^,'" was found'); halt end; get(thefile) (* skip the blank *) end; (* skipstar *) (* end module book.skipstar version = 'delmod 6.65 94 sep 5 tds/gds' *) (* begin module book.brreanum *) procedure brreanum(var thefile: text; var reanum: real); (* read a real number from the file *) begin skipstar(thefile); readln(thefile,reanum); end; (* end module book.brreanum version = 'delmod 6.65 94 sep 5 tds/gds' *) (* begin module book.brnumber *) procedure brnumber(var thefile: text; var num: integer); (* read a number from the file *) begin skipstar(thefile); readln(thefile,num) end; (* end module book.brnumber version = 'delmod 6.65 94 sep 5 tds/gds' *) (* begin module book.brname *) procedure brname(var thefile: text; var nam: name); (* read a name from the file *) var i: integer; (* an index to the name *) c: char; (* a character read *) begin (* brname *) skipstar(thefile); with nam do begin length:=0; repeat length:=succ(length); read(thefile,c); letters[length] := c until (eoln(thefile)) or (length>=namelength) or (letters[length]=' '); if letters[length]=' ' then length:=length-1; if length 'n' then begin skipstar(thefile); if not eoln(thefile) then begin if thefile^ = '#' then begin numbered := true; get(thefile); (* move past the number symbol *) read(thefile,number); end end; repeat readln(thefile) until thefile^ = 'n'; readln(thefile) end else readln(thefile) end end; (* brnotenumber *) (* end module book.brnotenumber version = 'delmod 6.65 94 sep 5 tds/gds' *) (* begin module book.brnote *) procedure brnote(var thefile: text; var note: lineptr); (* read note key *) var newnote: lineptr; (* the new note *) previousnote: lineptr; (* the last line of the notes *) begin (* brnote *) note:=nil; if thefile^ = 'n' then begin (* enter note *) readln(thefile); if thefile^ <> 'n' then begin (* abort null note (n/n) *) getline(note); newnote:=note; while thefile^ <> 'n' do begin (* wait until end of note *) brline(thefile,newnote); previousnote:=newnote; (* get next note *) getline(newnote^.next); newnote:=newnote^.next; end; (* last note was not used, so: *) clearline(newnote); previousnote^.next:=nil; readln(thefile) end else readln(thefile) end end; (* brnote *) (* end module book.brnote version = 'delmod 6.65 94 sep 5 tds/gds' *) (* begin module book.brheader *) procedure brheader(var thefile: text; var hea: header); (* read the header of a key. *) begin with hea do begin (* read key name *) brname(thefile,keynam); (* read full name *) getline(fulnam); brline(thefile,fulnam); (* read note key *) if readnumber then brnotenumber(thefile,note) else brnote(thefile,note) end end; (* end module book.brheader version = 'delmod 6.65 94 sep 5 tds/gds' *) (* begin module book.copyheader *) procedure copyheader(fromhea: header; var tohea: header); (* copy the header fromhea into tohea. Note that the linked objects are NOT copied, but merely pointed to. *) begin tohea.keynam.letters := fromhea.keynam.letters; tohea.keynam.length := fromhea.keynam.length; tohea.note := fromhea.note; tohea.fulnam := fromhea.fulnam; end; (* end module book.copyheader version = 'delmod 6.65 94 sep 5 tds/gds' *) (* begin module book.brpiekey *) procedure brpiekey(var thefile: text; var pie: piekey); (* read piece key *) begin with pie do begin brheader(thefile,hea); brreanum(thefile,mapbeg); brconfig(thefile,coocon); brdirect(thefile,coodir); brnumber(thefile,coobeg); brnumber(thefile,cooend); brconfig(thefile,piecon); brdirect(thefile,piedir); brnumber(thefile,piebeg); brnumber(thefile,pieend); end end; (* end module book.brpiekey version = 'delmod 6.65 94 sep 5 tds/gds' *) (* begin module book.brdna *) procedure brdna(var thefile: text; var dna: dnaptr); (* read in dna from thefile *) (* note: if the dna were circularized, by linking the last dnastring to the first, then the cleardna routine could not clear properly, and would loop forever... there is no reason to do that, since a simple mod function will allow one to access the circle. *) var ch: char; workdna: dnaptr; begin getdna(dna); workdna:=dna; ch:=getto(thefile,['d']); read(thefile,ch); (* skipstar *) while (ch = '*') do begin read(thefile,ch); (* skip blank *) repeat read(thefile,ch); if ch in ['a','c','g','t'] then begin if workdna^.length=dnamax then begin getdna(workdna^.next); workdna:=workdna^.next end; workdna^.length:=succ(workdna^.length); workdna^.part[workdna^.length]:=chartobase(ch) end until eoln(thefile); readln(thefile); (* go to next line *) read(thefile,ch); (* ch is either '*' or 'd' *) end; readln(thefile) end; (* end module book.brdna version = 'delmod 6.65 94 sep 5 tds/gds' *) (* begin module book.brpiece *) procedure brpiece(var thefile: text; var pie: pieceptr); (* read in a piece *) begin brpiekey(thefile,pie^.key); if numbered or (not skipunnum) then brdna(thefile,pie^.dna) end; (* end module book.brpiece version = 'delmod 6.65 94 sep 5 tds/gds' *) (* begin module book.brinit *) procedure brinit(var book: text); (* check that the book is ok to read, and set up the global variables for br routines *) begin (* brinit *) (* halt if the book is bad (first word is 'halt') or the first character is not * *) reset(book); if not eof(book) then begin (* check for the date line *) if book^ <> '*' then begin if book^ <> 'h' then writeln(output, ' this is not the first line of a book:') else writeln(output, ' bad book:'); write(output, ' '); while not (eoln(book) or eof(book)) do begin write(output, book^); get(book) end; writeln(output); halt end end else begin writeln(output, ' book is empty'); halt end; (* initialize free storage *) freeline:=nil; freedna:=nil; readnumber:=true; (* usually we read in numbers for items *) number:=0; (* arbitrary value *) numbered:=false; (* the piece has no number (none yet read in) *) skipunnum:=false; end; (* brinit *) (* end module book.brinit version = 'delmod 6.65 94 sep 5 tds/gds' *) (* ************************************************************************ *) (* end module package.brpiece version = 'delmod 6.65 94 sep 5 tds/gds' *) (* begin module book.getpiece *) procedure getpiece(var thefile: text; var pie: pieceptr); (* move to and read in the next piece in the book *) var ch: char; begin ch:=getto(thefile,['p']); (* get to the next p(iece) in the book *) if ch<>' ' then begin brpiece(thefile,pie); ch:=getto(thefile,['p']); (* read past closing p *) end end; (* end module book.getpiece version = 'delmod 6.65 94 sep 5 tds/gds' *) (* ************************************************************************ *) (* end module package.getpiece version = 'delmod 6.65 94 sep 5 tds/gds' *) (* begin module findblank *) procedure findblank(var afile: text); (* read a file to find the next blank character *) var ch: char; begin repeat read(afile,ch) until ch = ' ' end; (* end module findblank version = 'delmod 6.65 94 sep 5 tds/gds' *) (* begin module findnonblank *) procedure findnonblank(var afile: text; var ch: char); (* find the next non blank character in a file, return it in ch. *) begin ch:=' '; while (not eof(afile)) and (ch = ' ') do begin read(afile,ch); if eoln(afile) then readln(afile) end end; (* end module findnonblank version = 'delmod 6.65 94 sep 5 tds/gds' *) (* begin module align.align *) procedure align(var inst, book: text; var pie: pieceptr; var length, alignedbase: integer); (* documentation on align is in module info.align and delman.use.aligned.books *) const maximumrange = 500; (* if the alignment point is more than this distance from the piece ends, the program halts in an attempt to catch the alignment bug... 1991 Jan 11 It appears that the rewrite of the code has removed the bug, but the check will be kept. *) var ch: char; (* a character in inst *) comment: boolean; (* true means we are inside a comment *) done: boolean; (* done finding an aligning get *) thebase: integer; (* the base read in *) begin if not eof(book) then begin (* if there is still more to the book ... *) getpiece(book,pie); (* read in the piece *) if not eof(book) then begin (* if we found a piece ... *) length:=pietoint(pie^.key.pieend,pie); (* calculate piece length *) (* now find inst the next occurance of 'get' *) done := false; while not done do begin if eof(inst) then begin (* no instructions? *) alignedbase := 1; (* simply align by the first base *) done := true end else begin if inst^ = '(' then begin (* skip comment *) get(inst); if inst^ = '*' then comment := true; {if comment then write(output,'COMMENT: (');} while comment do begin if eof(inst) then begin writeln(output,' in procedure align:'); writeln(output,' an instruction comment does not end!'); halt end; {write(output,inst^);} get(inst); if inst^ = '*' then begin get(inst); {if inst^ = ')' then writeln(output,'*)');} if inst^ = ')' then comment := false end; end; end; if inst^ = 'g' then begin get(inst); if inst^ = 'e' then begin get(inst); if inst^ = 't' then begin get(inst); if inst^ = ' ' then begin findnonblank(inst,ch); (* get to "from" *) findblank(inst); (* get past "from" *) read(inst,thebase); (* read in the alignedbase *) {writeln(output,'thebase=',thebase:1);} alignedbase:=pietoint(thebase,pie); done := true end; end; end; end; get(inst); (* move along now *) end; end; if (alignedbase <= -maximumrange) or (alignedbase > length + maximumrange) then begin writeln(output,' in procedure align:'); writeln(output,' read in base was ',thebase:1); writeln(output,' in internal coordinates: ',alignedbase:1); writeln(output,' maximum range was ',maximumrange:1); writeln(output,' piece length was ',length:1); with pie^.key.hea.keynam do writeln(output,' piece name: ',letters:length); writeln(output,' piece number: ',number:1); writeln(output,' aligned base is too far away... see the code'); halt end end end end; (* end module align.align version = 'delmod 6.65 94 sep 5 tds/gds' *) (* ************************************************************************ *) (* end module package.align version = 'delmod 6.65 94 sep 5 tds/gds' *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module align.maxminalignment *) procedure maxminalignment(var inst, book: text; var fromparam, toparam: integer); (* prescan the book to find the range over which the pieces of the book are spread, relative to the aligned base. the procedure uses the same variables that align does (so it can call align itself), and it returns the range in fromparam and toparam. *) const maximumrange = 500; (* the maximum size aligned piece; this will presumably catch the alignment bug *) var distance: integer; (* a distance to the aligned base *) pie: pieceptr; length, alignedbase: integer; begin new(pie); (* set an initial range for the two bounds *) fromparam:=+maxint; toparam:=-maxint; reset(book); reset(inst); while not eof(book) do begin align(inst,book,pie,length,alignedbase); if not eof(book) then begin distance:=1-alignedbase; if fromparam > distance then fromparam:=distance; distance:=length-alignedbase; if toparam < distance then toparam:=distance; clearpiece(pie) end end; if toparam - fromparam > maximumrange then begin writeln(output,' in procedure maxminalignment:'); writeln(output,' fromparameter = ',fromparam:1); writeln(output,' toparameter = ',toparam:1); writeln(output,' this exceeds the maximum range allowed (', maximumrange:1,')'); writeln(output,' see notes in the procedure. '); halt (* notes: if you desired this range, increase 'maximumrange'. otherwise, this may indicate a bug - either: 1) locate the bug (and tell tom schneider, please...) 2) reduce the size of the fragments, from one or the other end until the bombing is stopped. *) end; (* make the book readable again *) reset(book); reset(inst); dispose(pie) end; (* end module align.maxminalignment version = 'delmod 6.65 94 sep 5 tds/gds' *) (* begin module align.withinalignment *) function withinalignment(alignedposition, alignedbase, length: integer) :boolean; (* this function tells one if an aligned position, relative to an aligned base in a piece of some length is within the piece. *) var p: integer;(* the position on the piece *) begin p := alignedposition + alignedbase; withinalignment := (p>0) and (p<=length) end; (* end module align.withinalignment version = 'delmod 6.65 94 sep 5 tds/gds' *) (* begin module book.getbase *) function getbase(position: integer; pie: pieceptr):base; (* get a base from the nth position (internal coordinates) of the piece. no protection is made against positions outside the piece *) var workdna: dnaptr; p: integer; (* the last base of the dna part *) begin workdna:=pie^.dna; p:=dnamax; while position>p do begin p:=p+dnamax; workdna:=workdna^.next end; getbase:=workdna^.part[position-(p-dnamax)] end; (* end module book.getbase version = 'delmod 6.65 94 sep 5 tds/gds' *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module package.numbar *) (* ************************************************************************ *) (* begin module numberdigit *) function numberdigit(number, logplace:integer): char; (* return the digit at the place value ('logplace') position of number. example: numberdigit(13625, 3) = 3 numberdigit(13625, 4) = 1 *) var place: integer; (* the exponent of logplace *) count: integer; (* used to make place *) absolute: integer; (* the absolute value of number *) acharacter: char; (* the character to be returned *) (* ******************************************************************** *) (* NESTED PROCEDURES *) procedure digit; (* extract a digit at the place position *) var tenplace: integer; (* ten times place *) z: integer; (* an intermediate value *) d: integer; (* the digit extracted *) begin (* digit *) tenplace := 10 * place; z := absolute -((absolute div tenplace) * tenplace); if place = 1 then d:=z else d:= z div place; case d of 0: acharacter:='0'; 1: acharacter:='1'; 2: acharacter:='2'; 3: acharacter:='3'; 4: acharacter:='4'; 5: acharacter:='5'; 6: acharacter:='6'; 7: acharacter:='7'; 8: acharacter:='8'; 9: acharacter:='9'; end; end; (* digit *) procedure sign; (* put a negative sign out or a positive sign *) begin (* sign *) if number <0 then acharacter:='-' else acharacter:='+'; end; (* sign *) (* END NESTED PROCEDURES *) (* ******************************************************************** *) begin (* numberdigit *) place:=1; for count:=1 to logplace do place:=10*place; if number=0 then begin if place=1 then acharacter:='0' else acharacter:=' '; end else begin absolute:=abs(number); if absolute < (place div 10) then acharacter:=' ' else if absolute >= place then digit else sign; end; numberdigit:=acharacter end; (* numberdigit *) (* end module numberdigit version = 'prgmod 3.96 85 mar 18 tds'; *) (* begin module numbersize *) function numbersize(n: integer):integer; (* calculate amount of space to be reserved for the integer n *) const ln10 = 2.30259; (* natural log of 10 - for conversion to log base 10 *) epsilon = 0.00001; (* a small number to correct log base 10 errors *) begin (* numbersize *) if n = 0 then numbersize:=1 else numbersize:=trunc(ln(abs(n))/ln10 + epsilon) + 2; (* the 2 is for the sign and last digit *) (* the epsilon assures that we do not lose a place due to roundoff. eg, sometimes log base 10 of 10 would be 0.9999 instead of 1, and we would not do it right... note: this will fail for very large numbers on the order of 1/epsilon. *) end; (* numbersize *) (* end module numbersize version = 'prgmod 3.96 85 mar 18 tds'; *) (* begin module numberbar *) procedure numberbar(var afile: text; spaces, firstnumber, lastnumber: integer; var linesused: integer); (* write a bar of numbers to a file, with several spaces before. the number of lines used is returned *) var logplace: integer; (* the log of the digit being looked at *) spacecount: integer; (* count of spaces *) number: integer; (* the current number being written *) begin if abs(firstnumber) > abs(lastnumber) then linesused:= numbersize(firstnumber) else linesused:= numbersize(lastnumber); for logplace:=linesused-1 downto 0 do begin for spacecount:=1 to spaces do write(afile,' '); for number:=firstnumber to lastnumber do write(afile,numberdigit(number,logplace)); writeln(afile); end; end; (* end module numberbar version = 'prgmod 3.96 85 mar 18 tds'; *) (* ************************************************************************ *) (* end module package.numbar version = 1.11; *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module writepiename *) procedure writepiename(var thefile: text; pie: pieceptr; namespace: integer); (* write the piece name to the file, left justified in a field of namespace *) var index: integer; (* index to the field *) begin with pie^ do for index := 1 to namespace do if index <= key.hea.keynam.length then write(thefile,key.hea.keynam.letters[index]) else write(thefile,' '); end; (* end module writepiename *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module seqalign.writeseqline *) procedure writeseqline(var fout: text; apiece: pieceptr; length, alignedbase, fromparam, toparam, number: integer); (* write the sequence number, locus name, and sequence to the fout file *) const namespace = 10; (* the width of the field to write the name in *) width = 10; (* the width of the field to write coordinates in *) var index: integer; (* a loop control variable *) begin (* write the number of the sequence *) write(fout,number:5,' '); (* write the locus name of the sequence *) writepiename(fout,apiece,namespace); write(fout,' '); (* write the starting coordinate of the sequence *) write(fout,' ',inttopie(1,apiece):width,' '); (* write the sequence *) for index := fromparam to toparam do begin if withinalignment(index,alignedbase,length) then write(fout,basetochar(getbase(index+alignedbase,apiece)):1) else write(fout,' '); end; (* write the ending coordinate of the sequence *) write(fout,' ',inttopie((toparam - fromparam + 1),apiece):1); (* add the carriage return at the end of the line *) writeln(fout); end; (* end module seqalign.writeseqline *) (* begin module seqalign.dosequence *) procedure dosequence(var apiece: pieceptr; var book, inst: text; piecenumber: integer; var alignedbase, length: integer); (* find the piece corresponding to piecenumber and return it in apiece. *) var foundpiece: boolean; (* did we locate the number we were supposed to? *) linenumber: integer; (* number of the line down the page we are on *) begin (* reset the files to make sure that they match *) reset(book); reset(inst); (* search the book for the sequence piecenumber, and write it out *) foundpiece := false; while not foundpiece do begin clearpiece(apiece); (* clear the piece for reuse *) align(inst,book,apiece,length,alignedbase); if not eof(book) then if numbered then if number <> piecenumber then linenumber := succ(linenumber) else foundpiece := true else if not foundpiece then begin if (piecenumber > number) or (piecenumber < 1) then begin writeln(clumps,'The requested sequence is not in book range.'); foundpiece := true; end; end; end; end; (* end module seqalign.dosequence *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module cluster.readparameters *) procedure readparameters(var clusterp: text); (* read the parameters from clusterp *) var letter1: char; (* a character on the line we are on *) begin reset(clusterp); if not eof(clusterp) then begin (* read the debugging parameter (first line) *) readln(clusterp,letter1); if (letter1 = 'y') then debugging := true else debugging := false; (* read the showfragments parameter (second line) *) readln(clusterp,letter1); if (letter1 = 'y') then showfragments := true else showfragments := false; (* read the writepairs parameter (third line) *) readln(clusterp,letter1); if (letter1='y') then writepairs := true else writepairs := false; (* read the writeclumps parameter (fourth line) *) readln(clusterp,letter1); if (letter1='y') then writeclumps := true else writeclumps := false; (* read the matchparameter (fifth line) *) readln(clusterp,matchparameter); end else begin writeln(output,'The parameter file clusterp is empty.'); writeln(output,'Program halt.'); halt; end; end; (* end module cluster.readparameters *) (* begin module cluster.writeparameters *) procedure writeparameters(var fout: text); (* write the parameter conditions to the fout file *) procedure onoff(toggle: boolean); (* add the on or off to the parameter line *) begin (* onoff *) if toggle then writeln(fout,'on') else writeln(fout,'off'); end; (* onoff *) begin (* writeparameters *) (* write the condition of the debugging parameter *) write(fout,'* debugging '); onoff(debugging); (* write the condition of the showfragments parameter *) write(fout,'* showfragments '); onoff(showfragments); (* write the condition of the writepairs parameter *) write(fout,'* writepairs '); onoff(writepairs); (* write the condition of the writeclumps parameter *) write(fout,'* writeclumps '); onoff(writeclumps); (* write the number of the matchparameter *) if matchparameter < 0 then matchparameter := 0; write(fout,'* matchparameter is set at ',matchparameter:1); (* add some spaces to set them apart from the cluster output *) writeln(fout); writeln(fout); end; (* writeparameters *) (* end module cluster.writeparameters *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module cluster.getwindowsize *) procedure getwindowsize(var fin: text; var windowsize: integer); (* read the windowsize from the indana subindex for later use *) var acharacter: char; (* a character in the fin file *) begin (* move to the windowsize number *) repeat read(fin,acharacter) until acharacter = '['; (* read it *) read(fin,windowsize); end; (* end module cluster.getwindowsize *) (* begin module cluster.filehead *) procedure filehead (var fin, fout: text); (* place the heading lines on the fout file *) var copynumber: integer; (* the number of lines being copied from the text *) begin rewrite(fout); (* write the name and version of the program *) writeln(fout,'cluster ',version:4:2); (* copy the identification lines to the fout *) reset(fin); copynumber := 12; if copylines(fin,fout,copynumber) <> copynumber then begin writeln(fout,'subind does not have enough header lines'); halt end; (* skip the next line of fin to avoid the headings of the subindex data *) readln(fin); end; (* end module cluster.filehead *) (* begin module cluster.readdata *) procedure readdata (var fin: text; var seqnumber, coord: integer; var lastchar: char); (* reads the necessary data elements from a line in the subind file *) begin {writeln(output,'in readdata');} {copyaline(fin,output);} (* skip data to the sequence number *) skipblanks(fin); {zzz} {writeln(output,'about to read this character: ',fin^);} skipnonblanks(fin); skipblanks(fin); (* read the number *) {writeln(output,'about to read this character: ',fin^);} read(fin,seqnumber); read(fin,coord); (* move to and get the last character of the line *) while not eoln(fin) do read(fin,lastchar); (* put in a carriage return to move to the next line *) readln(fin); end; (* end module cluster.readdata *) (* ********************************************************************* *) (* ********************************************************************* *) (* begin module cluster.clearclump *) procedure clearclump(var a: clumpptr); (* initialize the variables in a clump record *) begin with a^ do begin piecelist:= nil; pairlist := nil; nextclump := nil; end end; (* end module cluster.clearclump *) (* begin module cluster.clearpair *) procedure clearpair(var a: pairptr); (* initialize the variables in a pair record *) begin with a^ do begin leftpiece:= 0; rightpiece:= 0; leftseq:= nil; rightseq:= nil; seqbag := nil; nextpair := nil; end end; (* end module cluster.clearpair *) (* begin module cluster.clearbag *) procedure clearbag(var a: bagptr); (* initialize the variables in a bag record *) begin with a^ do begin leftcoord := 0; rightcoord := 0; leftmatch := false; rightmatch := false; nextbag := nil; end end; (* end module cluster.clearbag *) (* begin module cluster.clearentry *) procedure clearentry(var a: entryptr); (* initialize the variables in an entry record *) begin with a^ do begin piece := 0; number := 0; coordinate := 0; nextentry := nil; end end; (* end module cluster.clearentry *) (* begin module cluster.clearrange *) procedure clearrange(var a: rangeptr); (* initialize the variables of a range record *) begin with a^ do begin number := 0; singlescount := 0; nextrange := nil; end; end; (* end module cluster.clearrange *) (* begin module cluster.clearnumberlist *) procedure clearnumberlist (var a: numberptr); (* initialize the variables in a sequence record *) begin with a^ do begin number := 0; nextlist := nil; end; end; (* end module cluster.clearnumberlist *) (* begin module cluster.getclump *) procedure getclump (var a: clumpptr); (* gets a clump off the list of free clump records *) begin if freeclump = nil then new(a) else begin a := freeclump; freeclump := freeclump^.nextclump; end; clearclump(a); end; (* end module cluster.getclump *) (* begin module cluster.getpair *) procedure getpair (var a: pairptr); (* gets a pair off the list of free pair records *) begin if freepair = nil then new(a) else begin a := freepair; freepair := freepair^.nextpair; end; clearpair(a); end; (* end module cluster.getpair *) (* begin module cluster.getbag *) procedure getbag (var a: bagptr); (* gets a bag off the list of free bag records *) begin if freebag = nil then new(a) else begin a := freebag; freebag := freebag^.nextbag; end; clearbag(a); end; (* end module cluster.getbag *) (* begin module cluster.getentry *) procedure getentry (var a: entryptr); (* gets an entry off the list of free entry records *) begin if freeentry = nil then new(a) else begin a := freeentry; freeentry := freeentry^.nextentry; end; clearentry(a); end; (* end module cluster.getentry *) (* begin module cluster.getnumberlist *) procedure getnumberlist (var a: numberptr); (* gets a sequence off the list of free sequence records *) begin if freesequence = nil then new(a) else begin a := freesequence; freesequence := freesequence^.nextlist; end; clearnumberlist(a); end; (* end module cluster.getnumberlist *) (* begin module putclump *) procedure putclump (a: clumpptr); (* place a clump on the list of free pair records *) begin clearclump(a); a^.nextclump := freeclump; freeclump := a; end; (* end module putclump *) (* begin module putpair *) procedure putpair (a: pairptr); (* place a pair on the list of free pair records *) begin clearpair(a); a^.nextpair := freepair; freepair := a; end; (* end module putpair *) (* begin module putbag *) procedure putbag (a: bagptr); (* place a bag on the list of free bag records *) begin clearbag(a); a^.nextbag := freebag; freebag := a; end; (* end module putbag *) (* begin module putentry *) procedure putentry (a: entryptr); (* places an entry on the list of free entry records *) begin clearentry(a); a^.nextentry := freeentry; freeentry := a; end; (* end module putentry *) (* begin module putnumberlist *) procedure putnumberlist (a: numberptr); (* places a sequence on the list of free sequence records *) begin clearnumberlist(a); a^.nextlist := freesequence; freesequence := a; end; (* end module putnumberlist *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module cluster.findranges *) function findranges (instnumber: integer; list: rangeptr): integer; (* returns the piecenumber that corresponds to instnumber *) var current: rangeptr; (* the current pointer position on the list *) foundpiecenumber: boolean; (* are we in the right range? *) instcounter: integer; (* a running count of instruction numbers *) piececounter: integer; (* a running count of piece numbers *) begin current := list; piececounter := 1; instcounter := 0; foundpiecenumber := false; repeat if (instnumber <= current^.instnumber) then begin foundpiecenumber := true; if (instnumber > (instcounter + current^.singlescount)) then findranges := piececounter else findranges := piececounter - (current^.singlescount + 1) + (instnumber - instcounter); end else begin instcounter := current^.instnumber; current := current^.nextrange; if current <> nil then piececounter := piececounter + current^.singlescount + 1 else writeln(output,'(would have bombed!)'); end; until foundpiecenumber or (current = nil); end; (* end module cluster.findranges *) (* begin module cluster.addranges *) procedure addrange(number, piecenumber: integer; var numberlist: rangeptr); (* add a piecenumber and the singlesnumber to the numberlist *) var current: rangeptr; (* the pointer position before next *) last: rangeptr; (* the pointer position before current *) next: rangeptr; (* the working position on the numberlist *) piececounter: integer; (* keeps track of the piecenumber *) begin (* if the list is nil make a new one *) if numberlist = nil then begin new(numberlist); clearrange(numberlist); numberlist^.instnumber := number; end else begin (* initalize the counter and the pointers *) piececounter := 0; last := numberlist; current := numberlist; next := numberlist; (* get to the end of the list *) while next <> nil do begin last := current; current := next; piececounter := piececounter + next^.singlescount + 1; next := next^.nextrange; end; (* decide if we need a new range or not *) if (piecenumber = piececounter) then current^.instnumber := number else begin if (number-(last^.instnumber+current^.singlescount)=2) then begin current^.instnumber := number; current^.singlescount := current^.singlescount + 1; end else begin new(next); clearrange(next); next^.instnumber := number; current^.nextrange := next; end; end; end; end; (* end module cluster.addranges *) (* begin module cluster.createrangelist *) procedure createrangelist (var book: text; var numberlist: rangeptr; apiece:pieceptr); (* generate the list of instruction-piece relationships *) var lastname: name; (* the name before the one we are on *) piececounter: integer; (* a running counter of the piece we are on *) piecenumber: integer; (* the number of the piece we are on *) begin (* initialize the variables *) reset(book); piecenumber := 0; piececounter := 0; (* go through the entire book *) while not eof(book) do begin (* get piece number and name *) getpiece(book, apiece); if not eof(book) then begin (* keep track of the piece number *) if (apiece^.key.hea.keynam.letters <> lastname.letters) then piecenumber := piecenumber + 1; (* insert the new stuff *) addrange(number,piecenumber,numberlist); lastname := apiece^.key.hea.keynam; clearpiece(apiece); (* clear the piece for reuse *) end; end; end; (* end module cluster.createrangelist *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module cluster.overmatched *) function overmatched(currentpair: pairptr; totallength, matchlength: integer):boolean; (* find the maximum # of matches and check to see if it has been exceeded *) var count: integer; (* the number of matches in the bag *) currentbag: bagptr; (* the pointer to the currentbag *) index: numberptr; (* an index pointer to traverse the instlists *) insttotal: integer; (* number of instructions on the pair instlists *) matchesperinst: integer; (* maximum number of amtches per instruction *) maxmatch: integer; (* the maximum number of possible matches *) begin (* initialixe some variables *) count := 0; insttotal := 0; (* go through the instlists and count the number of instructions present *) index := currentpair^.leftseq; while index <> nil do begin insttotal := insttotal + 1; index := index^.nextlist; end; (* find the maximum number of matches *) if matchparameter > 0 then matchesperinst := matchparameter else matchesperinst := totallength div matchlength; maxmatch := insttotal * matchesperinst; (* count the number of elements in the pair bag *) currentbag := currentpair^.seqbag; while currentbag <> nil do with currentbag^ do begin if (nextbag <> nil) then if rightmatch and nextbag^.leftmatch then count := count + 1 ; if not(leftmatch or rightmatch) then count := count + 1; currentbag := currentbag^.nextbag; end; overmatched := count > maxmatch; end; (* end module cluster.overmatched *) (* begin module cluster.emptybag *) procedure emptybag(var baglist: bagptr); (* remove all coordinates from pair^.seqbag *) var trashbag: bagptr; (* the pointer to the bag being thrown away *) begin while baglist <> nil do begin trashbag := baglist; baglist := baglist^.nextbag; trashbag^.nextbag := nil; putbag(trashbag); end; end; (* end module cluster.emptybag *) (* ********************************************************************* *) (* ********************************************************************* *) (* begin module cluster.showRAWbag *) procedure showRAWbag(var fout: text; pairs: pairptr); (* show the linked bag bagview. This shows the data in with a set of coordinate pairs for the corresponding sequences. *) var bagview: bagptr; (* pointer to one of the bag contents *) leftprev: integer; (* storage variable of the previous left coordinate *) rightprev: integer; (* storage variable of the previous right coordinate *) begin (* initialize the variables *) leftprev := 0; rightprev := 0; bagview := pairs^.seqbag; while bagview<>nil do with bagview^ do begin (* write the coordinates and match patterns *) if leftmatch then write(fout,' L ') else write(fout,' - '); write(fout, ' (' ,leftcoord:5, ',' ,rightcoord:5, ') ' ); if rightmatch then write(fout,' R ') else write(fout,' - '); (* write the length of the duplicate sequences *) if leftprev <> 0 then write(fout,' {', leftcoord- leftprev+1:1,'} '); if rightprev <> 0 then write(fout,' {',rightcoord-rightprev+1:1,'} '); writeln(fout); (* advance the variables *) bagview := bagview^.nextbag; leftprev := leftcoord; rightprev := rightcoord; end; end; (* end module cluster.showRAWbag *) (* begin module cluster.multi *) procedure multi(var f: text; c: char; n: integer); (* write out n characters c to file f *) var i: integer; (* index *) begin for i := 1 to n do write(f,c); end; (* end module cluster.multi *) (* begin module cluster.showbag *) procedure showbag(var fout: text; windowsize: integer; pairs: pairptr; apiece: pieceptr); (* show the linked bag bagview in pairs *) const width = 10; (* number of characters per integer *) var abag: bagptr; (* current content of the bag *) alignedbase: integer;(* the aligned base of the piece *) half: integer; (* half of the width constant *) leftprev: integer; (* left coordinate of previous bag *) length: integer; (* the length of the sequence of the piece *) rightprev: integer; (* right coordinate of previous bag *) seqlength: integer; (* the real length of the sequence in the fout *) (* ********************************************************************* *) (* NESTED PROCEDURES *) procedure bar; (* write a bar to the fout of the width of the bag *) begin abag := pairs^.seqbag; while abag <> nil do begin (* one width per coordinate pair *) (* cover the case of a single bag element *) with abag^ do if not(leftmatch or rightmatch) then multi(fout,'*',width); multi(fout,'*',width); abag := abag^.nextbag; end; multi(fout,'*',half); (* a half-width offset for spacing *) write(fout,'*'); (* one more for the | mark after the seq #'s *) end; procedure writemid(var afile: text; number: integer); (* write the number number to file afile, so that the number is in the middle of width characters *) var half: integer; (* the space before and after the number *) numdigits: integer; (* number of characters in the number *) begin if number <> 0 then numdigits := trunc(ln(abs(number))/ln(10))+1 (* log10(n) + 1 *) else numdigits := 1; (* case of n = 0 *) if number < 0 then numdigits := numdigits + 1; (* add a - sign *) half := trunc((width - numdigits) / 2); (* find half the space *) multi(afile,' ',half); (* SPACES BEFORE the number *) write(afile,number:numdigits); (* the NUMBER ITSELF *) multi(afile,' ',half); (* SPACES AFTER the number *) multi(afile,' ',(width - 2*half-numdigits)) (* trailing spaces *) end; (* end NESTED PROCEDURES *) (* ********************************************************************* *) begin (* procedure showbag *) seqlength := 0; leftprev := 0; rightprev := 0; half := width div 2; (* a half-width to use as an offset *) bar; (* makes the bar at the top of every sequence pair *) multi(fout,'*',11); writeln(fout); (* do the top line of sequence coordinates *) multi(fout,' ',namelength); write(fout,'|'); abag := pairs^.seqbag; while abag<>nil do with abag^ do begin if rightmatch then writemid(fout,abag^.leftcoord); if (nextbag <> nil) then if (nextbag^.leftmatch) then begin seqlength := windowsize + (nextbag^.leftcoord - leftcoord); writemid(fout,leftcoord + seqlength - 1); end; (* check to see if it is an unpaired bag element *) (* (not leftmatch) AND (not rightmatch) *) if not (leftmatch or rightmatch) then begin writemid(fout,abag^.leftcoord); seqlength := windowsize; writemid(fout,leftcoord + seqlength - 1); end; abag := abag^.nextbag; end; writeln(fout); (* do the first sequence number and the cluster difference *) dosequence(apiece,book,inst,pairs^.leftseq^.number,alignedbase,length); write(fout,apiece^.key.hea.keynam.letters); write(fout,'|'); (* bar after sequence numbers *) multi(fout,' ',half); leftprev := 0; abag := pairs^.seqbag; while abag<>nil do with abag^ do begin if leftmatch then begin seqlength := windowsize + (leftcoord - leftprev); multi(fout,'-',width); end else if (leftprev <> 0) then writemid(fout,leftcoord - (leftprev + seqlength -1)- 1 ); (* check to see if it is an unpaired bag element *) (* (not leftmatch) AND (not rightmatch) *) if not (leftmatch or rightmatch) then begin multi(fout,'-',width); if (nextbag <> nil) then writemid(fout,nextbag^.leftcoord-(leftcoord+windowsize-1)-1); end; if rightmatch then leftprev := leftcoord; abag := abag^.nextbag end; writeln(fout); (* do the middle *) multi(fout,' ',namelength); write(fout,'|'); (* bar after sequence blanks *) multi(fout,' ',half); leftprev := 0; rightprev := 0; if pairs^.seqbag = nil then write(fout,' WARNING: sequence pairs have too many matches.'); abag := pairs^.seqbag; while abag<>nil do with abag^ do begin if leftmatch then begin seqlength := windowsize + (rightcoord - rightprev); writemid(fout,seqlength) end else if leftprev <> 0 then multi(fout,' ',width); (* check to see if it is a single bag element *) (* (not leftmatch) AND (not rightmatch) *) if not (leftmatch or rightmatch) then begin seqlength := windowsize; writemid(fout,seqlength); end; (* an error checker that shouldn't function unless there is a BIG bug *) if ((leftcoord-leftprev) <> (rightcoord-rightprev)) and leftmatch then begin write(fout,'(!'); write(fout,' ',leftcoord-leftprev+1:width); write(fout,' ',rightcoord-rightprev+1:width); write(fout,'!)'); end; leftprev := leftcoord; rightprev := rightcoord; abag := abag^.nextbag end; writeln(fout); (* do the second sequence number and cluster difference *) dosequence(apiece,book,inst,pairs^.rightseq^.number,alignedbase,length); write(fout,apiece^.key.hea.keynam.letters); write(fout,'|'); (* bar after sequence numbers *) multi(fout,' ',half); rightprev := 0; abag := pairs^.seqbag; while abag<>nil do with abag^ do begin if leftmatch then begin seqlength := windowsize + (rightcoord - rightprev); multi(fout,'-',width) end else if (rightprev <> 0) then writemid(fout,rightcoord - (rightprev + seqlength -1)- 1 ); (* check to see if it is an unpaired bag element *) (* (not leftmatch) AND (not rightmatch) *) if not (leftmatch or rightmatch) then begin multi(fout,'-',width); if (nextbag <> nil) then writemid(fout,nextbag^.rightcoord-(rightcoord+windowsize-1)-1); end; if rightmatch then rightprev := rightcoord; abag := abag^.nextbag end; writeln(fout); (* do the bottom line of sequence coordinates *) multi(fout,' ',namelength); write(fout,'|'); abag := pairs^.seqbag; while abag<>nil do with abag^ do begin if rightmatch then writemid(fout,rightcoord); if (nextbag <> nil) then if (nextbag^.leftmatch) then begin seqlength := windowsize + (nextbag^.rightcoord - rightcoord); writemid(fout,rightcoord + seqlength - 1); end; (* check to see if it is an unpaired bag element *) (* (not leftmatch) AND (not rightmatch) *) if not (leftmatch or rightmatch) then begin writemid(fout,rightcoord); seqlength := windowsize; writemid(fout,rightcoord + seqlength - 1); end; abag := abag^.nextbag; end; writeln(fout); bar; (* write the bar that signifies the end of the entry *) multi(fout,'*',11); writeln(fout); writeln(fout); end; (* end module cluster.showbag *) (* begin module cluster.showlist *) procedure showlist(var fout: text; windowsize, totallength: integer; var rootpair: pairptr; apiece: pieceptr); (* show the linked list pairs using the windowsize from getwindowsize *) var pair: pairptr; (* the current pair being done in the list *) begin pair := rootpair; while pair<>nil do with pair^ do begin if not showfragments and overmatched(pair,totallength,windowsize) then emptybag(pair^.seqbag); if debugging then showRAWbag(fout,pair); showbag(fout,windowsize,pair,apiece); pair := pair^.nextpair end; end; (* end module cluster.showlist *) (* begin module cluster.printsequences *) procedure printsequences(apiece: pieceptr; var book, inst, fout: text; var alignedbase, length, fromparam, toparam: integer; list1, list2: numberptr); (* get sequences from the book that match the instnumbers in list1 and list2 *) var current1: numberptr; (* the pointer to the instructions on list1 *) current2: numberptr; (* the pointer to the instructions on list2 *) begin (* write the sequences corresponding to the instruction number pairs *) current1 := list1; current2 := list2; while current1 <> nil do begin dosequence(apiece,book,inst,current1^.number,alignedbase,length); writeseqline(fout,apiece,length,alignedbase, fromparam,toparam,current1^.number); dosequence(apiece,book,inst,current2^.number,alignedbase,length); writeseqline(fout,apiece,length,alignedbase, fromparam,toparam,current2^.number); (* the alignedbase plus 28 spaces for the instnumber, piecename, starting coordinate, and the spaces between them *) multi(fout,' ',28 + alignedbase); writeln(fout,'^'); writeln(fout); current1 := current1^.nextlist; current2 := current2^.nextlist; end; end; (* end module cluster.printsequences *) (* begin module cluster.showclumpedlist *) procedure showclumpedlist(var fout: text; var clumproot: clumpptr; windowsize, totallength, fromparam, toparam: integer; apiece: pieceptr); (* show the clumped listing in clumproot to file fout using the windowsize from getwindowsize *) var alignedbase: integer; (* the aligned base of the piece of dosequence *) clump: clumpptr; (* the current position in the clumplist *) clumpseq: numberptr; (* the sequences in a given clump *) clumppair: pairptr; (* the pairs in a given clump *) length: integer; (* the length of the sequences found by dosequence *) begin clump := clumproot; while clump <> nil do with clump^ do begin (* print the beginning dividing marks *) multi(fout,'#',70); writeln(fout); multi(fout,'#',70); writeln(fout); (* print the list of sequences contained in the clump *) writeln(fout,'The related pieces in this clump are: '); clumpseq := piecelist; while clumpseq <> nil do begin write(fout,clumpseq^.number:1); if clumpseq^.nextlist <> nil then write(fout,','); clumpseq := clumpseq^.nextlist; end; writeln(fout); (* print the pairs in the clump *) writeln(fout,'Their pairings are as follows: '); writeln(fout); clumppair := clump^.pairlist; while clumppair<> nil do begin if not showfragments and overmatched(clumppair,totallength,windowsize) then emptybag(clumppair^.seqbag); showbag(fout,windowsize,clumppair,apiece); (* write the sequences corresponding to the instruction number pairs *) printsequences(apiece,book,inst,fout,alignedbase,length,fromparam, toparam,clumppair^.leftseq,clumppair^.rightseq); clumppair := clumppair^.nextpair; end; (* print the ending dividing marks *) writeln(fout); multi(fout,'#',70); writeln(fout); multi(fout,'#',70); writeln(fout); writeln(fout); writeln(fout); (* go to the next clump *) clump := clump^.nextclump; end; end; (* end module cluster.showclumpedlist *) (* begin module cluster.fullinsert *) procedure fullinsert (var pair: pairptr; var clumproot: clumpptr); (* make a new clump, stick it on the list, and place a pair's worth of data in it (two piece numbers on the piecelist and a pair on the pairlist) *) var newclump: clumpptr; (* a pointer to a clump *) newpiece: numberptr; (* a pointer to a sequence *) begin (* make a new clump *) getclump(newclump); (* place the instnumbers on the piecelist *) getnumberlist(newpiece); getnumberlist(newpiece^.nextlist); newpiece^.number := pair^.leftpiece; newpiece^.nextlist^.number := pair^.rightpiece; newclump^.piecelist := newpiece; (* place the pair on the clump *) newclump^.pairlist := pair; (* place the new clump on the clumplist *) newclump^.nextclump := clumproot; clumproot := newclump; end; (* end module cluster.fullinsert *) (* begin module cluster.insertclump *) procedure insertclump (var pair: pairptr; var currentclump: clumpptr; leftfound, rightfound: boolean ); (* insert a sequence number and its pair on an already existing clump *) var newpiece: numberptr; (* the new sequence being introduced *) begin (* if one of the sequences is not on the list, add it on *) if leftfound <> rightfound then begin (* add the piecenumber to the piece list *) getnumberlist(newpiece); if leftfound then newpiece^.number := pair^.rightpiece; if rightfound then newpiece^.number := pair^.leftpiece; newpiece^.nextlist := currentclump^.piecelist; currentclump^.piecelist := newpiece; (* insert the pair onto the clump pairlist *) pair^.nextpair := currentclump^.pairlist; currentclump^.pairlist := pair; end; end; (* end module cluster.insertclump *) (* begin module cluster.clumppairs *) procedure clumppairs (var pair: pairptr; var root: clumpptr); (* take a pair from the pairlist and put it into its spot on the clumplist *) var clumpdone: boolean; (* have we finished checking the clumps? *) currentclump: clumpptr; (* a pointer to a clump *) currentpiece: numberptr; (* a pointer to a sequence *) leftfound: boolean; (* does the pair leftsequence have a match? *) rightfound: boolean; (* does the pair rightsequence have a match? *) piecedone: boolean; (* have we finished checking the sequences? *) begin leftfound := false; rightfound := false; currentclump := root; clumpdone := (root = nil); while not clumpdone do begin leftfound := false; rightfound := false; piecedone := false; currentpiece := currentclump^.piecelist; while not piecedone do with currentpiece^ do begin if pair^.leftpiece = number then leftfound := true; if pair^.rightpiece = number then rightfound := true; piecedone := (nextlist=nil) or (leftfound and rightfound); if not piecedone then currentpiece :=currentpiece^.nextlist; end; insertclump(pair,currentclump,leftfound,rightfound); currentclump := currentclump^.nextclump; clumpdone := (currentclump = nil) or leftfound or rightfound; end; if (currentclump = nil) and not(leftfound or rightfound) then begin fullinsert(pair,root); end; end; (* end module cluster.clumppairs *) (* begin module cluster.removebag *) procedure removebag (var prior, defendent: bagptr); (* Checks to see if the defendent bag is continuous with the surrounding sequence. If so, it will be removed. *) begin (* check for a match between the two bags *) if (defendent^.leftcoord = (prior^.leftcoord + 1)) and (defendent^.rightcoord = (prior^.rightcoord + 1)) then begin prior^.rightmatch := true; defendent^.leftmatch := true; end; (* if a bag is matched on both sides, pop it off the list *) if defendent^.leftmatch and defendent^.rightmatch then begin prior^.nextbag := defendent^.nextbag; defendent^.nextbag := nil; putbag(defendent) end; end; (* end module cluster.removebag *) (* begin module cluster.baginsert *) procedure baginsert (var root: bagptr; var coord1, coord2 : integer); (* inserts a set of coordinates into the baglist of a sequence *) var currentbag: bagptr; (* the coordinates of the current variable *) done: boolean; (* are we done with insertion?? *) lastlastbag: bagptr; (* variable to keep the start of a possible crunch *) lastbag: bagptr; (* a pointer to the previous bag on the list *) newbag: bagptr; (* bag being inserted in the middle of the list *) begin (* put the coordinates in a new bag *) getbag(newbag); with newbag^ do begin leftcoord := coord1; rightcoord := coord2; end; (* go through the list to find the position of the new bag and insert it *) currentbag := root; lastbag := root; lastlastbag := root; done := false; while not done do begin if currentbag = nil then begin if root = nil then begin root := newbag; lastbag := root; end else lastbag^.nextbag := newbag; done := true; end else if (currentbag^.leftcoord < coord1) and (currentbag^.rightcoord < coord2) then begin lastlastbag := lastbag; lastbag := currentbag; currentbag := currentbag^.nextbag; end else begin newbag^.nextbag := currentbag; if (currentbag = root) then begin root := newbag; lastbag := root end else lastbag^.nextbag := newbag; done := true; end; end; (* check the newbag and adjacent bags to see if they can be crunched *) if currentbag <> nil then removebag(newbag,currentbag); if newbag <> root then removebag(lastbag,newbag); if lastbag <> root then removebag(lastlastbag,lastbag); end; (* end module cluster.baginsert *) (* begin module cluster.insertinst *) procedure insertinst(var apair: pairptr; firstinst, secondinst: integer); (* insert two instruction numbers on the lists of apair *) var insert: boolean; (* do we insert an instruction pair yet or not *) done: boolean; (* are we finished trying to place the insts yet? *) lastleft: numberptr; (* the previous spot on the left list *) lastright: numberptr; (* the previous spot on the right list *) left: numberptr; (* the current position on the left inst list *) newlist1: numberptr; (* a new numberlist to stick new values in *) newlist2: numberptr; (* a new numberlist to stick new values in *) right: numberptr; (* the current position on the right inst list *) begin (* initialize the search pointers *) left := apair^.leftseq; right := apair^.rightseq; lastleft := apair^.leftseq; lastright := apair^.rightseq; (* find the position of the pair of instruction numbers and insert them *) done := false; while not done do begin (* variable conditions for inserting a pair of instructions *) insert := (left = nil); if not insert then insert := (firstinst <= left^.number); if insert then begin (* create the two new lists and put the numbers in them *) getnumberlist(newlist1); newlist1^.number := firstinst; getnumberlist(newlist2); newlist2^.number := secondinst; if left = nil (* if we are at the end of the list *) then if apair^.leftseq <> nil then begin (* then stick the newlist on the end of the list *) lastleft ^.nextlist := newlist1; lastright ^.nextlist := newlist2; end else begin (* else start the list if nothing is there yet *) apair^.leftseq := newlist1; apair^.rightseq := newlist2; end else if (firstinst<>left^.number) and (secondinst<>right^.number) (* if the new pair is unique *) then if left <> lastleft then begin (* stick a pair in the middle of the list *) newlist1^.nextlist := lastleft^.nextlist; lastleft^.nextlist := newlist1; newlist2^.nextlist := lastright^.nextlist; lastright^.nextlist := newlist2; end else begin (* stick a pair on the top of the list *) newlist1^.nextlist := apair^.leftseq; apair^.leftseq := newlist1; newlist2^.nextlist := apair^.rightseq; apair^.rightseq := newlist2; end else begin (* throw out the unused numberlists *) putnumberlist(newlist1); putnumberlist(newlist2); end; done := true; end else begin (* advance the pointers *) lastleft := left; lastright := right; left := left^.nextlist; right := right^.nextlist; end; end; end; (* end module cluster.insertinst *) (* begin module cluster.pairinsert *) procedure pairinsert (var root: pairptr; var first, second: entryptr); (* takes a pair of sequences from the subind file and inserts it on the list *) var current: pairptr; (* record of a set of sequence data *) foundone: boolean; (* found a duplicate pair *) begin (* only place the pair on the list if it is two distinct pieces *) if first^.piece <> second^.piece then begin (* find the position of the pair of sequences and insert it *) current := root; foundone := false; while not foundone do begin if current = nil then begin (* stick the new pair on the end of the list *) getpair(current); if root <> nil then current^.nextpair := root; root := current; foundone := true; end else if (current^.leftpiece = first^.piece) and (current^.rightpiece = second^.piece) then foundone := true else current := current^.nextpair end; (* write sequence numbers into pairbag and insert all coordinates *) with current^ do begin (* add the new piecenumbers *) leftpiece := first^.piece; rightpiece := second^.piece; (* stick the instructions on the instlists *) if current^.leftseq = nil then insertinst(current,first^.inst,second^.inst) else if writeclumps then insertinst(current,first^.inst,second^.inst); (* add the coordinates to the bag *) baginsert(seqbag,first^.coordinate,second^.coordinate); end; end; end; (* end module cluster.pairinsert *) (* begin module cluster.completelist *) procedure completelist(piecenumber, pieceinst, instcoord: integer; lastchar: char; var entryroot: entryptr; var pairroot: pairptr); (* checks entries to make sure all possible pairs are made *) var current: entryptr; (* pointer to the current entry in the list *) lastentry: entryptr; (* pointer to the previous entry on the list *) newentry: entryptr; (* pointer to the values being inserted *) begin (* make a new entry and fill it with the new numbers *) getentry(newentry); with newentry^ do begin piece := piecenumber; inst := pieceinst; coordinate := instcoord; nextentry := nil; end; (* empty the entrylist if we are no longer in a subindex match *) if lastchar <> '*' then begin while entryroot <> nil do begin lastentry := entryroot; entryroot := entryroot^.nextentry; putentry(lastentry); end; end; (* make all possible entry pairs *) current := entryroot; while current <> nil do begin lastentry := current; (* send the two entries through pairinsert in a least/greatest lineup *) if lastentry^.piece < newentry^.piece then pairinsert(pairroot,lastentry,newentry) else pairinsert(pairroot,newentry,lastentry); current := current^.nextentry; end; if entryroot <> nil then lastentry^.nextentry := newentry else entryroot := newentry; end; (* end module cluster.completelist *) (* begin module cluster.pairclumps *) procedure pairclumps (var pairroot: pairptr; var clumproot: clumpptr); (* take individual pairs off of the pairlist and clump them *) var pair: pairptr; (* an individual pair in the pairlist *) begin (* move down through the pairlist *) while pairroot <> nil do begin (* pop the top pair off of the list and make sure it's isolated *) pair := pairroot; pairroot := pair^.nextpair; pair^.nextpair := nil; (* send the pair to the clumping routine to stick it on the clumplist *) clumppairs(pair,clumproot); end; end; (* end module cluster.pairclumps *) (* begin module cluster.themain *) procedure themain(var clusterp, subind, inst, book, pairs, clumps: text); (* the main procedure of the program *) var apiece: pieceptr; (* a piece pointer used throughout the program *) clumproot: clumpptr; (* linked list of clumps *) coord: integer; (* coordinate of the duplicate in sequence 1 *) entryroot: entryptr; (* linked list of sequence entries *) fromparam: integer; (* the farthest aligned position back from 0 *) instnumber: integer; (* instruction number in the index *) lastchar: char; (* last character in a line of the subind file *) numberlist: rangeptr; (* the root of the piece range list *) pairroot: pairptr; (* linked list of sequence repeats *) toparam: integer; (* the farthest aligned postion in front of 0 *) totallength: integer; (* the total range of bases *) windowsize: integer; (* the size of the window used by indana *) begin (* themain *) (* announce the program *) writeln(output,'cluster ',version:4:2); (* reset the input files and pointers for printing the sequences *) brinit(book); reset(inst); new(apiece); (* set the initial values for the sequence and pointer variables to zero *) clumproot := nil; coord := 0; freebag := nil; entryroot := nil; freeclump := nil; freeentry := nil; freepair := nil; freesequence := nil; fromparam := 0; lastchar := ' '; numberlist := nil; pairroot := nil; instnumber := 0; toparam := 0; totallength := 0; windowsize := 0; matchparameter := 0; (* read the windowsize from the first line of the subindex *) reset(subind); getwindowsize(subind,windowsize); (* read the book and generate the compacted list of piece ranges *) createrangelist(book,numberlist,apiece); (* read the parameters from clusterp *) (* PARAMETERS ARE READ ONCE AND ARE GLOBAL IN THE PROGRAM. THE NAMES ARE UNIQUE AND CAREFUL ATTENTION SHOULD BE PAID TO MAKE SURE THEY REMAIN SO. AFTER THE NAMES ARE READ, THEY ARE NOT MODIFIED IN ANY WAY. THIS SHOULD BE CAREFULLY WATCHED TO AVOID INADVERTENT CHANGES IN THE PARAMETER CONDITIONS *) readparameters(clusterp); (* place the heading lines on the output files *) if writepairs then filehead(subind,pairs); if writeclumps then filehead(subind,clumps); (* write the parameters to the output files *) writeparameters(output); if writepairs then writeparameters(pairs); if writeclumps then writeparameters(clumps); (* look at the book to find the maximum sequence length from aligned base *) maxminalignment(inst, book, fromparam, toparam); totallength := toparam - fromparam + 1; (* begin reading data from subind and pulling out numbers *) while not eof(subind) do begin {writeln(output,'about to readdata');} (* read the data from the line of fout *) readdata(subind,instnumber,coord,lastchar); (* go ahead and put the entry on the list and include all pairings *) completelist(findranges(instnumber,numberlist),instnumber,coord, lastchar,entryroot,pairroot); end; (* print out the list of sequence names *) if writepairs then begin writeln(output,'* about to print output file pairs'); showlist(pairs,windowsize,totallength,pairroot,apiece); end; (* pluck pairs off the top of pairroot and send them through clumppairs *) if writeclumps then pairclumps(pairroot,clumproot); (* print the clumped list out *) if writeclumps then begin writeln(output,'* about to print output file clumps'); showclumpedlist(clumps,clumproot,windowsize,totallength,fromparam, toparam,apiece); end; end; (* themain *) (* end module cluster.themain *) begin themain(clusterp, subind, inst, book, pairs, clumps); 1: end.