program dbmutate(dbin, dbout, markspots, minst, dbmutatep, output); (* dbmutate: mutate genbank database Tom Schneider NCI/FCRDC Bldg 469. Room 144 P.O. Box B Frederick, MD 21702-1201 (301) 846-5581 (-5532 for messages) permanent email: toms@alum.mit.edu toms@ncifcrf.gov http://www.lecb.ncifcrf.gov/~toms/ National Cancer Institute Laboratory of Experimental and Computational Biology *) label 1; (* end of program *) const (* begin module version *) version = 1.90; (* of dbmutate.p 2000 January 3 1.90: 2000 January 3: Program deprecated. 1.85: 1999 April 2: markspots now works with VERY complex cases! 1.68: 1999 March 8: create minst instructions 1.56: 1999 March 6: prepare for absorbing changes into Delila: semicolon comment at end of line 1.55: 1999 March 4: markspots fully functional 1.43: 1999 Feb 12: invented markspots 1.42: 1999 Feb 11: Bug produced null character at end of sequence in procedure writesequence. 1.41: 1999 Feb 11: Retain the secondary ACCESSIONs in the dbout 1998 oct 19: remove bug that left garbage just after the ORIGIN line origin 1996 October 5 *) updateversion = 1.00; (* defines lowest acceptable current parameter file *) (* end module version *) (* begin module describe.dbmutate *) (* name dbmutate: mutate genbank database synopsis dbmutate(dbin: in, dbout: out, dbmutatep: in, markspots: out, minst: out, output: out) files dbin: GenBank flat file format sequences. dbout: GenBank flat file format sequences, with mutations as specified by the parameters. dbmutatep: parameters to control the program. The first line must be the version number of the this program. This allows the program to recognize when the parameter file is old. A series of lines like this: K02402 g20633c This means that entry K02402 is to be grabbed, and the g at 20633 is to be changed to c. The output entry name will be "K02402.g20633c". Multiple changes are allowed, separated by spaces: K02402 t1c g20633c Entries with no changes are allowed, they are just copied to dbout: K02402 To make a deletion, give the endpoints of the deletion range: M55114 d449,450 Both of the end points will be deleted. The numbers can be the same, to delete one base. To make an insertion or change, give the endpoints between which to REPLACE with a new string: M55114 i449,450tt The numbers cannot be the same. Use zero (0) to insert before the start of the sequence and a value larger than the sequence length to insert after the end of the sequence. Any number of spaces may be between the parts of each instruction, but the instructions must be one per line. Blank lines are skipped. Lines that begin with '*' are comments and are skipped. The parts of an instruction can be separated by spaces or by periods. The use of periods makes the notation consistent with the name given to the pieces generated. A semicolon indicates that the rest of the line is a comment. Program parameters can be adjusted by lines that begin with '@'. The form is '@ commandname value'. The adjustable parameters are: fromrange: the distance before the first base mutated to get in the delila instructions (see minst). torange: the distance after the first base mutated to get in the delila instructions (see minst). markspots: The locations to put marks for use by the lister program in the file marks. They are of the form: U 1055 0.0 1055 -20.0 0 (g->a) change where 5391 is the first coordinate given for a mutation These can be concatinated with a file like marks.arrow to define the locations of mutations: cat marks.arrow markspots > marks The markspots are generated on the assumption that the user will want to display alternating pairs consisting of wild type sequence followed by mutant. The 'p' marks command, as defined in the lister program, is used to jump to the next piece. To use this mechanism start the dbmutatep with the GenBank entry for wild type sequences. Follow this by the mutations of that entry. (The program cannot handle more than one entry properly at this time.) For your delila instructions, write pairs of wild type sequence followed by mutant sequence. minst: delila instructions (inst) for grabbing the regions around the mutations. The from-to range will default to preset values (see technical notes) or can be adjusted with an "@" command in dbmutatep. output: messages to the user description Make mutation of GenBank sequences easy. Note that the copy function makes this program supercede dbpull (although this program is probably going to be much slower). Note that the insert function is fully capable of not only doing insertions but also changes and deletions. Beware that the numbering will be messed up with deletions; multiple deletions could be conflicting. THIS PROGRAM IS NOW DEPRECATED because Delila itself can make mutations. This program is still useful, however, for people not using the Delila system (shame on you) who wish to modify a GenBank entry. examples 1.70 version of dbmutate that this parameter file is designed for. * Lines that begin with '*' are a comment. * Substitute the second 10 bases of an entry K02402 i10,21ggggcccccc * Inserts before base 0 are considered to be at base 0, and ones after the * end of the sequence are at the end of the sequence. Here is an insert * from -20 to 1 of 10 bases followed by a deletion later: K02402 i-20,1ggggcccccc d61,70 * This kind of double insert and deletion of the same length is useful for * checking inserts and deletions by using the Unix diff program, because * only one line changes. * This one deletes the first 10 bases and makes a compensating insert: K02402 d-5,10.i20,21ggggcccccc * note that the instruction parts are separated by a period above. * Replace exactly 10 bases: K02402 i10,21cccgggcccc * Delete exactly 10 bases: K02402 i10,21 * set the from-to range: @ fromrange -25 @ torange +5 documentation see also {Parameter file:} dbmutatep {Related Programs:} delila.p, dbbk.p, dbclean.p, dbpull.p, marks.arrow author Thomas Dana Schneider bugs * changesetmax should not be needed; replace by linked list technical notes Constant changesetmax is the largest number of changes allowed per entry. Constant sequencemax is the largest length sequence that can be handled. Because the program creates a new accesssion name, it will strip away any secondary accession names. default values for the from-to range are in constants deffromrange and deftorange. *) (* end module describe.dbmutate *) (* begin module const.dbmutate *) sequencemax = 500000; (* maximum sequence that can be handled (bp) *) deffromrange = -50; (* default from range *) deftorange = +50; (* default to range *) (* end module const.dbmutate *) (* begin module book.const ***************************************************) (* constants needed for book manipulations *) dnamax = 10000000; (* length of dna arrays *) namelength = 100; (* maximum key name length *) linelength = 80; (* maximum line readable in book *) (* end module book.const version = 1.01; (@ of testtime.p 1997 Jan 11 *) (* begin module changeset.const *) changesetmax = 20; (* maximum number of changes allowed *) insertmax = 100; (* maximum insertion length allowed (bp) *) (* end module changeset.const *) (* begin module marspot.const *) insertupperbits = -0.1; (* upperbits for insertion symbol *) insertlowerbits = -1.3; (* upperbits for insertion symbol *) deleteupperbits = -0.1; (* upperbits for deletion symbol *) deletelowerbits = -1.3; (* upperbits for deletion symbol *) changeupperbits = -1.3; (* upperbits for change symbol *) changelowerbits = -11.3; (* upperbits for change symbol *) displacement = 0; (* amount to displace the mark backwards *) (* end module marspot.const *) (* begin module interact.const *) maxstring = 150; (* the maximum string *) (* end module interact.const version = 4.21; (@ of prgmod.p 1997 October 22 *) (* begin module filler.const *) fillermax = 50; (* the size of the filler array for a string *) (* end module filler.const version = 4.21; (@ of prgmod.p 1997 October 22 *) type (* begin module interact.type *) string = record (* a string of characters *) letters: array[1..maxstring] of char; (* the letters in the string *) length: integer; (* the number of characters in the string *) current: integer; (* the letter we are working on *) end; (* end module interact.type version = 4.21; (@ of prgmod.p 1997 October 22 *) (* begin module filler.type *) (* the following is an array used to fill a string. it is convenient to have it much shorter than the maxstring, so that it is easy to fill the string using procedure fillstring. the user must declare the value of constant fillermax. *) filler = packed array[1..fillermax] of char; (* end module filler.type version = 4.21; (@ of prgmod.p 1997 October 22 *) (* begin module trigger.type *) trigger = record (* an object to be searched for *) seek: string; (* the characters looked for *) state: integer; (* how close to triggering we are *) skip: boolean; (* trigger not found- skip the line *) found: boolean (* the trigger was found *) end; (* end module trigger.type version = 4.21; (@ of prgmod.p 1997 October 22 *) (* begin module changeset.type *) changedata = record changetype: char; (* the type of change: c(hange), i(nsertion), d(eletion) *) baseold: char; (* the old base given in a change instruction *) basenew: char; (* the new base given in a change instruction *) basecoo1: real; (* the first coordinate *) basecoo2: real; (* the second coordinate *) inserts: integer; (* number of bases to insert *) insert: array[1..insertmax] of char; (* bases to insert *) end; changeset = record (* the complete set of changes for an entry *) data: array[1..changesetmax] of changedata; number: integer; (* number of changes *) end; (* end module changeset.type *) (* begin module cystem.type *) {zzzccc} (* define a coordinate system "cystem" *) cystemptr = ^cystem; cystem = record (* a coordinate system segment *) lower: real; (* lower coordinate bound *) upper: real; (* upper coordinate bound *) next: cystemptr; (* the next segment *) end; (* end module cystem.type *) { pieceptr = integer; (* a dummy type to keep pietoint happy *) } (* begin module book.type ****************) (* types needed for book manipulations *) chset = set of 'a'..'z'; (* types defined in book definition *) alpha = packed array[1..namelength] of char; (* this is not alfa *) (* name is a left justified string with blanks following the characters *) name = record letters: alpha; length: 0..namelength (* zero means an unspecified structure *) end; lineptr = ^line; line = record (* a line of characters *) letters: packed array [1..linelength] of char; length: 0..linelength; next: lineptr end; direction = (plus, minus, dircomplement, dirhomologous); configuration = (linear, circular); state = (on, off); header = record (* header of key *) keynam: name; (* key name of structure *) fulnam: lineptr; (* full name of structure *) note: lineptr (* note key *) end; (* base types *) base = (a,c,g,t); dnaptr = ^dnastring; dnarange = 0..dnamax; seq = packed array[1..dnamax] of base; dnastring = record part: seq; length: dnarange; next: dnaptr end; orgkey = record (* organism key *) hea: header; mapunit: lineptr (* genetic map units *) end; chrkey = record (* chromosome key *) hea: header; mapbeg: real; (* number of genetic map beginning *) mapend: real (* number of genetic map ending *) end; pieceptr = ^piece; piekey = record (* piece key *) hea: header; mapbeg: real; (* genetic map beginning *) coocon: configuration; (* configruation (circular/linear) *) coodir: direction; (* direction (+/-) relative to genetic map *) coobeg: integer; (* beginning nucleotide *) cooend: integer; (* ending nucleotide *) piecon: configuration; (* configruation (circular/linear) *) piedir: direction; (* direction (+/-) relative to coordinates *) piebeg: integer; (* beginning nucleotide *) pieend: integer; (* ending nucleotide *) end; piece = record key: piekey; dna: dnaptr end; reference = record pienam : name; (* name of piece referred to *) mapbeg : real; (* genetic map beginning *) refdir : direction; (* direction relative to coordinates *) refbeg : integer; (* beginning nucleotide *) refend : integer; (* ending nucleotide *) end; genkey = record (* gene key *) hea : header; ref : reference; end; trakey = record (* transcript key *) hea : header; ref : reference; end; markerptr = ^marker; markey = record (* marker key *) hea : header; ref : reference; sta : state; phenotype : lineptr; next : markerptr; end; marker = record key : markey; dna : dnaptr; end; (* end module book.type version = 7.05; {of delmod.p 1999Mar17 tds/gds} *) asequence = record length: integer; (* length of the sequence *) sequence: array[1..sequencemax] of char; (* store the sequence *) end; var dbin, (* file used by this program *) dbout, (* file used by this program *) dbmutatep, (* file used by this program *) markspots, (* file used by this program *) minst: text; (* file used by this program *) cha: char; (* see ichread *) (* begin module halt *) procedure halt; (* stop the program. the procedure performs a goto to the end of the program. you must have a label: label 1; declared, and also the end of the program must have this label: 1: end. examples are in the module libraries. this is the only goto in the delila system. *) begin writeln(output,' program halt.'); goto 1 end; (* end module halt version = 4.21; (@ of prgmod.p 1997 October 22 *) (* begin module package.trigger *) (* ************************************************************************ *) (* begin module interact.clearstring *) procedure clearstring(var ribbon: string); (* empty the string *) var index: integer; (* to the ribbon *) begin (* clearstring *) with ribbon do begin for index := 1 to maxstring do letters[index] := ' '; length := 0; current := 0; end end; (* clearstring *) (* end module interact.clearstring version = 4.21; (@ of prgmod.p 1997 October 22 *) (* begin module interact.writestring *) procedure writestring(var tofile: text; var s: string); (* write the string s to file tofile, no writeln *) var i: integer; (* index to s *) begin (* writestring *) with s do for i := 1 to length do write(tofile, letters[i]) end; (* writestring *) (* end module interact.writestring version = 4.21; (@ of prgmod.p 1997 October 22 *) (* begin module filler.fillstring *) procedure fillstring(var s: string; a: filler); (* this procedure makes it reasonably easy to fill the string s with characters. one calls the procedure as: *) (* 1 2 3 4 5 *) (* 12345678901234567890123456789012345678901234567890 *) (* fillstring(s, 'this-is-the-string '); the two comments make it easy to line the characters up. also, for this example, it was assumed that the length of filler as defined by the constant fillermax was 50. *) var length: integer; (* of the string without trailing blanks *) index: integer; (* of s *) begin (* fillstring *) clearstring(s); length := fillermax; while (length > 1) and (a[length] = ' ') do length := pred(length); if (length = 1) and (a[length] = ' ') then begin writeln(output, 'fillstring: the string is empty'); halt end; for index := 1 to length do s.letters[index] := a[index]; s.length := length; s.current := 1 end; (* fillstring *) (* end module filler.fillstring version = 4.21; (@ of prgmod.p 1997 October 22 *) (* begin module filler.filltrigger *) procedure filltrigger(var t: trigger; a: filler); (* fill the trigger t *) begin (* filltrigger *) fillstring(t.seek,a) end; (* fillstring *) (* end module filler.filltrigger version = 4.21; (@ of prgmod.p 1997 October 22 *) (* begin module trigger.proc *) (* this module allows one to scan a series of characters, as from an array or a file, and to "trigger" or detect a simple string in the series. the advantage of the trigger is that several triggers can "observe" a stream of characters at once, each looking for a different thing. some other modules required: interact.const, interact.type *) procedure resettrigger(var t: trigger); (* reset the trigger to ground state *) begin (* resettrigger *) with t do begin state := 0; skip := false; found := false end end; (* resettrigger *) procedure testfortrigger(ch: char; var t: trigger); (* look at the character ch. if it is part of the trigger (at the current trigger state), then the trigger state goes higher. if it is not part of the trigger then the trigger state is reset, skip is true and one should skip onward to find the trigger. if the trigger is found, found is true. 1996 Sep 12: Bug found! In the case of a trigger "ab", the program used to miss it for situations like "aab". This was because at the first a it would step up. Then it would see the second a and recognize that was not part of ab. It would fail to realize that it could be the start of a new one. The code now accounts for that possibility. *) begin (* testfortrigger *) with t do begin state := succ(state); { writestring(list,seek); writeln(list,'testfortrigger seek.letters[',state:1,']:', seek.letters[state],' ch:',ch); } if seek.letters[state] = ch then begin skip := false; if state = seek.length then found := true else found := false end else begin (* it failed. But wait! It could be the beginning of a NEW trigger string! *) if seek.letters[1] = ch then begin state := 1; skip := false; found := false end else begin (* reset trigger *) state := 0; skip := true; found := false end end end end; (* testfortrigger *) (* end module trigger.proc version = 4.21; (@ of prgmod.p 1997 October 22 *) (* begin module skipblanks *) procedure skipblanks(var thefile: text); (* skip over blanks until a non-blank, or end of line, is found *) begin while (thefile^ = ' ') and not eoln(thefile) do get(thefile); end; procedure skipnonblanks(var thefile: text); (* skip over nonblanks until a blank, or end of line, is found *) begin while (thefile^ <> ' ') and not eoln(thefile) do get(thefile); end; procedure skipcolumn(var thefile: text); (* skip over a data column *) begin skipblanks(thefile); skipnonblanks(thefile) end; (* end module skipblanks version = 4.21; (@ of prgmod.p 1997 October 22 *) (* ************************************************************************ *) (* end module package.trigger version = 4.21; (@ of prgmod.p 1997 October 22 *) (* begin module equalstring *) function equalstring(a, b: string): boolean; (* test for equality between two strings at current positions *) var index: integer; (* index to both strings *) equal: boolean; (* are letters in a and b the same? *) begin (* equalstring *) if a.length = b.length then begin index := 1; repeat equal := (a.letters[index] = b.letters[index]); index := succ(index) until (not equal) or (index > a.length); equalstring := equal end else equalstring := false end; (* equalstring *) (* end module equalstring version = 4.21; (@ of prgmod.p 1997 October 22 *) (* begin module onetoken *) procedure onetoken(var afile: text; var buffer: string; var gotten: boolean); (* get a string from a file not using string calls. this lets one obtain lines from a file without interactive prompts *) var index: integer; (* of buffer *) done: boolean; begin (* getstring *) skipblanks(afile); clearstring(buffer); done := false; if eof(afile) then gotten := false else begin index := 0; while (not eoln(afile)) and (index < maxstring) and not done do begin index := succ(index); read(afile, buffer.letters[index]); if buffer.letters[index] = ' ' then begin done := true; index := pred(index); end end; buffer.length := index; buffer.current := 1; gotten := true end end; (* end module onetoken version = 4.21; (@ of prgmod.p 1997 October 22 *) (* begin module copyaline *) procedure copyaline(var fin, fout: text); (* copy a line from file fin to file fout *) begin (* copyaline *) while not eoln(fin) do begin fout^ := fin^; put(fout); get(fin) end; readln(fin); writeln(fout); end; (* copyaline *) (* end module copyaline version = 4.21; (@ of prgmod.p 1997 October 22 *) (* begin module copyline *) procedure copyline(var fin, fout: text); (* copy a line from file fin to file fout but DO NOT CARRIAGE RETURN on the fout. Carriage return on the fin. *) begin (* copyline *) while not eoln(fin) do begin fout^ := fin^; put(fout); get(fin) end; readln(fin); end; (* copyline *) (* end module copyline version = 4.21; (@ of prgmod.p 1997 October 22 *) (* ************************************************************************** *) (* ************************************************************************** *) (* ************************************************************************** *) {zzzccc} (* begin module cystem.information *) (* Cystem stands for "coordinate system". It is a coordinate system for DNA, RNA or protein sequences that have undergone deletions or insertions. The original numbering is preserved as much as possible. For example, a continuous sequence from 1 to 100 would be noted as: (1 100) If bases 51 to 59 are deleted, the sequence has two parts: (1 50)(60 100) *) (* end module cystem.information *) (* begin module cystem.functions *) procedure clearcystem(var l: cystemptr; freecystem: cystemptr); (* clear the cystem onto the free list *) var lptr: cystemptr; begin if l<>nil then begin lptr:=l; l^.lower := 0; l^.upper := 0; l:=l^.next; lptr^.next:=freecystem; freecystem:=lptr end end; procedure disposecystem(var l: cystemptr); (* dispose of the memory used by l *) var hold: cystemptr; (* previous pointer *) begin while l<>nil do begin hold := l^.next; dispose(l); l:=hold; end end; procedure getcystem(var l: cystemptr; freecystem: cystemptr); (* get a cystem from a free list or create one new *) begin if freecystem<>nil then begin l:=freecystem; freecystem:=freecystem^.next end else new(l); l^.lower:=0; l^.upper:=0; l^.next:=nil end; procedure startcystem(var c: cystemptr; l,u: integer; var freecystem: cystemptr); (* start the cystem c with l and u bounds *) begin getcystem(c,freecystem); with c^ do begin lower := l; upper := u; end; end; procedure showcystem(var f: text; r: real); (* show real number r to file f with decimals if they are not zero *) begin if r - trunc(r) > 0 then write(f,r:1:1) else write(f,round(r):1); end; procedure writecystem(var f: text; c: cystemptr); (* write the entire cystem c to file f *) var p: cystemptr; (* pointer to c *) begin p := c; while p <> nil do begin write(f,'('); if p^.lower = p^.upper then showcystem(f, p^.lower) else begin showcystem(f, p^.lower); write(f,' '); showcystem(f, p^.upper); end; write(f,')'); { write(f,'(', p^.lower:1:1, ' ', p^.upper:1:1, ')'); } p := p^.next; if p = c then begin writeln(output, 'circular'); halt end; end; end; function insidecystem(l: real; c: cystemptr): boolean; (* is l inside the cystem element c? *) begin insidecystem := (c^.lower <= l) and (l <= c^.upper) end; function locatecystem(var c: cystemptr; l: integer): cystemptr; (* locate position l in coordinate system c *) var done: boolean; (* done with a search *) p: cystemptr; (* pointer to c *) begin { writeln(output,'locating ',l:1:1); } p := c; done := false; while not done do begin if not insidecystem(l,p) then begin p := p^.next; if p^.next = nil then done := true end else done := true; end; if p = nil then begin write(output,'cannot locate coordinate ',l:1:1,' in '); writecystem(output,c); writeln(output); halt end; locatecystem := p end; procedure findcystem(var c: cystemptr; l: integer; var k: char; var location: cystemptr); (* see if c is in l, tell what happend in k: k = 'f': found, location pointer is valid k = 'b': l is before the cystem, location pointer is nil k = 'a': l is after the cystem, location pointer is at end of c k = 'm': l is in the middle but missing from the coordinate system. the cystem, location pointer nil *) var done: boolean; (* done with a search *) p: cystemptr; (* pointer to c *) begin k := '?'; if l < c^.lower then begin k := 'b'; location := nil; end else begin {zzzccc} k := 'f'; (* be optimistic *) location := c; done := false; while not done do begin if not insidecystem(l,location) then begin if location^.next = nil then begin if l > location^.upper then k := 'a' (* oh well!! *) else begin k := 'm'; location := nil end; done := true; end else location := location^.next; end else done := true; end; end; end; procedure deletecystem(var c: cystemptr; l,u: integer; var freecystem: cystemptr); (* delete in the cystem c from l to u inclusive *) (* zzzccc this is not completely written. "!" means incomplete code *) var done: boolean; (* done with a search *) p : cystemptr; (* pointer to c *) plower: cystemptr; (* pointer to c *) pupper: cystemptr; (* pointer to c *) pnewer: cystemptr; (* new segment *) holder: cystemptr; (* segment to remove *) begin write(output,'deleting from '); showcystem(output,l); write(output,' to '); showcystem(output,u); write(output,': '); (* find the location of the lower bound of the deletion *) plower := locatecystem(c, l); pupper := locatecystem(c, u); if (plower = pupper) then begin if (plower^.lower = l) and (pupper^.upper = u) then begin write(output,' = =!'); (* find who points to this segment and then remove the segment *) if plower = c then begin (* it is the first segment *) holder := c; c := c^.next; clearcystem(holder,freecystem); end else begin (* search *) p := c; while p^.next^.lower <> l do begin p := p^.next; end; holder := p^.next; p^.next := p^.next^.next; clearcystem(holder,freecystem); end end else if (plower^.lower <> l) and (pupper^.upper = u) then begin write(output,'<> = '); pupper^.upper := l-1; end else if (plower^.lower = l) and (pupper^.upper <> u) then begin write(output,' =<>!'); end else if (plower^.lower <> l) and (pupper^.upper <> u) then begin write(output,'<><> '); (* split one segment in two *) getcystem(pnewer,freecystem); pnewer^.next := pupper^.next; plower^.next := pnewer; pnewer^.lower := u+1; pnewer^.upper := pupper^.upper; plower^.upper := l-1 end { if plower^.lower = l then begin plower^.lower := u end else if pupper^.upper = u then begin pupper^.upper := l end else if plower^.lower = pupper^.upper then begin halt; end else begin end } end else begin write(output,' '); (* remove cystems in between *) p := plower^.next; while p <> pupper do begin holder := p; p := p^.next; clearcystem(holder,freecystem); end; plower^.upper := l-1; pupper^.lower := u+1 end end; procedure checkcystem(c: cystemptr; spot: integer); (* check the coordinate given *) var k: char; (* location information according to findcystem *) location: cystemptr; (* location information according to findcystem *) begin findcystem(c, spot, k, location); write(output,'@ ',spot:1,' k=',k,' '); {zzzccc} { procedure findcystem(var c: cystemptr; l: integer; k: char; var location: cystemptr); } writecystem(output,location); writeln(output); end; (* end module cystem.functions *) (* begin module cystem.test *) procedure testcystem(var f: text); (* test the cystem functions. Note that it is essential to dispose of the cystems afterward to avoid a memory failure. That is, pascal on Sun Sparc workstations does not seem to clear it when this procedure is completed and this can cause trouble for later procedures. *) var freecystem: cystemptr; (* the free list of cystems *) c: cystemptr; (* a cystem *) begin writeln(f,'test of cystem -------------------------------------'); freecystem := nil; c := nil; startcystem(c, 10, 99, freecystem); writecystem(f,c); writeln(f); deletecystem(c, 11, 29, freecystem); writecystem(f,c); writeln(f); deletecystem(c, 51, 59, freecystem); writecystem(f,c); writeln(f); deletecystem(c, 71, 79, freecystem); writecystem(f,c); writeln(f); deletecystem(c, 66, 89, freecystem); writecystem(f,c); writeln(f); deletecystem(c, 41, 50, freecystem); writecystem(f,c); writeln(f); deletecystem(c, 30, 40, freecystem); writecystem(f,c); writeln(f); checkcystem(c, 9); checkcystem(c, 10); checkcystem(c, 11); checkcystem(c, 100); checkcystem(c, 65); checkcystem(c, 66); disposecystem(c); disposecystem(freecystem); {zzzccc} { deletecystem(c, 1, 10, freecystem); writecystem(f,c); writeln(f); deletecystem(c, 30, 34, freecystem); writecystem(f,c); writeln(f); deletecystem(c, 34, 41, freecystem); writecystem(f,c); writeln(f); deletecystem(c, 41, 41, freecystem); writecystem(f,c); writeln(f); } writeln(f,'test of cystem DONE --------------------------------'); {zzzccc} { halt; } end; (* end module cystem.test *) {zzzccc} (* ************************************************************************** *) (* ************************************************************************** *) (* ************************************************************************** *) (* Dummy routines. These simulate the delila environment in which one must convert to internal coordinates. *) function pietoint(p: integer; pie: pieceptr): integer; (* piece to internal coordinates *) var dummy: pieceptr; (* dummy to satisfy useage of pie *) begin dummy := pie; pietoint := p end; function inttopie(i: integer; pie: pieceptr):integer; (* internal coordinates to piece *) var dummy: pieceptr; (* dummy to satisfy useage of pie *) begin dummy := pie; inttopie := i end; function piecelength(pie: pieceptr):integer; (* piece length *) begin piecelength := 0 end; (* begin module dbmutate.readchangeset *) procedure nextnonblank(var f: text); (* locate the next non blank in the file or die gracefully in the attempt *) var done: boolean; (* found the non-blank *) begin done := false; while not done do begin skipblanks(f); (* now we are either end of line or at a character *) if eoln(f) then readln(f) (* oh well, continue *) else done := true; (* should be a character there *) if eof(f) then begin writeln(output,'unexpected end of file found in dbmutatep'); writeln(output,'you are missing part of an instruction'); halt; end; end; end; procedure readchanges(var f: text; var c: changeset); (* Read the base changes from f in the form 'g2343c'. *) begin with c.data[c.number] do begin changetype := 'c'; nextnonblank(f); read(f, baseold); nextnonblank(f); read(f, basecoo1); basecoo2 := basecoo1; if eoln(f) then begin writeln(output,': to make a change, a new base is required'); halt end; nextnonblank(f); read(f, basenew); if not (baseold in ['a','c','g','t']) then begin writeln(output); writeln(output,'WARNING: old base usually should be a, c, g, t'); end; if not (basenew in ['a','c','g','t']) then begin writeln(output); writeln(output,' WARNING: new base usually should be a, c, g, t'); end; inserts := 0; end; end; procedure readinsertion(var f: text; var c: changeset); (* Read the insertion from f in the form 'i449,450tt'. *) var comma: char; (* the comma character *) begin with c.data[c.number] do begin read(f, changetype); read(f, basecoo1); skipblanks(f); read(f, comma); if comma <> ',' then begin writeln(output,' comma expected between coordinates for insertion'); halt end; read(f, basecoo2); if basecoo1 < 0 then basecoo1 := 0; if basecoo2 < 1 then basecoo2 := 1; inserts := 0; while f^ in ['a','c','g','t'] do begin inserts := inserts + 1; if inserts > insertmax then begin writeln(output,' no more than ',insertmax:1, ' insertion bases allowed, increase constant insertmax'); halt end; read(f, insert[inserts]); end; (* The following check could be done in dbmutate, but it is not valid when working with a full coordinate system where it is possible for the fragment numbering to decrease. The check must be done elsewhere. if basecoo1 >= basecoo2 then begin writeln(output,' the first base, ',round(basecoo1):1, ', must be less than', ' the second base, ',round(basecoo2):1, ' for insertion'); halt end; *) end end; procedure readdeletion(var f: text; var c: changeset); (* Read the deletion from f in the form 'M55114 d449,450'. *) var comma: char; (* the comma character *) begin with c.data[c.number] do begin read(f, changetype); read(f, basecoo1); skipblanks(f); read(f, comma); if comma <> ',' then begin writeln(output,'comma expected between coordinates for deletion'); halt end; read(f, basecoo2); if basecoo1 < 1 then basecoo1 := 1; if basecoo2 < 1 then basecoo2 := 1; inserts := 0; if basecoo1 > basecoo2 then begin writeln(output,' the first base, ',round(basecoo1):1, ', must be less than', 'or equal to the second base, ',round(basecoo2):1, ' for deletion'); halt end; end end; procedure readchangeset(var changes: text; var c: changeset); (* read in and change the sequence *) var done: boolean; (* we found a semicolon so we are done *) begin c.number := 0; done := false; with c do while (not eoln(changes)) and (not done) do begin skipblanks(changes); if changes^ =';' then done := true else if changes^ ='.' then begin (* accept a period as part of the set, treat as blank *) get(changes) (* just move past it *) end else begin number := succ(number); if number > changesetmax then begin write(output,'Too many changes requested, increase changesetmax.'); halt end; if changes^ in ['a','c','g','t'] then readchanges(changes, c) else if changes^ = 'i' then readinsertion(changes, c) else if changes^ = 'd' then readdeletion(changes, c) else begin writeln(output,' change must be identified by one of:', ' acgtdi'); writeln(output,' the illegal change character "',changes^, '" was found'); halt end; end; skipblanks(changes); (* prepare for the next one *) end; end; procedure writechangeset(var f: text; changes: changeset); (* write the changeset to file f in shorthand notation *) var i: integer; (* index to insertion *) n: integer; (* index to changes *) begin with changes do for n := 1 to number do begin if n > 1 then write(f,'.'); with data[n] do case changetype of 'c': write(f,baseold,round(basecoo1):1,basenew); 'i': begin write(f,'i',round(basecoo1):1,',',round(basecoo2):1); for i := 1 to inserts do write(f,insert[i]); end; 'd': write(f,'d',round(basecoo1):1,',',round(basecoo2):1) end; end; end; procedure describechangeset(var f: text; changes: changeset); (* describe in English the changeset to file f *) var i: integer; (* index to insertion *) n: integer; (* index to changes *) begin if changes.number = 0 then write(f,'no changes') else begin with changes do for n := 1 to number do begin if n > 1 then write(f,', '); with data[n] do case changetype of 'c': write(f,'at ',round(basecoo1):1,' ',baseold,'->',basenew); 'i': begin write(f,'insert '); for i := 1 to inserts do write(f,insert[i]); write(f,' between ',round(basecoo1):1,' and ',round(basecoo2):1); end; 'd': write(f,'delete ',round(basecoo1):1,' to ',round(basecoo2):1) end; end; end; end; (* end module dbmutate.readchangeset *) (* begin module marksautomate *) procedure marksautomate(var markspots: text); (* define the components necessary for markspots *) begin writeln(markspots,'* markspots: define markings for the lister program'); writeln(markspots,'* The standard marks.arrow must be used prior to this file.'); writeln(markspots); writeln(markspots,'u'); writeln(markspots,'/setmarkspotarrow{'); writeln(markspots,'/bodycolor {black} def'); writeln(markspots,'/strokecolor {black} def'); writeln(markspots,'/BodyThick 0.30 fs def'); writeln(markspots,'/HeadWidth 0.90 fs def'); writeln(markspots,'/HeadLength 1.50 fs def'); writeln(markspots,'} def'); writeln(markspots,'setmarkspotarrow'); writeln(markspots); writeln(markspots,'/change {% tailx taily headx heady shift change'); writeln(markspots,'% the head of an arrow'); writeln(markspots,'pop'); writeln(markspots,'setmarkspotarrow'); writeln(markspots,'fixedarrow'); writeln(markspots,'} def'); writeln(markspots); writeln(markspots,'/changeworra{% tailx taily headx heady shift changeworra'); writeln(markspots,'% the tail of an arrow is a ''worra'' (spelling backards)'); writeln(markspots,'pop'); writeln(markspots,'setmarkspotarrow'); writeln(markspots,'fixedworra'); writeln(markspots,'} def'); writeln(markspots); writeln(markspots,'/insertion{% tailx taily headx heady shift insertion'); writeln(markspots,'% an insertion is a green rectangle'); writeln(markspots,'pop'); writeln(markspots,'/bodycolor {lightgreen} def'); writeln(markspots,'boundrectangle'); writeln(markspots,'} def'); writeln(markspots); writeln(markspots,'/deletion {% tailx taily headx heady shift deletion'); writeln(markspots,'% a deletion is a red rectangle'); writeln(markspots,'pop'); writeln(markspots,'/bodycolor {lightred} def'); writeln(markspots,'boundrectangle'); writeln(markspots,'} def'); writeln(markspots); {zzzyyy} writeln(markspots,'/doubleY{% tailx taily headx heady shift doubleY'); writeln(markspots,'% Two Y''s connected at their bases indicate insertion'); writeln(markspots,'pop'); writeln(markspots,'/bodycolor {lightgreen} def'); writeln(markspots,'fixeddoubleY'); writeln(markspots,'% NOT IMPLEMENTED'); writeln(markspots,'} def'); writeln(markspots); writeln(markspots,'!'); writeln(markspots); end; (* end module marksautomate *) (* begin module sortchanges *) procedure sortchanges(unsorted: changeset; var sorted: changeset); (* sort the changeset unsorted and put the result into sorted. Since marks need to be in increasing position order (as currently defined in lister) it is nice to sort the changes for each piece. *) type position = integer; (* make quicksort be happy but still standard *) function lessthan(a, b: integer): boolean; (* see quicksort *) begin lessthan:=(sorted.data[a].basecoo1 < sorted.data[b].basecoo1) end; procedure swap(a,b: integer); (* see quicksort *) var hold: changedata; begin hold:=sorted.data[a]; sorted.data[a]:=sorted.data[b]; sorted.data[b]:=hold (*;write(output,'a=',a:1,', b=',b:1); print (@ for testing *) end; (* begin module quicksort *) procedure quicksort(left, right: position); (* quick sort a list between positions left and right, into ascending order. a position is simply a scalar of the form 0..max. the array to be sorted is dimensioned 1..max. (the difference in the ranges is important to the correct operation of the sort...) two external routines are used: function lessthan(a, b: position): boolean is a generalized test for value-at-a < value-at-b. procedure swap(a, b: position) switches the items at positions a and b. since these routines are external, the procedure is general. this procedure taken from the book 'algorithms + data structures = programs' by niklaus wirth, prentice-hall, inc., englewood cliffs, n.j.(1976), pp. 76-82 *) var lower, upper: position; (* the positions looked at currently *) center: position; (* the rough center of the region being sorted *) begin lower := left; center := (left + right) div 2; upper := right; repeat while lessthan(lower, center) do lower := succ(lower); while lessthan(center, upper) do upper := pred(upper); if lower <= upper then begin (* keep track of the center through the map: *) if lower = center then center:=upper else if upper = center then center:=lower; swap(lower, upper); lower := succ(lower); upper := pred(upper) end until lower > upper; if left < upper then quicksort(left, upper); if lower < right then quicksort(lower, right) end; (* end module quicksort version = 4.21; (@ of prgmod.p 1997 October 22 *) begin sorted := unsorted; quicksort(1,sorted.number) end; (* end module sortchanges *) (* begin module propagatechanges *) procedure propagatechanges(cin: changeset; var cout: changeset; wildtype: boolean; pie: pieceptr); (* propagate deletion and insertions through the changeset. This simulates making the changes on the sequence and results in a changeset that refers to the ALTERED sequence. Shifting must be done in internal coordinates, so the actual piece must be given. *) var m: integer; (* counter for the changes, current place *) n: integer; (* counter for the changes, later places *) location: integer; (* location of a change *) shift: integer; (* amount to shift a change *) procedure shiftit(var x: changedata); (* shift the change *) begin with x do begin {writeln(output,'from basecoo1 =',round(basecoo1):1);} {writeln(output,'from basecoo2 =',round(basecoo2):1);} {zzz} basecoo1 := inttopie(pietoint(round(basecoo1),pie) +shift,pie); basecoo2 := inttopie(pietoint(round(basecoo2),pie) +shift,pie); {writeln(output,' to basecoo1 =',round(basecoo1):1);} {writeln(output,' to basecoo2 =',round(basecoo2):1);} end; end; (* shiftit *) begin {writeln(output,'NEW propagate --------------------------------');} (* should sorting be done before propagating? No, propagation must be in the order given. *) cout := cin; with cout do for m := 1 to number do begin location := pietoint(round(data[m].basecoo1),pie); {write (output,'changetype = data[',m:1,'] = ',data[m].changetype);} {writeln(output,' location = ',location:1);} with data[m] do case changetype of 'c': ; (* base changes cause no downstream changes *) 'i': begin shift := pietoint(round(basecoo2),pie) -pietoint(round(basecoo1),pie) -inserts - 1; {writeln(output,'insertion shift =',shift:1);} if wildtype then begin for n := m+1 to number do begin with data[n] do if location < pietoint(round(data[n].basecoo1),pie) then begin {writeln(output,'insertion n =',n:1,' of ',changetype);} shiftit(data[n]); end; end; end else begin (* mutant *) shift := -shift; for n := 1 to m-1 do begin with data[n] {do begin} do if location < pietoint(round(data[n].basecoo1),pie) then begin shiftit(data[n]); end; end; end; end; {zzzppp} 'd': begin shift := pietoint(round(basecoo2),pie) -pietoint(round(basecoo1),pie) + 1; {writeln(output,'deletion shift =',shift:1);} if wildtype then begin (* propagation affects all changes downstream *) for n := m+1 to number do begin with data[n] {do begin} do if location <= pietoint(round(data[n].basecoo1),pie) then begin {writeln(output,'delete: n =',n:1,' of ',changetype);} shiftit(data[n]); end; end; end else begin (* mutant *) shift := -shift; for n := 1 to m-1 do begin with data[n] do if location <= pietoint(round(data[n].basecoo1),pie) then begin {writeln(output,'delete: n =',n:1,' of ',changetype);} shiftit(data[n]); end; end; end; end; end; end; end; (* end module propagatechanges *) (* begin module nwpietoint *) function nwpietoint(p: integer; pie: pieceptr): integer; (* no wrap version of pietoint *) (* p is a coordinate on the piece. we want to transform p into a number from 1 to n: an internal coordinate system for easy manipulation of piece coordinates *) var i: integer; (* an intermediate value *) begin with pie^.key do begin case piedir of plus: i := p - piebeg + 1; minus: i := piebeg - p + 1; end; { writeln(output,'newpietoint ---------------'); if piedir = minus then writeln(output,'nwpietoint: pidir is minus') else writeln(output,'nwpietoint: pidir is plus'); writeln(output,'nwpietoint: p: ',p:1); writeln(output,'nwpietoint: pieend: ',pieend:1); writeln(output,'nwpietoint: i: ',i:1); } {zzzbbb} nwpietoint:=i end end; (* end module nwpietoint *) (* begin module writemarks *) procedure writemarks(var markspots: text; changes: changeset; insertupperbits: real; (* upperbits for insertion symbol *) insertlowerbits: real; (* upperbits for insertion symbol *) deleteupperbits: real; (* upperbits for deletion symbol *) deletelowerbits: real; (* upperbits for deletion symbol *) changeupperbits: real; (* upperbits for change symbol *) changelowerbits: real; (* upperbits for change symbol *) displacement: real; (* number of bases to displace the mark backwards *) pie: pieceptr; (* piece for these changes *) thenumber: integer); (* the piece number *) (* Write the marks to file markspots, at the locations defined. Writemarks works with two sequences, first is the wild-type sequence and second is the mutant sequence. Definition: This routine assumes that all previous actions have placed us onto the wild-type sequence. NOTE: the basecoo values changes get modified locally here but are not altered in the original copy. *) const decbase = 2; (* number of decimal places for bases *) widbase = 6; (* width of places for bases *) decbits = 2; (* number of decimal places for bits *) widbits = 6; (* width of places for bits *) blackbar = 0.05; (* black bar width in bases *) shiftdown = 1; (* shift down the change mark arrow on wt sequence *) var i: integer; (* counter for inserts *) n: integer; (* counter for the changes *) markplace: real; (* the last mark place put into markspots *) protection: real; (* protect against postscript bomb when the position does not change if there are no bases in an insertion (ie it acts as a deletion) by adding a little bit to the second insertion position *) sorted: changeset; (* the changes sorted by the exact positions of the marks, must be done for both wild and mutant sequences *) unsorted: changeset; (* the exact positions of the marks, unsorted *) propagated: changeset; (* changes propagated through the changeset *) begin writeln(markspots); write(markspots,'* piece #',thenumber:1,' '); writechangeset(markspots,changes); writeln(markspots); (* first do the wild type sequence: *) propagatechanges(changes,unsorted,true,pie); (* wild type *) (* adjust the locations of the marks *) with unsorted do for n := 1 to number do with data[n] do begin case changetype of 'c': ; (* wait until the next loop *) 'i': begin protection := blackbar; basecoo1 := basecoo1+0.5-protection; basecoo2 := basecoo2-0.5+protection; end; 'd': begin (* mark deletion as a red bar on the wild type sequence *) basecoo1 := basecoo1-0.5; basecoo2 := basecoo2+0.5; end; end; end; sortchanges(unsorted, sorted); (* print the marks out *) with sorted do for n := 1 to number do with data[n] do begin case changetype of 'c': begin markplace := basecoo1-displacement; basecoo1 := markplace; writeln(markspots,'U', ' ',basecoo1:widbits:decbits, ' ',changeupperbits-shiftdown:widbits:decbits, ' ',basecoo2:widbits:decbits, ' ',changelowerbits-shiftdown:widbits:decbits, ' ',displacement:widbits:decbits, ' (', baseold, '->', basenew, ') changeworra '); end; 'i': begin (* mark insertion location as a black bar on the wild type sequence *) (* handle the special cases of insertion at the ends of the sequence *) if (nwpietoint(trunc(basecoo1),pie) = 0) then begin {writeln(output,'HI THERE 1');} basecoo1 := basecoo1 + 1.0; basecoo2 := basecoo2 + 1.0; displacement := -1.8; (* I don't know why this needs more *) {zzzbbb} end; if (nwpietoint(trunc(basecoo1),pie) > piecelength(pie)) then begin {writeln(output,'HI THERE 2');} basecoo1 := basecoo1 - 1.0; basecoo2 := basecoo2 - 1.0; displacement := 0.0; {zzzbbb} end; writeln(markspots,'U', ' ',basecoo1:widbase:decbase, ' ',deleteupperbits:widbits:decbits, (* upperbits *) ' ',basecoo2:widbase:decbase, ' ',deletelowerbits:widbits:decbits, (* lowerbits *) ' ',displacement:widbits:decbits,' (insert) deletion'); end; 'd': begin (* mark deletion as a red bar on the wild type sequence *) write(markspots,'U', ' ',basecoo1:widbase:decbase, ' ',deleteupperbits:widbits:decbits, (* upperbits *) ' ',basecoo2:widbase:decbase, ' ',deletelowerbits:widbits:decbits, (* lowerbits *) ' ',displacement:widbits:decbits,' (deletion'); writeln(markspots,') deletion '); end; end; end; write(markspots,'p - skip ahead to mutated piece'); writeln(markspots,' #',(thenumber+1):1); (* anticipate its number *) propagatechanges(changes,unsorted,false,pie); (* mutant *) with unsorted do for n := 1 to number do with data[n] do begin case changetype of 'c': begin markplace := basecoo1-displacement; basecoo1 := markplace; basecoo2 := markplace; end; 'i': begin (* mark insertion as a green bar on the mutant sequence. Therefore forceforward before putting the mark, and put the mark on the sequence itself *) if inserts <= 0 then protection := blackbar else protection := 0; markplace := basecoo1+0.5; basecoo1 := markplace-protection; basecoo2 := markplace+inserts+protection; end; 'd': begin markplace := basecoo1; protection := blackbar; basecoo1 := markplace-0.5-protection; basecoo2 := markplace-0.5+protection; end; end; end; sortchanges(unsorted, sorted); with sorted do for n := 1 to number do with data[n] do begin case changetype of 'c': begin writeln(markspots,'U', ' ',basecoo1:widbits:decbits, ' ',changeupperbits:widbits:decbits, ' ',basecoo2:widbits:decbits, ' ',changelowerbits:widbits:decbits, ' ',displacement:widbits:decbits, ' (', baseold, '->', basenew, ') change '); end; 'i': begin (* mark insertion as a green bar on the mutant sequence. Therefore forceforward before putting the mark, and put the mark on the sequence itself *) (* handle the special cases of insertion at the ends of the sequence *) if (nwpietoint(trunc(basecoo1),pie) = 0) then begin {writeln(output,'HI THERE 3');} basecoo1 := basecoo1 + 1.0; basecoo2 := basecoo2 + 1.0; displacement := -1.8; (* I don't know why this needs more *) {zzzbbb} end; if (nwpietoint(trunc(basecoo1),pie) > piecelength(pie)) then begin {writeln(output,'HI THERE 4');} writeln(output,'nwpietoint(trunc(basecoo1),pie) = ',nwpietoint(trunc(basecoo1),pie):1); writeln(output,'nwpietoint(trunc(basecoo2),pie) = ',nwpietoint(trunc(basecoo2),pie):1); writeln(output,'piecelength(pie) = ',piecelength(pie):1); basecoo1 := basecoo1 - 1.0; basecoo2 := basecoo2 - 1.0; displacement := 0.0; {zzzbbb} end; write(markspots,'U', ' ',basecoo1:widbase:decbase, ' ',insertupperbits:widbits:decbits, (* upperbits *) ' ',basecoo2:widbase:decbase, ' ',insertlowerbits:widbits:decbits, (* lowerbits *) ' ',displacement:widbits:decbits,' ('); (* protect against null postscript string. It's not clear that this helps. *) if inserts = 0 then write(markspots,' ') else for i := 1 to inserts do write(markspots,insert[i]); writeln(markspots,') insertion'); end; 'd': begin writeln(markspots,'U', ' ',basecoo1:widbase:decbase, ' ',deleteupperbits:widbits:decbits, (* upperbits *) ' ',basecoo2:widbase:decbase, ' ',deletelowerbits:widbits:decbits, (* lowerbits *) ' ',displacement:widbits:decbits,' (delete) deletion'); end; end; end; writeln(markspots,'p - skip ahead to next piece'); end; (* end module writemarks *) (* begin module dbmutate.processcommand *) procedure processcommand(var dbmutatep: text; var fromrange, torange: integer); (* process commands *) var command: char; (* a command in dbmutatep *) begin read(dbmutatep, command); (* skip the '@' *) skipblanks(dbmutatep); read(dbmutatep, command); skipnonblanks(dbmutatep); write(output,'@ command: '); if command in ['f','t'] then begin case command of 'f': begin read(dbmutatep,fromrange); write(output,'fromrange set to ',fromrange:1); end; 't': begin read(dbmutatep,torange); write(output,'torange set to ',torange:1); end; end; end else begin write(output,'unidentifed command, ",command,", ignored'); end; writeln(output); readln(dbmutatep); end; (* end module dbmutate.processcommand *) (* begin module dbmutate.themain *) procedure themain(var dbin, dbout, dbmutatep, markspots, minst: text); (* the main procedure of the program *) var accession: trigger; (* trigger to find the ACCESSION pattern *) changes: changeset; (* current set of changes *) entryname: trigger; (* a mark for the accession number of an entry *) entryend: trigger; (* a mark for the end of an entry *) gotten: boolean; (* whether the entryname was found *) locus: trigger; (* trigger to find the LOCUS pattern *) origin: trigger; (* trigger to find the ORIGIN pattern *) organism: trigger; (* trigger to find the ORGANISM pattern *) parameterversion: real; (* parameter version number *) piecesoutput: integer; (* count of the number of output pieces *) fromrange, torange: integer; (* range of delila instructions *) procedure readsequence(var f: text; var s: asequence); (* read in the sequence from f *) var c: char; (* character in dbin *) done: boolean; (* end search *) sindex: integer; (* index for the sequence *) begin with s do begin (* read in the sequence *) resettrigger(entryend); done := false; sindex := 0; while not done do begin if eof(f) then begin write(output,'No end to entry '); writestring(output,entryname.seek); writeln(output,'!'); halt end; if eoln(f) then begin readln(f); end else begin read(f,c); testfortrigger(c, entryend); if entryend.found then begin done := true; readln(f); end else begin (* store the character *) { if c in ['a','c','g','t'] then begin } (* allow any alphabetic, a-z *) if ((ord(c) >= ord('a')) and (ord(c) <= ord('z'))) then begin sindex := sindex+1; if sindex > sequencemax then begin writestring(output,entryname.seek); writeln(output,' sequence length is', ' larger than the allowed ',sequencemax:1,'.'); writeln(output,'Increase constant sequencemax.'); halt end; sequence[sindex] := c end { else then writeln(output,'rejected character: ',c); } end end end; length := sindex; end end; procedure writesequence(var f: text; s: asequence); (* write sequence s to file f *) var sindex: integer; (* index for the sequence *) begin with s do begin sindex := 0; (* 1999 Feb 11: the bug was that this was <= instead of < ! *) while sindex < length do begin sindex := sindex + 1; if (sindex mod 60) = 1 then begin write(f,sindex:9); end; if (sindex mod 10) = 1 then begin write(f,' '); end; write(f,sequence[sindex]); { if sindex > 14540 then begin write(output,' length = ',length:1); write(output,' sindex = ',sindex:1); write(output,' ord = ',ord(sequence[sindex])); writeln(output); end; } if (sindex mod 60) = 0 then begin writeln(f); end; end; if (sindex mod 60) <> 1 then begin writeln(f); { writeln(f,'sindex = ', sindex:1); writeln(f,'sindex mod 60 = ', sindex:1); } end; writeln(f,'//'); writeln(f); end end; procedure changesequence(changes: changeset; var thesequence: asequence); (* read in and change the sequence *) var i: integer; (* counter for inserts *) n: integer; (* counter for the changes *) shift: integer; (* amount to shift a portion of the sequence *) begin with thesequence do begin (* alter the sequence *) with changes do for n := 1 to number do with data[n] do begin (* this is not relevant - handle ends if basecoo1 > length then begin writeln(output,'first base coordinate ',basecoo1:1, ' exceeds sequence length ',length:1); halt end; if changetype in ['d','i'] then if basecoo2 > length then begin writeln(output,'second base coordinate ',basecoo2:1, ' exceeds sequence length ',length:1); halt end; *) case changetype of 'c': begin if baseold = basenew then begin writeln(output,'The initial and final bases are the same,'); writeln(output,'so you did not request any change!'); halt end; if baseold <> sequence[round(basecoo1)] then begin writeln(output,'The old base at ',round(basecoo1):1, ' is NOT ',baseold,'!', ' It is ',sequence[round(basecoo1)],'.'); halt end; sequence[round(basecoo1)] := basenew; end; 'i': begin { writeln(output,inserts:1); } if basecoo1 > length then basecoo1 := length; if basecoo2 > length then basecoo2 := length + 1; {writeln(output,'old length ',length:1);} shift := inserts - (round(basecoo2) - round(basecoo1) - 1); { length := length - shift; evidently the wrong direction } length := length + shift; {writeln(output,'shift ',shift:1);} {writeln(output,'new length ',length:1);} if shift > 0 then begin (* insert *) {writeln(output,'ins');} (* shift the rest of the sequence out of the way *) if length > sequencemax then begin writeln(output,' Insertion of ',inserts:1,' bases would', ' cause the sequence to exceed constant', ' sequencemax (',sequencemax:1,')'); halt; end; for i := length downto round(basecoo2) do sequence[i] := sequence[i-shift]; end else if shift < 0 then begin (* delete *) {writeln(output,'del');} for i := round(basecoo2)+shift to length do sequence[i] := sequence[i-shift]; end; (* insert the new material *) for i := 1 to inserts do sequence[round(basecoo1) + i] := insert[i]; { (* overwrite for debugging *) for i := 1 to inserts do sequence[round(basecoo1) + i] := '!'; } { fix //} end; 'd': begin (* delete inclusively from basecoo1 to basecoo2 *) { writeln(output,'DELETE'); sequence[basecoo1] := 'A'; sequence[basecoo2] := 'B'; } if basecoo1 < 0 then basecoo1 := 1; if basecoo2 < 0 then basecoo2 := 1; if basecoo1 > length then basecoo1 := length + 1; if basecoo2 > length then basecoo2 := length + 1; shift := round(basecoo2) - round(basecoo1) + 1; length := length - shift; for i := round(basecoo1) to length do sequence[i] := sequence[i+shift]; end; end; end; end end; procedure startinst(var minst: text; version: real); (* start the instruction file *) begin rewrite(minst); writeln(minst,'title "dbmutate ',version:4:2,'";'); writeln(output,'title "dbmutate ',version:4:2,'";'); end; procedure instorgchr(var minst: text; organismgenus: string; (* organism genus name *) organismspecies: string); (* organism species name *) (* define the organism into the mutation inst file *) procedure givename; (* just the name please with ; at end *) begin write(minst,organismgenus.letters[1]); write(minst,'.'); writestring(minst,organismspecies); writeln(minst,';'); end; begin if changes.number > 0 then begin writeln(minst); write(minst,'organism '); givename; write(minst,'chromosome '); givename; writeln(minst); end; end; procedure instget(var minst: text; piecename: string; changes: changeset; fromrange, torange: integer); (* define the piece and get instructions to file minst. For the given piece (piecename) write the changes for the range from-to. *) procedure sign(var minst:text; range: integer); (* make a + sign on the positive range *) begin if range > 0 then write(minst,'+'); end; procedure dotheget; begin write(minst, 'get from ', changes.data[1].basecoo1:1, ' '); sign(minst,fromrange); write(minst, fromrange:1, ' to same '); sign(minst,torange); writeln(minst, torange:1, ';'); end; begin if changes.number > 0 then begin write(minst,'(* '); describechangeset(minst,changes); writeln(minst,' *)'); writeln(minst,'name "wildtype";'); write(minst,'piece '); writestring(minst,piecename); writeln(minst,'; (* wild type *)'); dotheget; write(minst,'name "'); writestring(minst,piecename); write(minst,' '); describechangeset(minst,changes); writeln(minst,'";'); write(minst,'piece '); writestring(minst,piecename); write(minst,'.'); writechangeset(minst,changes); writeln(minst,'; (* mutant *)'); dotheget; end; end; procedure dolocus(var dbin, dbout, markspots: text; entryname: trigger; changes: changeset; var found: boolean; var fromrange, torange: integer; var thenumber: integer); (* look through a locus and find the current entryname. If it is the right one, process it and return flag found as true. *) var anaccessionname: string; (* accession name of an entry *) alocusname: string; (* accession name of an entry *) c: char; (* character in dbin *) entrylength: integer; (* length of sequence in an entry *) holdfile: text; (* file for holding the middle of an entry *) organismgenus: string; (* organism genus name *) organismspecies: string; (* organism species name *) done: boolean; (* end search *) sequence: asequence; (* a sequence *) sortedchanges: changeset; (* changes sorted for writing out *) pie: pieceptr; (* dummy to satisfy propagatechanges *) dummy: pieceptr; (* dummy to satisfy useage of pie *) begin rewrite(holdfile); (* get the locus name and determine the length *) onetoken(dbin, alocusname, gotten); read(dbin,entrylength); clearstring(organismgenus); clearstring(organismspecies); (* find the accession name *) resettrigger(entryend); done := false; found := false; while not done do begin if eoln(dbin) then begin readln(dbin); writeln(holdfile); if eof(dbin) then done := true; {writeln(output);} end else begin read(dbin,c); write(holdfile,c); {write(output,c);} testfortrigger(c, entryend); if entryend.found then done := true; testfortrigger(c, accession); if accession.found then begin onetoken(dbin, anaccessionname, gotten); if gotten then begin { write(output,' found: '); writestring(output,anaccessionname); writeln(output); } (* There are two paths here: the entry was found - in which case we copy and mutate or it was not found - in which case we skip to end of entry. *) done := true; if equalstring(anaccessionname, entryname.seek) then begin found := true; (* complete hold file *) write(holdfile,' '); writestring(holdfile,entryname.seek); (* Write the changes to the new accession name: *) if changes.number > 0 then write(holdfile, '.'); writechangeset(holdfile, changes); if changes.number > 0 then begin writeln(markspots); writeln(markspots,'* dbmutatep requested piece ',thenumber:1); writeln(markspots,'* final display pieces ', ((2*thenumber)-3):1, ' and ', ((2*thenumber)-2):1); write(markspots,'* '); writestring(markspots,entryname.seek); writeln(markspots); end else begin writeln(markspots); write(markspots,'* The wild type piece in dbmutatep, '); writestring(markspots,entryname.seek); writeln(markspots, ' does not have marks in markspots'); end; pie := nil; dummy := pie; (* keep compiler happy *) writemarks(markspots, sortedchanges, insertupperbits, insertlowerbits, deleteupperbits, deletelowerbits, changeupperbits, changelowerbits, displacement, pie, thenumber); (* THIS KEEPS ALL SECONDARY ACCESSION NAMES:*) write(holdfile,' '); copyaline(dbin,holdfile); (* find the organism name *) resettrigger(organism); done := false; while not done do begin if eof(dbin) then begin write(output,'No '); writestring(output,organism.seek); write(output,' to entry '); writestring(output,entryname.seek); writeln(output,'!'); halt end; if eoln(dbin) then begin readln(dbin); writeln(holdfile); end else begin read(dbin,c); write(holdfile,c); testfortrigger(c, organism); if organism.found then begin done := true; onetoken(dbin, organismgenus, gotten); if gotten then begin onetoken(dbin, organismspecies, gotten); if gotten then begin { write(output,'genus: '); writestring(output,organismgenus); write(output,' species: '); writestring(output,organismspecies); writeln(output); } write(holdfile,' '); writestring(holdfile,organismgenus); write(holdfile,' '); writestring(holdfile,organismspecies); instorgchr(minst, organismgenus, organismspecies); end; end else copyaline(dbin, holdfile); end end end; instget(minst, entryname.seek, changes, fromrange, torange); (* find the origin of the sequence *) resettrigger(origin); done := false; while not done do begin if eof(dbin) then begin write(output,'No '); writestring(output,origin.seek); write(output,' to entry '); writestring(output,entryname.seek); writeln(output,'!'); halt end; if eoln(dbin) then begin readln(dbin); writeln(holdfile); end else begin read(dbin,c); write(holdfile,c); testfortrigger(c, origin); if origin.found then begin done := true; copyaline(dbin, holdfile); end end end; readsequence(dbin, sequence); changesequence(changes, sequence); (* dump the results out *) writestring(dbout,locus.seek); write(dbout,' '); writestring(dbout,alocusname); write(dbout,' ',sequence.length:6); reset(holdfile); while not eof(holdfile) do copyaline(holdfile,dbout); (* write out the sequence *) writesequence(dbout, sequence); end end; end end; resettrigger(entryname); testfortrigger(c, entryname); if entryname.found then begin end; if eof(dbin) then done := true; end; if not found then begin (* it is safest to skip to the end of the entry to avoid incorrect triggerings from the current entry. *) resettrigger(entryend); done := false; while not done do begin if eof(dbin) then begin write(output,'No end to entry '); writestring(output,entryname.seek); writeln(output,'!'); halt end; if eoln(dbin) then begin readln(dbin); end else begin read(dbin,c); testfortrigger(c, entryend); if entryend.found then begin done := true; readln(dbin); end end end end; end; procedure locateentry(var dbin, dbout, markspots: text; entryname: trigger; changes: changeset; var fromrange, torange, thenumber: integer); (* locate the entry in dbin *) var done: boolean; (* end search *) c: char; (* a character in dbin *) begin reset(dbin); resettrigger(locus); done := false; while not done do begin {if eoln(dbin) then writeln(output);} if eof(dbin) then begin write(output,'Could not locate entry '); writestring(output,entryname.seek); writeln(output,'!'); halt end; if eoln(dbin) then readln(dbin) else begin read(dbin,c); {write(output,c);} testfortrigger(c, locus); if locus.found then dolocus(dbin, dbout, markspots, entryname,changes,done,fromrange,torange, thenumber) end; end end; begin writeln(output,'dbmutate ',version:4:2); reset(dbmutatep); readln(dbmutatep, parameterversion); if parameterversion < updateversion then begin writeln(output, 'You have an old parameter file!'); halt end; (* 1 2 3 4 5 *) (* 12345678901234567890123456789012345678901234567890 *) filltrigger( locus, 'LOCUS '); filltrigger( accession, 'ACCESSION '); filltrigger( origin, 'ORIGIN '); filltrigger( organism, 'ORGANISM '); filltrigger( entryend, '// '); rewrite(dbout); rewrite(markspots); writeln(markspots,'* ', 'dbmutate ',version:4:2); marksautomate(markspots); startinst(minst, version); piecesoutput := 0; fromrange := deffromrange; torange := deftorange; while not eof(dbmutatep) do begin skipblanks(dbmutatep); if (dbmutatep^ = '*') or (eoln(dbmutatep)) then readln(dbmutatep) (* skip comments and blank lines *) else begin if dbmutatep^ = '@' then processcommand(dbmutatep, fromrange, torange) else begin onetoken(dbmutatep, entryname.seek, gotten); if gotten then begin piecesoutput := succ(piecesoutput); writestring(output, entryname.seek); readchangeset(dbmutatep, changes); write(output,' '); describechangeset(output, changes); readln(dbmutatep); writeln(output); locateentry(dbin, dbout, markspots, entryname, changes, fromrange, torange, piecesoutput) end; end; end; end; end; (* end module dbmutate.themain *) begin (* testcystem(output); {zzzccc} *) themain(dbin, dbout, dbmutatep, markspots, minst); 1: end.