program dbinst(db, binst, einst, oinst, sinst, olength, slength, dbinstp, locuslist, missing, output); (* dbinst: extract Delila instructions from a GenBank database Dr. Thomas D. Schneider National Institutes of Health National Cancer Institute Center for Cancer Research Nanobiology Program Molecular Information Theory Group Frederick, Maryland 21702-1201 toms@ncifcrf.gov permanent email: toms@alum.mit.edu (use only if first address fails) http://www.ccrnp.ncifcrf.gov/~toms/ *) label 1; (* end of program *) const (* begin module version *) version = 3.50; (* of dbinst.p 2006 Oct 25 2006 Oct 25, 3.50: allow for strain name in organism: ORGANISM Escherichia coli K12 2005 Jan 9, 3.49: cleanup 2005 Jan 9, 3.48: bug in slength for complements, found by Eric Miller. Solution: oldtonumber may come from fromnumber or tonumber depending on which is larger. Also, tofound was not initialized to false! This set oldtonumber to be weird ... 2004 Aug 5, 3.47: fixed a bug introduced in version 3.45 that gives wrong instruction names, by Michael Levashov, see mmm 2004 Jul 23, 3.46: cleanup; genename -> featurename 2004 Jul 23, 3.45: instructions now have the names of the respective gene next to them, by Michael Levashov, see mmm 1998 July 29: instructions have direction added to them so that extractions from circular pieces are done correctly. 1994 August 4: previous changes origin 1989 July 12 *) (* end module version *) (* Bug history 1994 June 10: now uses ACCESSION numbers instead of LOCUS names (by changing the search string!) 1991 July 24: modified to work on the new GenBank format. THIS VERSION WILL RUN ON THE NEW FORMAT. 1991 May 15: The new GenBank format must be used instead of the old format. the program will not work with the new format which was introduced in sep 1991 1990 August 15 object lengths are now calculated as length := tonumber - fromnumber + 1; Earlier versions earlier than 2.34 did not have the '+ 1'. *) (* begin module describe.dbinst *) (* name dbinst: extract Delila instructions from a GenBank database synopsis dbinst(db: in, binst: out, einst: out, oinst: out, sinst: out, olength: out, slength: out, dbinstp: in, locuslist: out, missing: out, output: out) files db: a set of GenBank entries binst: instructions for finding the beginning of a feature einst: instructions for finding the ending of a feature oinst: instructions for finding the whole feature, called the "object". They are given in the form "from begin + f to end + t" where f and t are the "from" and "to" parameters given in dbinstp. sinst: instructions for finding the regions between features, called the "space". They have the same form as those of oinst. olength: list of object lengths slength: list of space lengths dbinstp: parameters to control the program First line: the name of the feature to use. Second line: two integers, the base "from" and the base "to" relative to the alignment point to write the instructions. If "from" is larger than "to" then generic names "before" and "after" are written. This allows one to make a generic file of instructions to be copied and edited later. Third line: The first 4 characters on the line control which instruction files are to be written. To have all 4 on, use 'beos', for begin, end, object and space. Any other character in a position means that the corresponding file will not be written. The file will be rewritten however. Thus beos means write all files, and bEos would not write the einst file. Fourth line: 2 characters without spaces that control which length files are to be written. To both on, use 'os', for object and space. Any other character means that the corresponding file will not be written. The file will be rewritten however. Fifth line: If the first character is 'r' then remove obviously duplicated instructions and object or space lengths. When alternative splicing occurs, GenBank records the endpoint several times, so that the sequence instructions are identical. By using this toggle switch, such cases are eliminated. Sixth line: If the first character is 'f' then the coordinates of the instruction are written whether or not the object is off the end of the sequence. This allows one to pick up objects that are partially on a piece. If the first character is 's' then select against the feature if either end is missing. This makes the length list correspond to the instruction set. Seventh line: Alignment shift. This integer is added to the from and too coordinates of the instructions written out. Normally this should be 0. An example helps. Normally, if the zero of splice donor sites is defined the first base on the intron, then if one is writing instructions based on exon coordinates the zero base will be 1 too low. By making the alignment shift 1, the instructions written out will match the expectations of other programs. Note: object coordinates are shifted accordingly; this may not be quite what you want if you are using them from the olength file! However, the length is not affected. locuslist: a list of all the loci in the db that have features of interest. This list can be used with dbpull to create reduced databases containing only those entries that contain the features we want. missing: Features that are listed under the database COMMENT are listed here. These are "EMBL features not translated to GenBank features". We do not consider these to be reliable. They are NOT included in the binst, einst or olength, slength instructions. output: messages to the user description The GenBank entries in db are scanned, and Delila instructions are generated, according to a desired feature table item. Four kinds of instruction are made: beginning, ending, object and space. Beginning appears only if the data for the beginning of the feature is in the db. Ending appears only if the data for the ending of the feature is in the db. Object appears only if both the beginning and ending are there. Space only appears if there was an ending to the previous feature, and the current feature has a beginning. Thus object and space instructions is guaranteed to be a "natural" length. The names of the pieces are now the ACCESSION number. The names for the instructions are determined as follows. The GenBank ORGANISM contains the two part genus/species name, such as: ORGANISM Homo sapiens The parts are joined into "Homo.sapiens", and this becomes the name of the organism and chromosome in the instructions. The instructions for organism and chromosome only change when the genus/species name changes in db. The LOCUS name of the entry is picked up and used as the piece name. These naming conventions are the ones generated automatically by the dbbk program, so one need not think about it most of the time. In each entry, lines of the form: pept < 1 46 Ig V-R-H region protein, exon x are located and used to generate Delila "get" statements. If a "<" appears before the first number, then no instruction is written to binst, since the beginning point is before the GenBank sequence. If a "<" appears before the second number, then no instruction is written to einst, since the ending point is after the GenBank sequence. If a "<" or ">" appears in the db, then no object instructions or lengths are written. If a ">" appears in the previous feature or ">" appears in the current feature, then no space instructions or lengths are written. So for the above example, only one Delila instruction would be written: get from 46 -10 to 46 +20; if the dbinstp contained -10 20, and get from 46 before to 46 after; if the dbinstp contained 10 -20. where "before" and "after" are replaced by the integers from dbinstp. examples If dbinstp contains: CDS the name of the feature to use. -40 20 "from" and "to" to write the instructions. beOS "beos" means begin, end, object, space instructions written os "os" means object and space length file written r "r" means remove obviously duplicated instructions. F "f" = find-anyway. 's'= select AGAINST feature if either end missing 0 alignment shift: amount to shift the zero base. then instructions to get coding sequence (CDS) starts (binst) and ends (einst) from -40 to +20 will be made. Instructions for the entire coding region, from -40 before the start of the peptide to 20 bases after will not be written because O is capitalized and so not recognized. Instructions for the regions between peptides, from -40 inside each previous peptide to 20 bases into the inside of the next peptide will not be written because S is capitalized and so not recognized. documentation none see also dbbk.p author Thomas Dana Schneider bugs The program does not produce the instructions for space between the first object and the beginning of the sequence or the space after the last object in the sequence. This is possible (and perhaps should be controlled by a parameter) but it would not produce "natural" lengths because those space lengths depend on the length of the reported sequence. It is not clear that spaces are done properly anymore. Possible bug at "SPACE PROBLEM". Genus names are limited to genuslimit (a constant) to avoid names longer than the standard Delila limit. technical notes The expected column locations of the complement flag in the database, (the 'before end of piece' and the 'after end of piece' flags) are given in the program constants. If a file is not written to, this version of the program will not touch the file. Though this could lead to some confusion on the part of an incautious user (who thinks the program wrote a file when it did not), this does mean that the program will not create any new files that are not necessary. *) (* end module describe.dbinst *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module dbinst.const *) (* these are the logcations on GenBank lines where the flags are found *) feaspot = 6; (* first character of feature name *) startspot = 22; (* first character of the location *) comspot = 31; (* complement indication column *) offfromspot = 13; (* indicates from is before piece *) offtospot = 22; (* indicates from is before piece *) fromspot = 19; (* location of from number's 1's place digit *) tospot = 28; (* location of to number's 1's place digit *) genuslimit = 1; (* maximum number of characters in the genus name to generate. 1995 Aug 4 this was changed from 4 to 1. *) (* end module dbinst.const *) (* begin module interact.const *) maxstring = 150; (* the maximum string *) (* end module interact.const version = 'prgmod 4.05 89 Aug 28 tds'; *) (* begin module filler.const *) fillermax = 50; (* the size of the filler array for a string *) (* end module filler.const version = 'prgmod 4.05 89 Aug 28 tds'; *) type (* begin module interact.type *) string = record (* a string of characters *) letters: array[1..maxstring] of char; (* the letters in the string *) length: integer; (* the number of characters in the string *) current: integer; (* the letter we are working on *) end; (* end module interact.type version = 'prgmod 4.05 89 Aug 28 tds'; *) (* begin module dbinst.type *) parameters = record (* the set of parameters read from the user *) feature: string; (* the feature to make instructions for *) after: integer; (* the 'to' location *) before: integer; (* the 'from' location *) dobin: boolean; (* true means to write begin file instructions *) doein: boolean; (* true means to write end file instructions *) dooin: boolean; (* true means to write object file instructions *) dosin: boolean; (* true means to write space file instructions *) dosln: boolean; (* true means to write space length file *) dooln: boolean; (* true means to write object length file *) doremove: boolean; (* true means to remove duplicated instructions *) foundanyway: char; (* set fromfound and tofound to true if this is 'f', even if they aren't on the piece in the database. This forces one to get the fragment. Set both to false if it is 's'. This forces nothing to be obtained unless there is a complete piece. *) alignmentshift: integer; (* the amount to add to a coordinate read from the entry to make the alignment the desired one. *) generic: boolean; (* if true write generic instructions *) end; spotlist = ^spot; spot = record (* a location in the entry that has been printed out already *) location: integer; (* the place we printed *) next: spotlist; (* the next one *) end; (* end module dbinst.type *) (* begin module filler.type *) (* the following is an array used to fill a string. it is convenient to have it much shorter than the maxstring, so that it is easy to fill the string using procedure fillstring. the user must declare the value of constant fillermax. *) filler = packed array[1..fillermax] of char; (* end module filler.type version = 'prgmod 4.05 89 Aug 28 tds'; *) (* begin module trigger.type *) trigger = record (* an object to be searched for *) seek: string; (* the characters looked for *) state: integer; (* how close to triggering we are *) skip: boolean; (* trigger not found- skip the line *) found: boolean (* the trigger was found *) end; (* end module trigger.type version = 'prgmod 4.05 89 Aug 28 tds'; *) var (* begin module dbinst.var *) db, (* a set of GenBank entries *) binst, (* begin of feature instructions *) einst, (* end of feature instructions *) oinst, (* whole feature instructions, from begin to end *) sinst, (* between feature instructions, from end to begin *) olength, slength, (* list of feature lengths *) dbinstp, (* parameters *) locuslist, (* list of loci containing features of interest *) missing: (* Features listed in GenBank comments. Since they are probably not reliable they are not included in binst or einst *) text; freespots: spotlist; (* a list of spots that have already been made *) (* end module dbinst.var *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module halt *) procedure halt; (* stop the program. the procedure performs a goto to the end of the program. you must have a label: label 1; declared, and also the end of the program must have this label: 1: end. examples are in the module libraries. this is the only goto in the delila system. *) begin writeln(output,' program halt.'); goto 1 end; (* end module halt version = 'prgmod 4.05 89 Aug 28 tds'; *) (* begin module skipblanks *) procedure skipblanks(var thefile: text); (* skip over blanks until a non-blank, or end of line, is found *) begin while (thefile^ = ' ') and not eoln(thefile) do get(thefile); end; procedure skipnonblanks(var thefile: text); (* skip over nonblanks until a blank, or end of line, is found *) begin while (thefile^ <> ' ') and not eoln(thefile) do get(thefile); end; (* end module skipblanks version = 'prgmod 4.05 89 Aug 28 tds'; *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module interact.clearstring *) procedure clearstring(var ribbon: string); (* empty the string *) var index: integer; (* to the ribbon *) begin (* clearstring *) with ribbon do begin for index := 1 to maxstring do letters[index] := ' '; length := 0; current := 0; end end; (* clearstring *) (* end module interact.clearstring version = 'prgmod 4.05 89 Aug 28 tds'; *) (* begin module interact.figurestring *) procedure figurestring( var line: string; (* a string of characters to figure out *) var first: integer; (* first found non-blank character in the line *) var last: integer; (* last character before a blank after first *) var whzat: char; (* what the token is *) var c: char; (* the first character of the token *) var i: integer; (* integer value of token if it is integer; or 0 *) var r: real); (* the real value if it is real; or 0.0 *) (* figurestring figures out the tokens in a string. it recognizes words, integers, reals and poorly formed numbers. you can easily use it to parse lines. our goal is to figure out what thing is on a string. start looking at the current place on the line. first and last are the first 'token' in line after start. the current place is updated to the letter after last. the thing found is described by the value of whzat: 'c': character (when the token does not begin with a digit, '+', or '-') 'i': integer 'r': real ' ': blank line 'g': garbage, cannot figure it out and the value of the thing found is the appropriate variable *) var numbers: set of '0'..'9'; sign: integer; (* sign of a number *) numberstart: integer; (* the point a number starts, beyond its sign, if any *) point: integer; (* location of decimal point *) power: integer; (* of 10 representing a place value in the number *) l: integer; (* an index for dissecting numbers *) function figureinteger(first,last:integer):integer; (* figure the integer in the token *) var i: integer; (* index *) sum, increment: integer; begin (* figureinteger *) power:=1; (* start at ones place *) sum:=0; (* start sum at zero *) for i:=last downto first do begin case line.letters[i] of '0': increment:=0; '1': increment:=1; '2': increment:=2; '3': increment:=3; '4': increment:=4; '5': increment:=5; '6': increment:=6; '7': increment:=7; '8': increment:=8; '9': increment:=9 end; sum:=sum+power*increment; power:=power*10 end; figureinteger:=sum end; (* figureinteger *) begin (* figurestring *) numbers:=['0','1','2','3','4','5','6','7','8','9']; (* c:=' '; i:=0; r:=0.0; do not affect these variables unless necessary *) point:=0; whzat := '.'; (* assume that we have someting to work on *) (* now to see if that is true: *) with line do if (length = 0) or (current < 1) or (current > length) then whzat := ' ' else begin (* figure out where the first token is in the line *) first:=line.current; while (line.letters[first]=' ') and (first < line.length) do first:=succ(first); if (first = line.length) and (line.letters[first] = ' ') then whzat := ' '; end; if whzat <> ' ' then begin last:=first; while (line.letters[last]<>' ') and (last < line.length) do last:=succ(last); if line.letters[last] = ' ' then last := pred(last); (* the token is between inclusive first and last *) c:=line.letters[first]; if (c in numbers) or (c in ['+','-']) then begin if c in ['+','-'] then begin case c of '+': sign:=+1; '-': sign:=-1; end; numberstart:=succ(first) end else begin sign:=+1; numberstart:=first end; whzat:='i'; for l:=numberstart to last do begin if not(line.letters[l] in numbers) then if line.letters[l]='.' (* we found a period *) then if whzat='i' (* if so far it is numbers *) then begin whzat:='r'; (* it is actually real *) point:=l end else whzat:='g' (* it is a second '.', ie garbage *) else whzat:='g' (* it is garbage *) end; (* if it is only numbers, it is integer *) (* build number *) (* if it ends in a period, it is integer *) if (whzat = 'r') and (point = last) then whzat:='i'; if whzat = 'i' then begin if point = last (* had an ending decimal point *) then i:=sign * figureinteger(numberstart,pred(last)) else i:=sign * figureinteger(numberstart,last); r:=i end else if whzat = 'r' then begin i:=figureinteger(numberstart,point-1); r:=sign * (i+figureinteger(point+1,last)/power); i:=sign * i end end else begin whzat:='c'; end; (* move the start to just beyond the last character of the token *) line.current:=succ(last) end end; (* figurestring *) (* end module interact.figurestring version = 'prgmod 4.05 89 Aug 28 tds'; *) (* begin module interact.writestring *) procedure writestring(var tofile: text; var s: string); (* write the string s to file tofile, no writeln *) var i: integer; (* index to s *) begin (* writestring *) with s do for i := 1 to length do write(tofile, letters[i]) end; (* writestring *) (* end module interact.writestring version = 'prgmod 4.05 89 Aug 28 tds'; *) (* begin module interact.token *) procedure token(var buffer, atoken: string; var gotten: boolean); (* get a token from the buffer *) var (* variables for calling figurestring: *) first: integer; last: integer; what: char; cha: char; int: integer; rea: real; index: integer; (* to the buffer *) begin figurestring(buffer,first,last,what,cha,int,rea); if what = ' ' then gotten := false else begin clearstring(atoken); for index := first to last do atoken.letters[index-first+1] := buffer.letters[index]; atoken.length := last - first + 1; atoken.current := 1; gotten:=true end end; (* end module interact.token version = 'prgmod 4.05 89 Aug 28 tds'; *) (* begin module interact.getstring *) procedure getstring(var afile: text; var buffer: string; var gotten: boolean); (* get a string from a file not using string calls. this lets one obtain lines from a file without interactive prompts *) var index: integer; (* of buffer *) begin (* getstring *) clearstring(buffer); if eof(afile) then gotten := false else begin index := 0; while (not eoln(afile)) and (index < maxstring) do begin index := succ(index); read(afile, buffer.letters[index]) end; if not eoln(afile) then begin writeln(output, ' getstring: a line exceeds maximum string size (', maxstring:1,')'); halt end; buffer.length := index; buffer.current := 1; readln(afile); gotten := true end end; (* getstring *) (* end module interact.getstring version = 'prgmod 4.05 89 Aug 28 tds'; *) (* begin module equalstring *) function equalstring(a, b: string): boolean; (* test for equality between two strings at current positions *) var index: integer; (* index to both strings *) equal: boolean; (* are letters in a and b the same? *) begin (* equalstring *) if a.length = b.length then begin index := 1; repeat equal := (a.letters[index] = b.letters[index]); index := succ(index) until (not equal) or (index > a.length); equalstring := equal end else equalstring := false end; (* equalstring *) (* end module equalstring version = 'prgmod 4.05 89 Aug 28 tds'; *) (* begin module concatstring *) procedure concatstring(var a, b: string; var c: string); (* concatenate a and b and put in c *) var abindex: integer; (* index to both strings *) cindex: integer; (* index to both strings *) begin (* concatstring *) clearstring(c); c.length := a.length + b.length; if c.length > maxstring then begin writeln(output,'string too long in concatstring'); halt end; cindex := 0; for abindex := 1 to a.length do begin cindex := succ(cindex); c.letters[abindex] := a.letters[abindex] end; for abindex := a.length+1 to a.length+b.length do begin cindex := succ(cindex); c.letters[abindex] := b.letters[abindex-a.length] end; end; (* concatstring *) (* end module concatstring version = 'prgmod 4.05 89 Aug 28 tds'; *) (* begin module copystring *) procedure copystring(a: string; var b: string); (* copy string a to b *) var l: integer; (* index to the string *) begin b.length := a.length; for l := 1 to a.length do b.letters[l] := a.letters[l] end; (* end module copystring version = 'prgmod 4.05 89 Aug 28 tds'; *) (* begin module makenumber *) procedure makenumber(name: string; var number: integer; var found: boolean); (* make a integer number from the name. If a number was not detected, found is false. *) var l: integer; (* position in the string *) begin found := false; number := 0; for l := 1 to name.length do begin if name.letters[l] in ['0','1','2','3','4','5','6','7','8','9'] then begin found := true; number := number * 10; (* make room for the next digit *) case name.letters[l] of '0': number := number + 0; '1': number := number + 1; '2': number := number + 2; '3': number := number + 3; '4': number := number + 4; '5': number := number + 5; '6': number := number + 6; '7': number := number + 7; '8': number := number + 8; '9': number := number + 9; end end end end; (* end module makenumber version = 'prgmod 4.05 89 Aug 28 tds'; *) (* begin module filler.fillstring *) procedure fillstring(var s: string; a: filler); (* this procedure makes it reasonably easy to fill the string s with characters. one calls the procedure as: *) (* 1 2 3 4 5 *) (* 12345678901234567890123456789012345678901234567890 *) (* fillstring(s, 'this-is-the-string '); the two comments make it easy to line the characters up. also, for this example, it was assumed that the length of filler as defined by the constant fillermax was 50. *) var length: integer; (* of the string without trailing blanks *) index: integer; (* of s *) begin (* fillstring *) clearstring(s); length := fillermax; while (length > 1) and (a[length] = ' ') do length := pred(length); if (length = 1) and (a[length] = ' ') then begin writeln(output, 'fillstring: the string is empty'); halt end; for index := 1 to length do s.letters[index] := a[index]; s.length := length; s.current := 1 end; (* fillstring *) (* end module filler.fillstring version = 'prgmod 4.05 89 Aug 28 tds'; *) (* begin module filler.filltrigger *) procedure filltrigger(var t: trigger; a: filler); (* fill the trigger t *) begin (* filltrigger *) fillstring(t.seek,a) end; (* fillstring *) (* end module filler.filltrigger version = 'prgmod 4.05 89 Aug 28 tds'; *) (* begin module trigger.proc *) (* this module allows one to scan a series of characters, as from an array or a file, and to "trigger" or detect a simple string in the series. the advantage of the trigger is that several triggers can "observe" a stream of characters at once, each looking for a different thing. some other modules required: interact.const, interact.type *) procedure resettrigger(var t: trigger); (* reset the trigger to ground state *) begin (* resettrigger *) with t do begin state := 0; skip := false; found := false end end; (* resettrigger *) procedure testfortrigger(ch: char; var t: trigger); (* look at the character ch. if it is part of the trigger (at the current trigger state), then the trigger state goes higher. if it is not part of the trigger then the trigger state is reset, skip is true and one should skip onward to find the trigger. if the trigger is found, found is true. *) begin (* testfortrigger *) with t do begin state := succ(state); (* if debugging then begin writestring(list,seek); writeln(list,'testfortrigger seek.letters[',state:1,']:', seek.letters[state],' ch:',ch); end;*) if seek.letters[state] = ch then begin skip := false; if state = seek.length then found := true else found := false end else begin (* reset trigger *) state := 0; skip := true; found := false end end end; (* testfortrigger *) (* end module trigger.proc version = 'prgmod 4.05 89 Aug 28 tds'; *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module nearstring *) function nearstring(a, b: string; n: integer): boolean; (* test for equality between two strings at current positions only over a region of n characters *) var index: integer; (* index to both strings *) equal: boolean; (* are letters in a and b the same? *) begin (* nearstring *) if (a.length > 0) and (b.length > 0) then begin index := 1; repeat equal := (a.letters[index] = b.letters[index]); index := succ(index) until (not equal) or (index > a.length) or (index > b.length) or (index > n); end else if (a.length = 0) and (b.length = 0) then equal := true else equal := false; nearstring := equal end; (* nearstring *) (* end module equalstring version = 'prgmod 4.05 89 Aug 28 tds'; *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module dbinst.writeparameters *) procedure writeparameters(var tofile: text; par: parameters); (* write the paramters to file tofile *) begin with par do begin write(tofile,'* searching for '); writestring(tofile,feature); writeln(tofile); if generic then writeln(tofile,'* from before to after') else writeln(tofile,'* from ',before:1, ' to ',after:1); if dobin then writeln(tofile,'* begin instructions written'); if doein then writeln(tofile,'* end instructions written'); if dooin then writeln(tofile,'* object instructions written'); if dosin then writeln(tofile,'* space instructions written'); if dooln then writeln(tofile,'* object lengths written'); if dosln then writeln(tofile,'* space lengths written'); if (dobin or doein) then if doremove then writeln(tofile,'* duplicate binst or einst instructions', ' and objects (etc) removed') else writeln(tofile,'* DUPLICATE INSTRUCTIONS will NOT be REMOVED', ' from binst or einst'); if foundanyway = 'f' then writeln(tofile,'* found-anyway is on (f): coordinates off the piece', ' will be written anyway'); if foundanyway = 's' then writeln(tofile,'* found-anyway is off (s): coordinates off the piece', ' will NOT be written.'); writeln(tofile,'* alignment shift: ',alignmentshift:1); end; end; (* end module dbinst.writeparameters *) (* begin module dbinst.readparameters *) procedure readparameters(var dbinstp: text; var par: parameters); (* read the aprameters from dbinstp *) var c: char; (* a character in dbinstp *) gotten: boolean; (* was the string gotten fromt file? *) buffer: string; (* a line from dbinstp *) begin with par do begin (* FIRST PARAMETER: get the feature *) reset(dbinstp); getstring(dbinstp,buffer,gotten); if not gotten then begin writeln(output,'empty dbinstp'); halt end; (* extract the feature from the line *) token(buffer, par.feature, gotten); if not gotten then begin writeln(output,'missing first parameter'); halt end; (* SECOND PARAMETER: from-to *) if eof(dbinstp) then begin writeln(output,'missing from-to parameters'); halt end; readln(dbinstp,before,after); if before > after then generic := true else generic := false; (* THIRD PARAMETER: instruction file booleans *) if eof(dbinstp) then begin writeln(output,'missing beos instruction file parameters'); halt end; read(dbinstp,c); dobin := (c = 'b'); read(dbinstp,c); doein := (c = 'e'); read(dbinstp,c); dooin := (c = 'o'); read(dbinstp,c); dosin := (c = 's'); readln(dbinstp); (* FOURTH PARAMETER: length file booleans *) if eof(dbinstp) then begin writeln(output,'missing os instruction file parameters'); halt end; read(dbinstp,c); dooln := (c = 'o'); read(dbinstp,c); dosln := (c = 's'); readln(dbinstp); (* FIFTH PARAMETER: remove duplicate instruction boolean *) if eof(dbinstp) then begin writeln(output,'missing remove instruction file parameter'); halt end; if dbinstp^ = 'r' then doremove := true else doremove := false; readln(dbinstp); readln(dbinstp, foundanyway); readln(dbinstp, alignmentshift); end end; (* end module dbinst.readparameters *) (* begin module dbinst.sign *) function sign(l:integer): char; (* Produce the sign that Delila requires in front of a number to be added. Zero is given as +0 *) begin if l >= 0 then sign := '+' else sign := '-' (* if l > 0 else if l < 0 then sign := '-' else sign := '+'; *) end; (* end module dbinst.sign *) (* begin module dbinst.writeinst *) procedure writeinst(var f: text; var didpiece: boolean; locusname: string; var featurename: string; (* mmm *) first, last: integer; par: parameters; complement: boolean; var number: integer); (* Write to f the Delila instruction: get from first +before to last +after; Write the piece instructions if didpiece is false. Use a generic form if generic is true. Write the complementary form if complement is true. Number the get with the number variable. *) begin with par do begin number := number + 1; if not didpiece then begin writeln(f); write(f,'piece '); writestring(f,locusname); writeln(f,';'); didpiece := true; end; write(f,'name "'); (* mmm *) writestring(f,featurename); (* mmm *) write(f,'"; get'); (* mmm *) if complement then if generic then write(f,' from ', first:1, ' -after', ' to ', last:1, ' -before') else write(f,' from ', first:1, ' ',sign(-before),abs(before):1, ' to ', last:1, ' ',sign(-after ),abs(after ):1) else if generic then write(f,' from ', first:1, ' +before', ' to ', last:1, ' +after') else write(f,' from ', first:1, ' ', sign(+before), abs(before):1, ' to ', last:1, ' ', sign(+after), abs(after):1); write(f,' direction '); if complement then write(f,'-') else write(f,'+'); write(f,'; (* ',number:1); if complement then writeln(f,' COMPLEMENT *)') else writeln(f,' *)'); end; end; (* end module dbinst.writeinst *) (* begin module dbinst.writelength *) procedure writelength(var afile: text; fromnumber, tonumber: integer; locusname: string; complement: boolean; number: integer); (* write to a file afile the length information given the from and to coordinates. Reverse the order if complement is true, so that only positive lengths are produced. *) var length: integer; (* length of te object *) { holdnumber: integer; (* a place to hold an integer while switching numbers *) } begin { if complement then begin holdnumber := fromnumber; fromnumber := tonumber; tonumber := holdnumber; end; } if complement then length := fromnumber - tonumber + 1 else length := tonumber - fromnumber + 1; (* this is possible - overlapping genes! if length < 1 then begin writeln(output); writeln(output, '* WARNING: possible program error!'); write (output, '* A length was less than 1 in '); writestring(output,locusname); writeln(output); writeln(output,'* length = ',length:1 ); writeln(output,'* fromnumber = ',fromnumber:1 ); writeln(output,'* tonumber = ', tonumber:1 ); writeln(output,'* (from procedure writelength)'); {zzz} writeln(output); end; *) write(afile,length:8); write(afile,' '); write(afile,(fromnumber):8); write(afile,' '); write(afile,(tonumber):8); write(afile,' '); writestring(afile,locusname); write(afile,' ',number:5); writeln(afile); end; (* end module dbinst.writelength *) (* begin module dbinst.instheader *) procedure instheader(var afile: text; inuse: boolean; kind: char; par: parameters); (* Write the Delila instructions, title and version comment. If kind is: 'b' then write for the beginning of the feature 'e' then write for the ending of the feature 'o' then write for the entire feature (object) 's' then write for the space between features If inuse is true, the the file is in use, write to it. If false, it is not in use. Wipe it out. *) begin (* rewrite(afile); used to be here, but this means that all files get rewritten, and so all files would be created whether or not the program writes on them. *) if inuse then begin rewrite(afile); (* create the title instruction *) write(afile,'title "'); writestring(afile,par.feature); case kind of 'b': write(afile,' beginning'); 'e': write(afile,' ending'); 'o': write(afile,' object'); 's': write(afile,' space'); end; if par.generic then write(afile,' from before to after') else write(afile,' from ',par.before:1, ' to ',par.after:1); writeln(afile,'";'); writeln(afile,'(* dbinst ',version:4:2,' *)'); writeln(afile,'default out-of-range reduce-range;'); writeln(afile,'default numbering piece;'); writeln(afile); writeln(afile,'(* The parameters used were:'); writeparameters(afile,par); writeln(afile,'*)'); end; end; (* end module dbinst.instheader *) (* begin module dbinst.lengthheader *) procedure lengthheader(var length: text; inuse: boolean; kind: char; par: parameters); (* write the header information out to the length file for each kind of length file. If inuse is true, the the file is in use, write to it. If false, it is not in use. Wipe it out. *) begin (* rewrite(length); used to be here, but this means that all files get rewritten, and so all files would be created whether or not the program writes on them. *) if inuse then begin rewrite(length); writeln(length,'* dbinst ',version:4:2); case kind of 'o': write(length,'* object'); 's': write(length,'* space'); end; writeln(length,' lengths'); (* writeparameters(length,par); *) (* create the Delila title instruction *) writeln(length,'*-'); writeln(length,'** MARKS BEGINNING OF DELILA INSTRUCTIONS'); write(length,'* title "'); writestring(length,par.feature); case kind of 'b': write(length,' beginning'); 'e': write(length,' ending'); 'o': write(length,' object'); 's': write(length,' space'); end; if par.generic then write(length,' from before to after') else write(length,' from ',par.before:1, ' to ',par.after:1); writeln(length,'";'); writeln(length,'* (* dbinst ',version:4:2,' *)'); writeln(length,'* default out-of-range reduce-range;'); writeln(length,'* default numbering piece;'); writeln(length,'*-'); writeln(length,'* (* The parameters used were:'); writeparameters(length,par); writeln(length,'* *)'); writeln(length,'** MARKS ENDING OF DELILA INSTRUCTIONS'); writeln(length, '*-'); writeln(length, '* COLUMN identification:'); writeln(length, '* length, first-position, last-position, piece-name,', ' number'); end; end; (* end module dbinst.lengthheader *) (* begin module dbinst.die *) procedure die; (* something is very wrong *) begin writeln(output, 'can''t locate an expected part of the db'); halt end; (* end module dbinst.die *) (* begin module dbinst.orgchr *) procedure orgchr(var inst: text; name: string; stars: boolean); (* write the organism and chromosome instructions out. If stars is true, then write '* ' at the start of each line. *) begin if stars then write(inst,'**'); writeln(inst); if stars then write(inst,'**'); write(inst,'organism '); writestring(inst,name); write(inst,';'); write(inst,' chromosome '); writestring(inst,name); writeln(inst,';'); end; (* end module dbinst.orgchr *) (* begin module dbinst.recordspot *) procedure recordspot(spot: integer; var list, free: spotlist); (* add the spot to the list, using the free list if possible *) var p: spotlist; (* for holding a spot temporarily *) begin (* get a pointer p pointing to a fresh spot record *) if free = nil then begin new(p); p^.next := nil end else begin p := free; free := free^.next end; (* put the number into p *) p^.location := spot; (* put p onto the list *) if list = nil then begin list := p; p^.next := nil (* no connections left *) end else begin p^.next := list; list := p end end; (* end module dbinst.recordspot *) (* begin module dbinst.dumpspots *) procedure dumpspots(var list, free: spotlist); (* dump the list onto the free list *) var p: spotlist; (* for holding a spot temporarily *) begin while list <> nil do begin p := list; list := list^.next; p^.next := free; free := p end end; (* end module dbinst.dumpspots *) (* begin module dbinst.onlist *) function onlist(list: spotlist; number: integer): boolean; (* if the number is on the list, return true *) var p: spotlist; (* for holding a spot temporarily *) found: boolean; (* true if it was found *) begin p := list; found := false; while (p <> nil) and (not found) do begin if p^.location = number then found := true; p := p^.next end; if found then onlist := true else onlist := false end; (* end module dbinst.onlist *) (* begin module dbinst.showspots *) procedure showspots(var thefile: text; list: spotlist); (* show the list to thefile *) var p: spotlist; (* for holding a spot temporarily *) begin p := list; while p <> nil do begin write(thefile,' ',p^.location:1); p := p^.next; if p = list then begin writeln(thefile,'infinite loop!'); halt end; end; writeln(thefile) end; (* end module dbinst.showspots *) { the old skipstring (* begin module dbinst.skipstring *) procedure skipstring (buffer: string; var position: integer; flag: char); (* skipstring increments the counter 'position' from its initial value until the 'flag' is found within the 'buffer'. *) var snumber: integer; begin while (buffer.letters[position] <> flag) and (position < maxstring) do begin write(output,'|'); writestring(output,buffer); writeln(output,'| SKIPPING!!!'); for snumber := 1 to buffer.current do write(output,' '); writeln(output,'^ length =',buffer.current:1,' position=',position:1); position := position + 1; end; end; (* end module dbinst.skipstring *) } (* begin module dbinst.skipstring *) procedure skipstring (var buffer: string; flag: char); (* skipstring increments the counter 'position' from its initial value until the 'flag' is found within the 'buffer'. *) { var snumber: integer; } begin while (buffer.letters[buffer.current] <> flag) and (buffer.current <= buffer.length) do begin { write(output,'|'); writestring(output,buffer); writeln(output,'| SKIPPING!!!'); for snumber := 1 to buffer.current do write(output,' '); writeln(output,'^ buffer.current =', buffer.current:1,' buffer.current=',buffer.current:1); } buffer.current := buffer.current + 1; end; { write(output,'|'); writestring(output,buffer); writeln(output,'| situation after SKIPSTRING!!!'); for snumber := 1 to buffer.current do write(output,' '); writeln(output,'^ buffer.current =', buffer.current:1,' buffer.current=',buffer.current:1); } end; (* end module dbinst.skipstring *) (* begin module dbinst.readanumber *) procedure readanumber(buffer: string; var current, number: integer; var found: boolean); (* reads a number in from the buffer string. NO NEGATIVE NUMBERS ALLOWED. This is because GenBank numbers are supposed to be all positive... (wouldn't be hard to allow though) *) function inttochar(letter: char): integer; begin case letter of '0': inttochar:=0; '1': inttochar:=1; '2': inttochar:=2; '3': inttochar:=3; '4': inttochar:=4; '5': inttochar:=5; '6': inttochar:=6; '7': inttochar:=7; '8': inttochar:=8; '9': inttochar:=9 end; end; begin number := 0; found := false; while buffer.letters[current] in ['0','1','2','3','4','5','6','7','8','9'] do begin found := true; number := (number * 10) + inttochar(buffer.letters[current]); current := current + 1; end; end; (* end module dbinst.readanumber *) (* ************************************************************************ *) (* ************************************************************************ *) (* begin module dbinst.themain *) procedure themain(var db, binst, einst, oinst, sinst, olength, slength, dbinstp, missing: text); (* the main procedure of the program. The old variables are used to keep track of the spaces between objects. The last variables are used to keep track of the last instruction written so that duplicate instructions can be removed. *) const debugging = false; (* for debugging *) var featurewanted: boolean; (* true if the last identifier was the feature we are looking for: par.feature *) atoken: string; (* a token from buffer *) buffer: string; (* a line from db *) bnumber: integer; (* the number of before instructions *) comment: string; (* the GenBank COMMENT identifier string *) complement: boolean; (* true if the feature is on the complement *) { current: integer; (* where we are in the buffer *) } didbpiece: boolean; (* true if the piece instruction has been written to the beginning (binst) file *) didepiece: boolean; (* true if the piece instruction has been written to the ending (einst) file *) didopiece: boolean; (* true if the piece instruction has been written to the object (oinst) file *) didspiece: boolean; (* true if the piece instruction has been written to the space (sinst) file *) done: boolean; (* done finding the end of a note *) endentry: string; (* the end identifier for an entry *) enumber: integer; (* the number of end instructions *) feature: string; (* the GenBank FEATURE identifier string *) featurefound: boolean; (* true if the last identifier was a feature, and false if it was a comment *) fromfound: boolean; (* the from position was found *) { fromstring: string; (* the numeric string for the from position *) } fromnumber: integer; (* the integer corresponding to fromstring *) listfrom: spotlist; (* a list of the from locations found *) listto: spotlist; (* a list of the to locations found *) lownumber: integer; (* the lower of tonumber and fromnumber *) genus: string; (* the genus name *) genusspecies: string; (* the constructed genus.species name *) strain: string; (* a strain name [2006 Oct 25] *) holdstring: string; (* a string for holding intermediate results. *) dash: string; (* a strain name [2006 Oct 25] *) gotten: boolean; (* was the string gotten from the file? *) less: string; (* a string contining a "<" *) locus: string; (* the locus identifier to search for *) locusname: string; (* the locus name *) more: string; (* a string contining a ">" *) note: string; (* the GenBank note identifier string *) gene: string; (* the GenBank gene identifier string *) featurename: string; (* the name of the feature extracted mmm*) oldgenusspecies: string; (* the previous genusspecies string *) oldtofound: boolean; (* true if tofound was true for the last feature *) oldtonumber: integer; (* the previous value of tonumber if oldtofound is true *) onumber: integer; (* the number of object instructions *) organism: string; (* the organism identifier to search for *) par: parameters; (* the set of parameters read from the user *) period: string; (* a string contining a period *) spacecomplement: boolean; (* true if the space object is on the complement *) species: string; (* the species name *) snumber: integer; (* the number of space instructions *) tellaboutfeaturemissing: boolean; (* if true, tell the name of the entry of the missing feature *) tofound: boolean; (* the to position was found *) { tostring: string; (* the numeric string for the to position *) } tonumber: integer; (* the integer corresponding to tostring *) onemore: boolean; (* read in one more line mmm *) i: integer; (* reading in the gene name letters mmm *) (* begin THEMAIN *) begin writeln(output,'dbinst ',version:4:2); rewrite(missing); writeln(missing,'dbinst ',version:4:2, ' MISSING INSTRUCTIONS'); writeln(missing,'These loci contain features that are in the COMMENT'); writeln(missing,'of an entry. They are not listed in binst or einst.'); writeln(missing); reset(db); rewrite(locuslist); writeln(locuslist,'dbinst ',version:4:2, ' LOCUS LIST '); writeln(locuslist,'These are the loci that contain features of interest'); writeln(locuslist); (* set some variables *) didbpiece := false; didepiece := false; didopiece := false; didspiece := false; featurewanted := false; featurefound := false; onumber := 0; snumber := 0; listfrom := nil; listto := nil; freespots := nil; bnumber := 0; enumber := 0; tellaboutfeaturemissing := false; {zzz} oldtonumber := 0; fromfound := false; tofound := false; (* 1 2 3 4 5 *) (* 12345678901234567890123456789012345678901234567890 *) (* fillstring(locus, 'LOCUS ');*) fillstring(locus, 'ACCESSION '); fillstring(organism,'ORGANISM '); fillstring(comment, 'COMMENT '); fillstring(feature, 'FEATURES '); fillstring(note , '/note=" '); fillstring(gene , '/gene=" '); fillstring(endentry,'// '); fillstring(period ,'. '); fillstring(less ,'< '); fillstring(more ,'> '); fillstring(dash ,'- '); clearstring(oldgenusspecies); readparameters(dbinstp,par); writeparameters(output,par); (* force all output files to be something or blank to avoid confusion *) instheader(binst,par.dobin,'b',par); instheader(einst,par.doein,'e',par); instheader(oinst,par.dooin,'o',par); instheader(sinst,par.dosin,'s',par); lengthheader(olength,par.dooln,'o',par); lengthheader(slength,par.dosln,'s',par); { writeln(output); writeln(output); writeln(output); writeln(output); } (* begin to scan the database db to find the features *) clearstring(buffer); while not eof(db) do begin (* each pass through this loop represents looking at one line in the db file. A large number of booleans are set to keep track of where we are in the database. *) getstring(db,buffer,gotten); (* extract a line from db *) (* if we have a line, continue *) if gotten then begin token(buffer, atoken, gotten); if gotten then begin (* we have a line identifier, handle it! *) (* LOCUS *) { if equalstring(atoken,locus) then writeln(output,'locus found'); } if equalstring(atoken,locus) then begin token(buffer, locusname, gotten); if not gotten then die; didbpiece := false; didepiece := false; didopiece := false; didspiece := false; featurewanted := false; (* reset the recording for the end of the previous object so that we can tell that there could not have been an inter-object space next time *) oldtofound := false; oldtonumber := 12345678; (* use a value that is easy to detect *) (* keep track of all from's and to's to find duplicates *) dumpspots(listfrom,freespots); dumpspots(listto,freespots); (* If we change the code above, we get the method to allow the program to pick up the spaces before the first object. The code for spaces after the last one is something like: oldseqlength := -1; then later on: if (oldseqlength >0) and (oldtofound) then put out oldlocusname; @) oldtofound := true; oldtonumber := 1; (@ make it the start of the sequence *) end; (* ORGANISM *) { if equalstring(atoken,organism) then writeln(output,'organism found'); } if equalstring(atoken,organism) then begin token(buffer, genus, gotten); if not gotten then die; token(buffer, species, gotten); if not gotten then die; (* force genus to be genuslimit characters long *) if genus.length > genuslimit then genus.length := genuslimit; concatstring(genus,period,atoken); concatstring(atoken,species,genusspecies); (* 2006 Oct 25 grab anymore stuff and attach it: strain names, separated by dashes. *) gotten := true; while gotten do begin token(buffer, strain, gotten); if gotten then begin if debugging then begin writestring(output,genusspecies); writeln(output, ' genusspecies'); writestring(output,strain); writeln(output, ' strain'); end; concatstring(dash,strain,atoken); (* overwrite atoken *) if debugging then begin writestring(output,dash ); writeln(output, ' dash'); writestring(output,atoken); writeln(output, ' atoken'); end; concatstring(genusspecies,atoken,holdstring); copystring(holdstring,genusspecies); if debugging then begin writestring(output,genusspecies); writeln(output, ' genusspecies'); writeln(output, ' BUBBA'); end; end; {qqq} end; writestring(output,genusspecies); writeln(output, ' genus.species-strain'); if debugging then halt; if not equalstring(genusspecies,oldgenusspecies) then begin (* beginning *) if par.dobin then orgchr(binst,genusspecies,false); (* ending *) if par.doein then orgchr(einst,genusspecies,false); (* object *) if par.dooin then orgchr(oinst,genusspecies,false); (* space *) if par.dosin then orgchr(sinst,genusspecies,false); if par.dosln then orgchr(slength,genusspecies,true); if par.dooln then orgchr(olength,genusspecies,true); copystring(genusspecies,oldgenusspecies); didbpiece := false; didepiece := false; didopiece := false; didspiece := false; end; end; { write(output,'|'); writestring(output,buffer); writeln(output,'| buffer as seen before entering notes'); for snumber := 1 to buffer.current do write(output,' '); writeln(output,'^ current =',buffer.current:1); } (* DAMN NOTES *) if nearstring(atoken,note,7) then begin (* skip the damn thing, as it can contain words like "intron" which we DON'T want to detect!! *) (* since this is a TOKEN, it is not parsed correctly. Force the buffer back to the start of the buffer: *) buffer.current := 1; (* now we can locate the first quote mark precisely: *) skipstring(buffer,'"'); (* step past it: *) buffer.current := buffer.current + 1; (* now locate the other end. The note can cover several lines, so we may need to repeat: *) repeat { the old call: skipstring(buffer,buffer.current,'"'); } skipstring(buffer,'"'); done := (buffer.letters[buffer.current] = '"') or eof(db); { if done then writeln(output,'DONE DONE DONE'); write(output,'|'); writestring(output,buffer); writeln(output,'| this is CRAZY!!!'); for snumber := 1 to buffer.current do write(output,' '); writeln(output,'^ current =',buffer.current:1); if done then writeln(output,'done!'); } if not done then getstring(db,buffer,gotten); (* extract a line from db *) until done; {A POSSIBILITY FOR SOLUTION: } clearstring(buffer); (* just to be safe *) clearstring(atoken); (* just to be safe *) { writeln(output,'END OF DAMN NOTES'); } end; (* determine if we are inside a COMMENT or a FEATURE *) if equalstring(atoken,feature) then featurefound := true; { if equalstring(atoken,feature) then writeln(output,'feature found'); } if equalstring(atoken,endentry) then featurefound := false; { if equalstring(atoken,endentry) then writeln(output,'end of entry found'); } if equalstring(atoken,comment) then begin featurefound := false; tellaboutfeaturemissing := true; end; (* If we are inside a comment and we have found the feature we were looking for, record it in the missing file only. *) if equalstring(atoken,feature) and not featurefound then begin if tellaboutfeaturemissing then begin write(missing,'entry '); writestring(missing,locusname); writeln(missing); tellaboutfeaturemissing := false; end; writestring(missing,buffer); writeln(missing); end; (* FEATURE *) (* We ONLY pick up features in the official GenBank FEATURE section. If there are "untranslated" features in the COMMENT section, we drop them because they are not reliable. Example: in the July 1989 database, the entry HUMNMYC3 has a comment with an IVS. The start of this IVS is listed as the start of the sequence, but this is clearly wrong, given the sequence that is there. To avoid this kind of error we drop such sequences. *) (* read in the gene name bmmm *) if nearstring(atoken,gene,7) and onemore then begin (* mmm *) buffer.current := startspot; skipstring(buffer,'"'); i := 1; featurename.letters[i] := buffer.letters[buffer.current + i]; while((featurename.letters[i] <> '"') and (i < buffer.length)) do begin i := i + 1; featurename.letters[i] := buffer.letters[buffer.current + i]; end; featurename.length := i-1; end; (* emmm *) if featurefound then begin buffer.current := startspot; (* if we are not on the same line as the feature name, then ignore the line we are on and skip to the next one *) (* now read the next line with the gene name *) if (equalstring(atoken,par.feature)) then begin(* bmmm *) featurewanted := true; onemore := true; end else if(onemore) then begin (* read the gene name *) onemore := false; featurewanted :=true; end(* emmm *) else featurewanted := false; end else featurewanted := false; (* if we have the correct feature, read the coordinates *) if featurewanted then (* mmm *) if ( onemore = true ) then begin (* mmm *) (* detect special cases *) fromfound := false; tofound := false; if buffer.letters[startspot] = 'c' then complement := true else complement := false; (* this section does stuff ONLY if there is a complement *) if complement then begin (* if it is a complement, then it is necessary to skip some text to get to the coordinates *) buffer.current := startspot; skipstring(buffer,'('); buffer.current := buffer.current + 1; (* check to ensure that the feature is on the sequence *) if buffer.letters[buffer.current] = '<' then begin buffer.current := buffer.current + 1; tofound := false; end else tofound := true; (* read the number *) if tofound then readanumber(buffer,buffer.current,tonumber,tofound); if tofound then begin (* find the from *) (* skip over the dots between numbers *) while buffer.letters[buffer.current] = '.' do buffer.current := buffer.current +1; (* check to ensure that the feature is on the sequence *) if buffer.letters[buffer.current] = '>' then begin buffer.current := buffer.current + 1; fromfound := false; end else fromfound := true; (* read the number *) if fromfound then readanumber(buffer,buffer.current,fromnumber,fromfound); (* We modify the coordinates according to the alignmentshift: *) fromnumber := fromnumber - par.alignmentshift; tonumber := tonumber - par.alignmentshift; end end (* this section does the work when there is no complement *) else begin (* check to ensure that the feature is on the sequence *) if buffer.letters[buffer.current] = '<' then begin buffer.current := buffer.current + 1; fromfound := false; end else fromfound := true; (* read the sequence *) if fromfound then readanumber(buffer,buffer.current,fromnumber,fromfound); if fromfound then begin (* find the to *) (* skip over the dots between numbers *) while buffer.letters[buffer.current] = '.' do buffer.current := buffer.current +1; (* check to ensure that the feature is on the sequence *) if buffer.letters[buffer.current] = '>' then begin buffer.current := buffer.current + 1; tofound := false; end else tofound := true; (* read the number *) if tofound then readanumber(buffer,buffer.current,tonumber,tofound); (* We modify the coordinates according to the alignmentshift: *) fromnumber := fromnumber + par.alignmentshift; tonumber := tonumber + par.alignmentshift; end end; end (* mmm *) else begin (* mmm *) (* NOTE!!! Checks for duplicates are all following this point, ie, using the shifted alignment. The recording of duplicates is also done with shifted numbers. Just so long as one doesn't try to use the unshifted numbers before this point, it's ok. *) { writeln(output); if fromfound then write (output,'| found fromnumber: ',fromnumber:4,' ') else write (output,'| missing fromnumber'); if tofound then writeln(output,'| found tonumber: ',tonumber:4,' ') else writeln(output,'| missing tonumber'); } (* now that we have the coordinates from the database entry, we know which of them were found, and we can decide what to write out based on that. *) if par.foundanyway = 'f' then begin fromfound := true; tofound := true; end else if par.foundanyway = 's' then begin (* this forces nothing to be found if either was not found: *) if (not fromfound) or (not tofound) then begin fromfound := false; tofound := false; end; end; (* if objects are being printed, then they must correspond to the individual instructions. So zap the instructions if the objects are not allowed. Note that this zaps the objects themselves also, because they depend on the fromfound and tofound variables. *) if fromfound and tofound then if par.dooin then if onlist(listto ,tonumber) or onlist(listfrom,fromnumber) then begin fromfound := false; tofound := false; end; (* SPACE PROBLEM: I'm not absolutely sure that this does the right thing for spaces: shouldn't it use the old to number? But that's what the code was, from when it worked! *) if fromfound and oldtofound then if par.dosin then if onlist(listto, tonumber) or onlist(listfrom,fromnumber) then begin fromfound := false; tofound := false; end; { writeln(output,'| DECISION:'); if fromfound then write (output,'| found fromnumber: ',fromnumber:4,' ') else write (output,'| missing fromnumber'); if tofound then writeln(output,'| found tonumber: ',tonumber:4,' ') else writeln(output,'| missing tonumber'); } (***********************************************************) (* objects and spaces *) (* Note that we must check for duplicates on the onlist, so that duplicate objects and spaces don't get written out. This means that if EITHER END of the object is duplicated, it is NOT written. To make sure that they are at least written out ONCE, the objects and spaces are done BEFORE the instructions. *) if fromfound and tofound { and (not onlist(listto,tonumber)) and (not onlist(listfrom,fromnumber)) } then begin (* record the object *) if par.dooin then writeinst(oinst, didopiece, locusname, featurename, (* mmm *) fromnumber, tonumber, par, complement, onumber); if par.dooln then writelength(olength, fromnumber, tonumber, locusname, complement, onumber); end; if fromfound and oldtofound { and (not onlist(listto,tonumber)) and (not onlist(listfrom,fromnumber)) } then begin (* record the space *) (* work out the effective "complement" case to be positive *) spacecomplement := fromnumber < oldtonumber; (* determine the lower number *) if fromnumber < tonumber then lownumber := fromnumber else lownumber := tonumber; {writeln(output,'1 oldtonumber ',oldtonumber:1); zzz} if par.dosin then writeinst(sinst, didspiece, locusname, featurename, (* mmm *) {zzz oldtonumber, fromnumber, } oldtonumber, lownumber, par, spacecomplement, snumber); {writeln(output,'2 oldtonumber ',oldtonumber:1); zzz} if par.dosln then writelength(slength, {zzz oldtonumber, fromnumber, } oldtonumber, lownumber, locusname, spacecomplement, onumber); {writeln(output,'3 oldtonumber ',oldtonumber:1); zzz} end; (***********************************************************) (* instructions *) if (fromfound or tofound) then if not (didbpiece or didepiece) then begin writestring(locuslist,locusname); writeln(locuslist); end; if par.dobin then if fromfound and not onlist(listfrom,fromnumber) then begin writeinst(binst, didbpiece, locusname, featurename, (* mmm *) fromnumber, fromnumber, par, complement, bnumber); if par.doremove then recordspot(fromnumber,listfrom, freespots); end; if par.doein then if tofound and not onlist(listto,tonumber) then begin writeinst(einst, didepiece, locusname, featurename, (* mmm *) tonumber, tonumber, par, complement, enumber); if par.doremove then recordspot(tonumber,listto, freespots); end; (***********************************************************) (* update the oldtofound number only if there is a new tofound number in the tospot column *) if buffer.letters[tospot] <> ' ' then oldtofound := tofound; { if tofound then oldtonumber := tonumber; } if tofound then {zzz use fromnumber if it is larger!} begin {zzz writeln(output,'BUBBA in tonumber update'); writeln(output,'BUBBA tonumber ',tonumber); writeln(output,'BUBBA fromnumber ',fromnumber); } if tonumber > fromnumber then oldtonumber := tonumber else oldtonumber := fromnumber; end; {zzz tonumber depends on the orientation of the gene} (* update the oldfromfound number only if there is a new fromfound number in the fromspot column *) (* if buffer.letters[fromspot] <> ' ' then oldfromfound := fromfound; if fromfound then oldfromnumber := fromnumber; *) end end end end end; (* end module dbinst.themain *) (* ************************************************************************ *) begin themain(db, binst, einst, oinst, sinst, olength, slength, dbinstp, missing); 1: end.