1995 Feb 26

Here are some of my responses to Hubert Yockey's book:

@book{Yockey1992,
author = "H. P. Yockey",
title = "Information Theory in Molecular Biology",
publisher = "Cambridge University Press",
address = "Cambridge",
isbn = "0-521-35005-0",
comment = "40 West 20th Street, New York, N. Y. 10011-4211, order number 350050",
phone = "1-800-827-7423",
price = "price as of 1994 October 31: \$74.95",
year = "1992"}

These comments are given in a constructive spirit. I hope that Hubert will be able to incorporate some or all of them into the second edition of his book.

PROLOGUE

First, so that people won't get the wrong idea from my little pickings below, I want to say that the introductory material is generally very readable and makes many excellent points about science. Theoretical work is often misunderstood by mainstream biologists, even though the example of theoretical versus experimental physics is well known. The proper use of theory in molecular biology is still a ways off.

page 4: "We must distinguish clearly between axioms and the theorems from which they are derived." This should be reversed: "We must distinguish clearly between theorems and the axioms from which they are derived." since theorems are derived from axioms.

page 4: "information may be transferred ... DNA to protein." What is an example of this? Translation of single-stranded DNA in vitro?

page 5: "The question is: can information be transferred from a source with a 20-letter alphabet to a receiver with 64 letters or code words?" Though this is early in the book, the answer is clearly yes, so long as the information content of the 20-letter alphabet is less than or equal to that of the 64-letter alphabet.

page 6: "information per symbol, H". No, this is the uncertainty. As I've said before, calling this the information leads to confusion.

page 7: "...
the biological information system must be able to accommodate the genetic messages of all organisms that have lived in the past, are now living or may live in the future." This can't be correct. The system either works or the organism dies. It CANNOT plan for the future. Indeed, it cannot even accommodate the past, since it only works here-now, as in Zen.

page 7: Hubert is discussing von Neumann's suggestion to Shannon to call his measure "entropy". This, as we all know, has caused a ruckus ever since. It seems to me that it was actually a mistake of von Neumann to say that, as the units are wrong (Shannon's measure is in bits per symbol, while thermodynamic entropy is in joules per kelvin).

page 7: "The Maxwell-Boltzmann-Gibbs entropy of thermodynamics is based on the probability of a selection of elements different from that of Shannon and the two have no relation." ... unless the probabilities they both use refer to the same thing!

page 8: "It is on this theorem [capacity] that the ability of the genetic message to preserve for so many million years the information needed to form a coelacanth is based." To me this seems to imply that the genome of the coelacanth is coded for preservation. We don't know of any such codes. It seems more likely that this organism has been "lucky" enough to live in an environment that has remained unchanged for this time. The other aspect of this sentence is the implication that the genome didn't change. We don't really know that. We know that the fish looks the same, but we don't (yet??) have DNA samples to compare the old with the recent. One might have expected a lot of neutral drift if the shape of the organism was being held in place by selection. On the other hand, Yockey's sentence would imply that some (amazing!) coding mechanism is keeping the entire genome unchanged. The latter seems unlikely to me.

page 8: "The exact location of the various atoms that compose these informational molecules is unimportant and merely clutters our thinking."
The exact meaning of this sentence is frequently misunderstood by molecular biologists. The point here is that one can learn a great deal merely by counting the states of a system, without looking at its detailed structure. Molecular biologists are so preoccupied with structure that they often miss interesting details because of this. On the other hand, every information system has a physical basis, and this can influence the method of coding. The simplest case of this that I am aware of is the preponderance of protein contacts that use up to 2 bits of information from the major groove (enough to pick out one of the four base pairs, since log2 4 = 2), and the limitation of informational contacts from the minor groove to 1 bit of information (enough only to choose between two classes), in B-form DNA. (See: Papp et al JMB 233:219-230, 1993.) The reason for this depends greatly on the physical locations of atoms in DNA. In that case, we were quite surprised to find an "exception". It's a good example of building a simple, reasonable theory and then looking for exceptions to learn new things. The trouble is, in the current anti-theory climate of biology, the exceptions are misunderstood and used to dismiss the theory before anything has been learned.

page 9: Although Maddox has called many times for theory in biology, he rejected one of my papers that did exactly what he was asking for. He is not supportive of theoretical biology. (Actions speak louder than words.)

page 11: "However, when theories are based on fundamental principles and not on _ad_hoc_ scenarios, the error is in not taking them seriously enough. Jaynes (1957a) points out that it is when theories fail to predict the results of experiment that they are most useful. Such discrepancies alert us to new knowledge. When a theory predicts correctly we are simply puzzle solving and confirming what we already know (Kuhn, 1970)." BRAVO! More molecular biologists should read this!!

CHAPTER I

page 25: "Reasoning from axioms is the highest form of human thought."
This reminds me of the following wonderful quote:

********************************************************************************
The proof may seem to be unsatisfying: each step is correct, and hence the conclusion is true, but it is not clear why the steps are there and where they came from. That is because there are at least eight levels of mathematical understanding, and it is hard for someone on a lower level to appreciate what goes on at a higher level. The levels are, I think:

1. Being able to do arithmetic.
2. Being able to substitute numbers in formulas.
3. Given formulas, being able to get other formulas.
4. Being able to understand the hypotheses and conclusions of theorems.
5. Being able to understand the proofs of theorems, step by step.
6. Being able to _really_ understand the proofs of theorems: that is, seeing why the proof is as it is, and comprehending the inwardness of the theorem and its relation to other theorems.
7. Being able to generalize and extend theorems.
8. Being able to see new relationships and discover and prove entirely new theorems.

Those of us stuck on level 5 can no more understand the workings of a level 8 mind than a cow could understand calculus.

Elementary Number Theory, 2nd ed, pages 103-104, by Underwood Dudley.
********************************************************************************

(Does anybody know the year of publication and whether it is still in print?)

page 32: Equation (1.7) seems to have an error. The middle part of the equation has 4 additive terms. The third one should probably be a sum (sum from r = 1 to n-3, I think).

page 33: The 7th line from the top appears to contain a typo. "expanding (p1+p2+pk)n" should probably be "expanding (p1+p2+...+pk)^n". I imagine this is a consequence of Hubert not typesetting the equations himself using TeX.

page 33: just below (1.9): "Equation (1.9) ..." should be "Expression (1.9) ..." since no equality is given.
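For readers at Dudley's level 5, the corrected phrase "expanding (p1+p2+...+pk)^n" refers to the multinomial expansion: each term is n!/(n1! n2! ... nk!) p1^n1 p2^n2 ... pk^nk, taken over all counts with n1 + n2 + ... + nk = n, and when the pk sum to 1 the terms sum to 1. Below is a minimal Python sketch that checks this numerically. The probabilities and the length n are arbitrary illustrative values, not numbers from the book, and the helper name multinomial_terms is hypothetical.

```python
from itertools import product
from math import factorial, isclose, prod


def multinomial_terms(p, n):
    """Yield (counts, term) for every term of (p1 + p2 + ... + pk)^n.

    Each term is n!/(n1! n2! ... nk!) * p1^n1 * p2^n2 * ... * pk^nk,
    taken over all non-negative counts with n1 + n2 + ... + nk = n.
    """
    for counts in product(range(n + 1), repeat=len(p)):
        if sum(counts) != n:
            continue  # not a valid composition of n
        coeff = factorial(n)
        for c in counts:
            coeff //= factorial(c)  # exact: each partial quotient is an integer
        yield counts, coeff * prod(pi ** c for pi, c in zip(p, counts))


# Hypothetical example: k = 3 symbols, strings of length n = 4.
p = [0.5, 0.3, 0.2]
n = 4
terms = list(multinomial_terms(p, n))
total = sum(t for _, t in terms)

print(len(terms))                    # → 15 (compositions of 4 into 3 parts)
print(isclose(total, sum(p) ** n))   # → True
```

Brute-force enumeration with itertools.product is wasteful for large n or k, but it keeps the correspondence with the written expansion obvious.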
page 41: There is a reference to figure 1.2, but that figure is far away, on page 51. It should be on the same page. Also, the text seems to refer to the wrong figure: CUG is mentioned, but that is in figure 1.3, not 1.2.

page 43: The kind of stochastic matrix (for example, whether its rows or its columns sum to 1) should be stated right at the start. As Dudley says, at level 5 (which is where we need to be to follow the text here! ;-) the steps have to be clear. More text explaining the reasoning behind each step would be useful. For example, just before equation (1.42), "Consider the equation" is useless, but "by definition of matrix multiplication" would be much more helpful to the reader. Two lines later "Then" (a wasted word) could be replaced by "by (1.42)". The following line is justified by "since sum_k pkj = 1".

page 50: "UGA is an absorbing state because it is a termination codon, therefore any transition to that state cannot lead to another." The text here has gotten so involved in the math that it has lost touch with the biology. The transition diagram of figure 1.2 appears to be for mutations between bases of a codon. It isn't clear where the codon is (on a particular mRNA? All mRNAs?). But if the transitions are about mutations, then some codons can perfectly well mutate to a stop codon, especially near the end of a protein or when the sequence is not a coding frame at all. My point is this: mutation is very different from translation. In translation the stop codon means stop, while under mutation it is not an "absorbing state", since a stop codon can itself mutate into another codon. Motion ALONG an mRNA is NOT the same as changes within a single codon on an mRNA. Also, the transitions are given in RNA (with U's), but mutational changes would, in most organisms, occur in the DNA. The implication here is changes in an RNA virus. I suggest that the example be based on a realistic model to avoid freaking out molecular biologists.

(On page 55 the phrase "in generating a protein sequence" appears, but the method is not clear: by translation? By evolution?
Here the implication is that the transitions are from one codon to the next, but then the constraints shown in the figures would not apply.)

page 52: The mathematically correct conclusion that the system will ultimately end up in the absorbing state is biologically weird. I have no idea what this math is modelling. The probability of creating various peptides? Then why are there mutations at particular places in the codons? (This is the same problem I pointed out above about the strangeness of the example given.)

page 53: Equation (1.64) gives a transition matrix that does not match the transition graph of figure 1.3. According to the matrix, all mutations are allowed, but a number of them are missing from the figure, for example, from AUG to GUG. Such transitions may be more common than the ones allowed by the figure. (This is related to exercise 9.)

page 54: Although I agree that the mathematics gives the exact result directly, the computation is not so bad on modern computers. I wrote a simple transition matrix multiplication program in a few minutes. The matrix gives 0.25000000 in every position after 17 steps, and this took about 0.03 seconds on a SPARCstation 20/61.

I will eventually get to later parts of the book.

Tom Schneider
National Cancer Institute
Laboratory of Mathematical Biology
Frederick, Maryland 21702-1201
toms@ncifcrf.gov
http://www-lmmb.ncifcrf.gov/~toms
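The page 54 calculation is easy to reproduce. Below is a minimal Python sketch, not the program mentioned there. It assumes the convention P[i][j] = probability of moving from state j to state i, so that each column sums to 1 (matching "sum_k pkj = 1" from the page 43 note). Both matrices are hypothetical examples, not Yockey's matrix (1.64) or any figure in the book, and the helper names are made up for illustration. The first chain spreads out to 0.25 in every position; the second has an absorbing state, and all of the probability ends up there, which is the page 52 behaviour.

```python
def mat_mult(a, b):
    """Ordinary matrix product: c[i][j] = sum_k a[i][k] * b[k][j]."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def column_sums(m):
    """For each column j, the sum over the first index: sum_k m[k][j]."""
    size = len(m)
    return [sum(m[k][j] for k in range(size)) for j in range(size)]


def power_until(p, target, tol=5e-9, max_steps=10_000):
    """Multiply P by itself until every entry of P^s is within tol of target.

    Returns (s, P^s). Column sums stay equal to 1 at every step because
    sum_i (AB)[i][j] = sum_k (sum_i A[i][k]) B[k][j] = sum_k B[k][j] = 1
    whenever the columns of A and B each sum to 1.
    """
    m = p
    for step in range(1, max_steps + 1):
        if all(abs(x - t) < tol
               for row, trow in zip(m, target) for x, t in zip(row, trow)):
            return step, m
        m = mat_mult(m, p)
    raise RuntimeError("no convergence within max_steps")


# Hypothetical 4-state chain (NOT Yockey's matrix 1.64): stay put with
# probability 0.7, move to each of the other three states with 0.1.
uniform = [[0.7 if i == j else 0.1 for j in range(4)] for i in range(4)]
steps, limit = power_until(uniform, [[0.25] * 4 for _ in range(4)])
print(steps)
for row in limit:
    print(" ".join(f"{x:.8f}" for x in row))

# Hypothetical 3-state chain whose state 2 is absorbing: column 2 is
# (0, 0, 1), so once the chain gets there it never leaves.
absorbing = [[0.5, 0.3, 0.0],
             [0.3, 0.5, 0.0],
             [0.2, 0.2, 1.0]]
steps2, limit2 = power_until(absorbing, [[0, 0, 0], [0, 0, 0], [1, 1, 1]])
print(steps2, column_sums(limit2))
```

The number of steps needed depends entirely on the matrix (here on how quickly the non-stationary part decays), so these hypothetical chains are not expected to reproduce the 17 steps reported for (1.64).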