Genes – the language of God 3: Why genes aren’t information

Genes are more commonly regarded as information than as a language, and in fact the informational metaphor underpins the language metaphor. In this post I will consider how genes came to be called information (that is, how the Dawkins view of genes as computer messages came to the fore), and what it can and cannot mean.

In The Blind Watchmaker (1986), Richard Dawkins compared DNA to computer programs (instructions for building organisms):

It is raining DNA outside. … [downy seeds from willow trees] The cotton wool is mostly made of cellulose, and it dwarfs the tiny capsule that contains the DNA, the genetic information. The DNA content must be a small proportion of the total, so why did I say that it was raining DNA rather than cellulose? The answer is that it is the DNA that matters… whose coded characters spell out specific instructions for building willow trees… It is raining instructions out there, it’s raining programs; it’s raining tree-growing, fluff spreading, algorithms. That is not a metaphor, it is the plain truth. It couldn’t be any plainer if it were raining floppy disks. [Chapter 5, p 111]

Floppy disks have been superseded by USB thumb drives, but the point is clear enough – DNA is information, not just a molecule. It’s not a metaphor.

However, many have tried to make this “plain truth” work, and failed. There are many reasons for this, but first let us look into the history of the idea that DNA is information.

As I noted in the first post of this series, the notion that inheritance is about information long precedes the discovery of DNA, let alone of its structure and role in inheritance. But the idea that DNA is information goes back to the two discoverers of that structure, Francis Crick and James Watson. At first, back in 1953, the structure alone did not reveal how DNA specified proteins; it took some years to figure this out. In 1958, Crick published what came to be known as the “Central Dogma” of molecular biology:

[Figure: Crick’s diagrams of the Central Dogma, as redrawn in his 1970 Nature paper – possible information transfers on the left, actual transfers on the right.]

[From Sandwalk’s excellent essay on the Central Dogma.] On the left, Crick diagrammed all the possible ways sequence information could be passed between DNA, RNA and proteins: DNA could copy itself, or pass sequence information to RNA or to proteins, and likewise for the other two kinds of molecule. In fact, Crick argued, information is only passed along the arrows of the right-hand graph. Later, we discovered that some RNA sequences can be reverse transcribed into DNA, notably by what are now called retroviruses. Crick gave the following definition of the Central Dogma:

… once (sequential) information has passed into protein it cannot get out again.

It is very important to note that the “information” here is the linear sequence of the bases matching up to a linear sequence, first of RNA (mRNA), and then of the proteins (with tRNAs acting as adaptors during translation). Nothing beyond this is implied by the Central Dogma, and we can usefully call this “Crick information”, as Griffiths and Stotz do in their book. The passing of sequential or Crick information is thus a kind of templating from a sequence in the DNA to the [often edited] sequence in the RNA to the finished protein. It is not as “instructions” that Crick posited information. You lose nothing if you drop the word “information” in favour of “structure”, and I will argue there are good reasons for doing so.

When Crick was writing, information was all the rage. In 1948, Claude Shannon, then at Bell Telephone Laboratories, had published “A Mathematical Theory of Communication”, which put the notion of information on a mathematical footing, and many scientists thought this was a fruitful way to approach scientific problems. Inheritance seemed like a transmission of information, and so it was natural that Shannon’s scheme would be brought to bear. However, it was ultimately rather fruitless.
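It is worth seeing concretely what Shannon’s measure actually quantifies, because it shows how little it says about genes: it measures statistical surprise in a sequence, and nothing about meaning, function or instructions. A minimal sketch in Python (the sequences here are invented for illustration):

```python
from collections import Counter
from math import log2

def shannon_entropy(seq):
    """Shannon entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# With the four bases equally frequent, entropy is exactly 2 bits per base:
print(shannon_entropy("ACGT" * 25))  # 2.0
# A biased sequence carries less Shannon information per base,
# regardless of whether it "means" anything biologically:
print(shannon_entropy("AAAAAAAACG"))
```

Note that a random scramble of a genome has exactly the same per-base entropy as the original, since only the symbol frequencies matter; the measure is indifferent to everything biologists care about in a gene.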

Another information idea, published the same year by Norbert Wiener, is called cybernetics. Here the information is about the control of one thing by another, through signals. Cybernetic ideas about genes have been more fruitful, but in the end they turn out to be just analogies that are not terribly deep (in my opinion).

The code aspect of genes: what it is and isn’t

Code language is widely used when talking about how DNA gives rise to proteins. Terms like editing, reading, transcribing, and expressing are all used in the technical literature. DNA is “expressed” and “edited”; a gene is regarded as an “open reading frame”; DNA is “copied” or “replicated”. Such terms point up the leading property of DNA – it is long-lasting and its structure can be duplicated, not unlike a document. For this reason, some scientists refer to genetics as a “codical domain”.

But what is happening physically is that DNA molecules are unwound into two strands by helicases, and then either transcribed by RNA polymerases, producing RNA, or replicated by DNA polymerases, producing new DNA. The DNA and the RNA are just as physical as the proteins they specify. As Wiener noted in his book:

Information is information, not matter or energy. No materialism which does not admit this can survive at the present day. [p132]

Following Wiener here, DNA is a physical structure, and it is not “information” in the sense used by communications or computation theories. That sort of information is an abstract entity, a property of mathematics, not physics. Genes are not that kind of information. A mathematical model of genetics – especially population genetics, which describes how genes change in populations – contains information about genes, but that is a different kind of information again; it isn’t what those who say genes are information mean by it.

So the Crick information model – that genes are templates for the structure of RNA and through them of proteins – seems to be the only meaningful sense in which one can say genes are information.

Other types of information in genes

There are some other senses in which genes are supposed to have an informational aspect: the program sense and the game-theory sense.

Program/recipe: genetic control versus genetic involvement

The program or recipe metaphor has been used by many evolutionary biologists, including Ernst Mayr and Richard Dawkins. It is at work in Dawkins’ quote above: genes are instructions. There can be no doubt that genes are involved, directly or indirectly (say, by building molecules that have functions), in the development of living things. They are “first among equals”. But how can they be “instructions”?

Recall the mnemonic

G & E -> O

from the last post. In order for genes to be instructions, there would need to be a “computer” to “run” the instructions (or in the case of a recipe metaphor, a cook and kitchen to make the recipe). What could do this for genes? It would need to be not only the cellular machinery that expresses genes – ribosomes and so forth – but also the organism itself, which turns on and turns off genes, and the environment that provides the source material. So the mnemonic would have to become

G & O[<t] & E -> O[t]

or, the genes G, together with the state of the organism before now O[<t], together with the environment E, give the organism now O[t]. While this is true enough, the metaphor no longer seems to hold up. Why not just say that genes, organisms and the environment give rise to the later organism? There is then no temptation to talk about some abstract program, or to ascribe to genomes powers they do not have.
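The expanded mnemonic can be written as an ordinary state-update function, which makes the deflationary point concrete: the genome is just one argument among several, not a privileged program being “run”. (A toy sketch; the names and the trait/resource bookkeeping are invented purely for illustration.)

```python
def develop(genes, organism_before, environment):
    """O[t] = f(G, O[<t], E): the organism now depends jointly on the
    genes, the prior state of the organism, and the environment.
    Nothing here singles out the genes as 'the instructions'."""
    return {
        "traits": organism_before["traits"] | genes["expressed"],
        "resources": environment["resources"] - 1,
    }

state = {"traits": {"root"}}
state = develop({"expressed": {"leaf"}}, state, {"resources": 10})
# state["traits"] is now {"root", "leaf"}; one unit of resources consumed
```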

Incidentally, while the Human Genome Project delivered a draft of the entire genome in 2000 (it has been revised considerably since), we have yet to discover what sorts of effects most of the expressed genes actually have, and it will probably be another century before we finish that. And of course most of the genome is unused junk.

Game theory: genes as bookkeepers

There is one final metaphor, possibly more than a metaphor, that we should look at. It is yet another view found in Richard Dawkins’ work: genes are strategies in a game. Here the metaphor is backed up by extensive mathematics: a field known as “game theory”, famously applied to Cold War threats and counter-threats, turns out to be very useful for modelling how gene frequencies change in certain conditions (when the fitness of genes, and their propensities to work with or against each other within a single population, are known).

This was the basic underlying metaphor of The Selfish Gene: genes have interests, and behave (evolutionarily) like self-interested players of a game known as The Prisoner’s Dilemma. The details are not important here.

Game theory treats genes as “players” or agents. But genes have no strategies themselves; it is just that the mathematics of games can transfer to genetics. This often happens, that mathematics developed for one field get used in other fields. It doesn’t mean that the properties of that first field (where game players are rational and selfish) apply to the new field, only that the maths applies.

In fact, the game theory view has been called by Stephen Jay Gould a “bookkeeping” view of evolution: you track the “wins” and “losses” of a given gene in a mathematical scorecard. In other words, selfish genes exist only in how you record the outcomes of the evolving population. It’s useful, but it doesn’t mean genes actually are strategies, nor that they have them.
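The bookkeeping can be made explicit with the simplest piece of population-genetic arithmetic: a haploid selection model in which the modeller, not the alleles, keeps the ledger of frequencies. (A toy sketch; the allele names and fitness values are made up.)

```python
def select(freqs, fitness, generations=1):
    """Haploid selection: each generation, an allele's share of the next
    generation is proportional to its frequency times its fitness.
    The dict of frequencies is the 'scorecard'; the alleles themselves
    do nothing but get copied at different rates."""
    for _ in range(generations):
        weighted = {a: freqs[a] * fitness[a] for a in freqs}
        total = sum(weighted.values())
        freqs = {a: w / total for a, w in weighted.items()}
    return freqs

# A 10% fitness advantage compounds: after 50 generations the ledger
# records allele "a" at over 99% of the population.
print(select({"a": 0.5, "b": 0.5}, {"a": 1.1, "b": 1.0}, generations=50))
```

No strategy appears anywhere in the model; “selfishness” is just the compounding of differential copying, recorded in the modeller’s tally.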

Next I will discuss why genes are not a language.

Further information:

Molecular Biology (Stanford Encyclopedia of Philosophy)

Biological information (Stanford Encyclopedia of Philosophy)

A video on epigenetics:

58 thoughts on “Genes – the language of God 3: Why genes aren’t information”

  1. “It is very important to note that the ‘information’ here is the linear sequence of the base pairs matching up to a linear sequence, first of RNA (tRNA), and then of the proteins.”

    Instead of “tRNA”, it appears you should have written “mRNA” (messenger RNA), i.e., the type of RNA that is translated into a polypeptide (“protein”). It’s true that tRNA is transcribed from DNA, but it is not “then” translated into a sequence of amino acids.

    “But what is happening physically is that DNA molecules are split into two strands by either ribosomes, and RNA modules made from it, or that new DNA is made by polymerases, large molecules that assemble RNA monomers when DNA is passed through a cleft in the molecule.”

    Some confusion here; this sentence should perhaps be reworked. For one thing, a double-stranded DNA helix is split by the enzyme DNA helicase (assisted by single-strand binding proteins), not by ribosomes, which (in eukaryotes anyway) do not have access to DNA because they’re in the cytoplasm, not in the nucleus. Also, “new DNA” wouldn’t be made by polymerases that assemble “RNA monomers” — new DNA has to be made from DNA monomers.

    “Following Weiner here, DNA is a physical structure, and it is not ‘information’ in the sense used by communications or computation theories.”

    Certainly, information is an abstract concept. But any time we communicate information, we do it using physical structures — paper and ink, or USBs and magnetism, or whatever. Information and the physical have to go together, even though they can be conceptually distinguished.

    “. . . we have yet to discover what sorts of effects most of the expressed genes actually have. . . . And of course most of the genome is unused junk.”

    Since most of the genome is “expressed” (i.e., transcribed at least), how can most of it be “unused junk”? Especially since, as you say, we still have a lot of discovering to do?

    1. Thanks. I don’t know what happened there. I have amended it (since this is for students; and I had better not get the details wrong).

      Yes, information is expressed and stored physically; but as a concept it is an abstract property. There is a notion – “physical information” – in the recent literature (NB: not the physicist’s definition; the philosopher’s), but so far as I can tell it just means “causal structure” and I prefer to use that terminology instead. Information is too mystical for this physicalist.

      As to why I take this line, this is a position post for discussion. It is, of course, my opinion – who else’s should I express?

      Finally, the evidence is in that most of the genome is indeed junk; no more than (according to what I have read) 8% has any function in the organism. The go-to guy for this is Larry Moran at Sandwalk; Larry is a leading biochemist and right up on the literature. His takedowns of the ENCODE project are epic. [He is not the only one who does them, of course.]

  2. Larry Moran is knowledgeable, but I think he and others probably have the wrong end of the stick here. John Mattick is the Australian geneticist, now director of the Garvan Institute, who has championed the importance of non-coding DNA for many years. His comment on the criticisms of the ENCODE project can be found here.

    Of more interest are recent papers he has coauthored:

    A meta-analysis of the genomic and transcriptomic composition of complex life (the full text of which can be found via Google); his commentary on several recent papers demonstrating functional effects of long non-coding RNA genes; and a paywalled review in Nature Reviews Genetics, where he comments:

    Loci that express lncRNAs show all of the hallmarks of bona fide genes, including conservation of promoters, indicative chromatin structure, and regulation by conventional morphogens and transcription factors. Moreover, lncRNAs were found to have a similar range of cellular half-lives as mRNAs and to be differentially expressed in a tissue-specific manner, especially in the brain. The study in the brain showed that, although the expression levels of many lncRNAs seem to be lower than those of mRNAs in whole tissues, lncRNAs are highly expressed and easily detectable in particular cell types. In addition, lncRNAs were found to have, on average, higher cell specificity than proteins; this is consistent with their proposed role in architectural (as opposed to ‘cell-type’) regulation, in which each cell has a unique positional identity in precisely sculpted organs, bones and muscles

    Turning back to the concept of the gene and how it relates to information: “gene” is notoriously slippery, since we use the same word for a locus, for a polymorphism, and for particular alleles. But your disparaging of the “program sense” of gene as a recipe or instruction is, I think, unjustified – nobody has any problem with the fact that information technology deals with electronic, mechanical or hydraulic substrates, and that recipes are algorithms for manipulating physical ingredients, usually to produce a more complex and structured outcome.

  3. This is my first comment here, so apologies if it comes across as unduly negative. But this seems hopelessly vague to me, and it seems to root in your admission that “Information is too mystical for this physicalist.”

    I want to be more constructive. So here’s an example:

    G & O[<t] & E -> O[t]

    Let G be a computer program, O[<t] is the state of the memory of the computer, E is the computing machinery itself (the world of influence over the computation), and O[t] is the state of the memory at the end.

    G is redundant, right? We can include G in the state of the memory of the computer at time <t.

    So are computer programs usefully to be thought of as programs? Your desire to throw out the idea of a program should apply to a wide class of other systems that operate either on a) the medium in which the program is encoded, or b) the machinery on which the program is run (or both).

    There are lots of other issues that jumped out at me:

    Information Theory has proven fruitless as a way of understanding DNA and RNA? Are you really wanting to wave away the field of bioinformatics as fruitless? It is the basis of all kinds of highly commercial endeavours: protein synthesis, drug discovery, DNA sequencing.

    Game strategies and information are *alternative metaphors* for understanding DNA? Really? For DNA to be a strategy assumes it is information. There is no 'bookkeeping' needed for this to work. No 'record keeping' is needed. One can simulate the process with mathematical models that keep tallies, but that's not how they work, and nobody has suggested it is, have they? The presence of genes in the population is the scorecard, and it is stochastic. Seems like you're constructing a strawman here.

    "You lose nothing if you drop the word “information” in favour of “structure”" Of course one can just change words without affecting anything: we lose nothing if we change "structure" to "foobar". But we do lose significant amounts of the mathematics of biology if we decide to drop the concept of "information" and try to rebuild our scientific toolbox purely on "structure".

    I'm not aware of any actual biologists who would see the 'structure' or 'information' of DNA as separate things to decide which is the more productive way of viewing what is happening in the cell. Seems like trying to worry about whether it is more fruitful to see cars as mobile objects or vehicles for passengers, when trying to figure out the traffic for a city.

    Overall it just seems like you have a strangely mystical view of what information is and how it works, or else you're making more of a semantic argument, either way I think you do a disservice to the students you aim to teach if you plan to go in on this tack and teach them that information theory isn't a useful way to deal with genetics.

    1. Exactly so. John wasn’t always so down on bio-information. For example, here’s his earlier definition of a meme: “The least unit of socio-cultural information relative to a selection process that has favourable or unfavourable selection bias that exceeds its endogenous tendency to change” – J.S. Wilkins, ?1998. One wonders what happened.

      1. Well, cultural information runs via information-processing systems (or their very close analogues), without a doubt. Genes do not. So Wilkins’ definition works well for memes, but much less well for genes.

        1. Because brains are “information processing systems” and cells are not? To me, that doesn’t make sense. Cells are “information processing systems” (in the sense of Shannon information) just as much as brains or telephone networks are. The requirements for being an “information processing system” seem pretty trivial to me: some kind of input, transformation and output. Cells definitely qualify. For a more borderline case, you would have to go to something like a rock.

            1. Shannon went into extensive discussion of error correction and error detection. These are widely applicable to such systems – and inside cells there is definitely error-correction and error-detection equipment. Further, there are formal treatments of these mechanisms in the existing literature on the topic. See, for instance, “Information theory and error-correcting codes in genetics and biological evolution” by Gérard Battail.

              1. Well, except that there are no evolved mechanisms to “detect” them, right? Forget my question, John…

    2. On the state of the “computer” including the program: yes, that is a good point, but one which undercuts the idea that any part of the organism is itself the program. A Universal Turing Machine is instantiated in a computer: an analogy with organisms and genetic programs would mean that the genetic program is the organism (including genes). Genes would then not be privileged as “the” instructions or programs; they would be one kind among many.

      As to the bioinformatics issue, it is you who are confused. Bioinformatics analyses large data sets (measurements) of the organism’s parts (including genes, of course, but also proteins, etc.). Genes are no more information because bioinformatics is used to analyse them than people are statistics because statistics is used to analyse populations of people. Of course bioinformatics uses information.

      I was referring to Gould’s claim that game theoretic accounts are genes-as-bookkeeping, not that they were alternative accounts. Many of these information-based views of genes are used simultaneously, and are jointly coherent. The bookkeeping is done by geneticists, not genes.

      “Structure” has a meaning, “foobar” does not (as any programmer knows). If I say that something has structure, I am saying something meaningful and measurable. If I say it has information, however, I must specify against what probability distributions I am measuring it, and there seems not to be any such privileged distribution in genes. In fact, as I understand it, each nucleotide is equally probable, physically.

      On the perspective issue, I used to work in city planning (as an administrator, not a city planner) and they actually do distinguish between vehicles as moving objects (that have to work within a carrying capacity of the road system) and as passenger vehicles (shared cars versus single occupant vehicles versus mass passenger busses, etc.). It matters how you characterise things, in detail.

      Information is a symbolic, mathematical quantity. Genes are physical things. As Wiener said, information is information, not matter or energy, and the reverse is true.

      1. John, Thanks for the reply, I’m glad I didn’t come across too trollish.

        So, on the ‘instantiation’: I literally have no idea how this analogy works. Who talked about ‘Universal Turing Machines’, and what does an abstract model of computation have to do with the computation of a program? You’re layering on other concerns now. We have no abstract model of genetic processing, as we have no abstract model of cookery, or metallurgy, or many other things. Even if ‘Universal Turing Machines’ were the key factor in deciding whether a program is a program (which seems very dubious – Turing computation is not the only kind of computation), I still can’t figure out how this would mean the genetic program is the organism. The point of the analogy is that a program is not separate from the thing being computed, nor from the state (and hence behaviour) of the thing doing the computing. But a program is still a meaningful way of understanding it.

        The bioinformatics issue was specifically addressing your claim that treating genetic data as information proved fruitless. This is refuted by the fact that a whole, economically essential, field exists that treats genetic data as information, and in fact relies on it, methodologically. I’m quite aware of what bioinformatics is, it was my PhD area, and I used a large number of information theoretic tools to analyse the genetic data, as do others in the field. Granting this does not, of course, establish that genes are information, but it does refute your point that treating them as information proved fruitless.

        “Information is a symbolic, mathematical quantity. Genes are physical things. As Wiener said, information is information, not matter or energy, and the reverse is true.”

        or

        “Information is a symbolic, mathematical quantity. Books are physical things.” or any other such thing. Appealing to the physicality of genes doesn’t make your point, does it?

        I mean, it might make the point that genes are not information in the sense that the terms ‘genes’ and ‘information’ refer to the same thing, nor that ‘information’ is a set of things to which ‘genes’ wholly belongs. But that’s not what people mean when they talk about genes being information any more than when they talk about books being information.

        1. “I mean, it might make the point that genes are not information in the sense that the terms ‘genes’ and ‘information’ refer to the same thing, nor that ‘information’ is a set of things to which ‘genes’ wholly belongs. But that’s not what people mean when they talk about genes being information any more than when they talk about books being information.”

          I agree, and that’s what I was driving at when I said earlier: “Information and the physical have to go together, even though they can be conceptually distinguished.”

          John, is it possible we could simply agree to refrain from saying, “Genes are information,” but to instead say, “Genes contain information.”

          After all, when “read” by RNA polymerase during “transcription,” the DNA “code” “determines” the sequence of the mRNA nucleotides. And when the mRNA “codons” (containing “information” from the DNA, posttranscriptionally “edited” by the spliceosome) are “read” by the ribosome during “translation,” the mRNA “specifies” which tRNA molecules enter the ribosome and thereby “dictates” the amino acid sequence of the polypeptide.

          So then, why not allow, “Genes contain information”?

          1. Everything “contains information”. “Genes contain information” is a trivial and vacuous thing to say.

            IMO, an informational genetics that is broad enough to encompass organic and cultural evolution is best based on informational genes. Shared heritable information is what forms the basis of shared traits. This has been the best shot at a general science of heredity that I have seen.

            If you don’t use the concept of “heritable information” you generally wind up with a narrow, parochial notion of a “gene” that’s unsuitable for forming the basis of a general science of heredity. In a nutshell, that is why evolutionists need to use informational genes.

            1. “Everything” contains information?? In what sense of the word “information”?

              Genes contain information in the sense of coded instructions that are read by transcriptional machinery, turned into corresponding mRNA, and then into a functional amino acid sequence. Which strikes biologists as “information” in a highly significant, fascinating sense. (Not something you see every day in “everything.”)

              1. Which is why I accept the use of structural (Crick) information – it is something directly relevant and significant when talking about genes. It is implied by the Central Dogma, and captures everything about genes that the science requires.

              2. In the sense of Shannon’s use of the word ‘information’. “Coded instructions” is not what the term ‘information’ means in standard information theory.

      2. A common approach is to use a maximum entropy distribution. That’s an appropriate distribution to use if you know nothing about which alternative is most likely. In genetics, this corresponds to the assumption that each base pair is equally likely. If you find yourself wondering what the appropriate priors are in an information-theoretic discussion of gene sequences, you should normally try using this one.

  4. The transcription apparatus binds (weakly) to almost anything. It’s not specific; it’s not intelligently designed. Lots of useless RNA is constantly produced and soon destroyed again.

    1. I think your comments may be based on outdated information.

      “Early work on transcription in mammalian cells identified hnRNA, a heterogeneous population of huge nuclear RNAs that were short-lived. . . . the pendulum of scientific opinion has now swung away from the idea that much of this RNA could be ‘transcriptional noise’ or junk RNA transcribed from junk DNA. . . .” (Thomas R. Cech and Joan A. Steitz, “The Noncoding RNA Revolution—Trashing Old Rules to Forge New Ones.” Cell 157:87, March 27, 2014)

      That major review article also states: “Today, the ncRNA revolution has engulfed all living organisms, as deep sequencing has uncovered the existence of thousands of long (l)ncRNAs with a breathtaking variety of roles in both gene expression and remodeling of the eukaryotic genome.” (p. 77)

      Furthermore, “Notwithstanding the fact that there are definable classes of ncRNAs that work by similar principles (e.g., tRNAs, riboswitches, miRNAs), it could be argued that every ncRNA studied has a different function. Certainly no two mammalian lncRNAs appear to have the same function. Thus, with perhaps 10,000 lncRNAs yet to be studied in the human genome alone, it seems safe to predict that many new functions of ncRNAs will be identified—perhaps thousands of functions.” (p. 89)

      Time and again, the charge that a biological feature is “useless” has been found to be premature.

  5. Information is completely subjective – structure is not. If one has the sensory apparatus to decode structure, then and only then does something become informative.

    1. If you want objectivity while using Shannon’s concept of information, you can always specify a reference observer. A common one is the universal prior (i.e. use Solomonoff’s Algorithmic Probability). There are also computable versions (the time-bounded “Levin” prior or the speed prior). The “information is subjective” argument against Shannon information doesn’t make sense: objectivity is available if you want it.

  6. This is a matter of definition of the term “gene”. Many have *defined* genes as consisting of information – most famously as follows:

    “In evolutionary theory, a gene could be defined as any hereditary information for which there is a favorable or unfavorable selection bias equal to several or many times the rate of endogenous change” – Williams 1966, page 25.

    “A gene is not a DNA molecule; it is the transcribable information coded by the molecule” – Williams 1992, page 11.

    I think any argument on the topic has to be about whether these kinds of definition are useful.

  7. Classically, the difference between hardware and software is that the software is easier to change than the hardware. Genetic programs are regularly changed at runtime by retroviruses. IMO, the genes-as-software, cells-as-hardware perspective looks pretty good from this perspective.

    As for organisms requiring environments while computers do not – that just seems like bunk: computer programs depend on their environments too.

    Oh, yes, and there’s more to computer programs than “instructions”. They also contain data. Genes->instructions seems like a bit of a straw man. When Dawkins said: “it is raining instructions out there” he didn’t mean that all genes should be interpreted as being “instructions”.

    1. So when he says that the “important bits” of these seeds are instructions, he didn’t mean that genes, which are the important bits according to the book, are instructions? Of course he did.

      1. And how else is one to think of viruses other than as sets of instructions that subvert a general-purpose system that can produce arbitrary protein and DNA outputs?

          1. Like the molecular package currently typing this message in a standard physical-chemical fashion? Why not apply the correct-level model, and use it to make predictions about the high-level characteristics of the physical system?

            http://www.cs.bgu.ac.il/~sipper/selfrep

            Von Neumann used two-dimensional CAs [cellular automata] with 29 states per cell and a neighborhood consisting of 5 cells (the neighborhood consists of the cell itself together with its four immediate nondiagonal neighbors). He showed that a universal computer can be embedded in such cellular space, namely, a device whose computational power is equivalent to that of a universal Turing machine. He also described how a universal constructor may be built, namely, a machine capable of constructing, through the use of a “constructing arm,” any configuration whose description can be stored on its input tape. The universal constructor is therefore capable, given its own description, of constructing a copy of itself, i.e., of self-replicating. Note that terms such as machine and tape refer to configurations of CA states – indeed the ability to formally describe such structures served as a major motivation for von Neumann’s choice of the CA model. It has been noted that the basic mechanisms von Neumann proposed for attaining self-replication in cellular automata bear strong resemblance to those employed by biological life, discovered during the following decade.

      2. I don’t think Dawkins *did* say that. The bit in quotation marks looks like a misquotation to me – and a web search for the phrase turns up nothing relevant. You have a better quality quotation from Dawkins relating to what matters further up the page: “why did I say that it was raining DNA rather than cellulose? The answer is that it is the DNA that matters”.

    2. The whole computer analogy is banal, trite, a cliché. Where exactly are the “software” and the “hardware”? Organisms are not even remotely similar to human constructions. Wouldn’t it be better to describe things as they actually are – rather than resort to cheap metaphors that only confuse?

      1. At the risk of stating the obvious, the idea of DNA as software and the phenotype as hardware is useful to people who know more about hardware and software than they do about cells and DNA. It is not a “cheap metaphor” that “only confuse[s]”; it’s a useful idea that helps people visualize how cells work. Yes, some philosophy students *do* get confused, but that always seems to happen – no matter what you say.

      2. “Organisms are not even remotely similar to human constructions.”

        You’re right, organisms are orders of magnitude more complicated than anything designed by humans.

        Our brightest scientists, with the best equipment and funding, can’t even begin to create an operational single cell, let alone a whole organism.

        Information technology tycoon Bill Gates stated, quite appropriately, “Human DNA is like a computer program but far, far more advanced than any software ever created.”

        1. Irrelevant comment as usual, Richard, but what else could one expect?

          “Human DNA is like a computer program but far, far more advanced than any software ever created.”

          But DNA is not even remotely similar to a computer program. This just shows how little Bill Gates knows about biology. What happens inside cells is a bunch of chemical reactions – chemical reactions are not information, instructions, software or anything like them. You should know something about chemical reactions, shouldn’t you?

          1. Nice to hear from you, Michael. Love you too.

            So Richard Dawkins, who compared DNA to floppy disks, knows nothing about biology either?

            “… a bunch of chemical reactions….”

            Not random reactions, though. Specified ones, right? Specified by the nucleotide base sequence in the DNA “code.”

            If the DNA “code” “specifies” the mRNA sequence, and thence the amino acid sequence, to me that looks similar enough to “instructions.” I’ll stand with Bill Gates on this one.

            Yes, I do know something about chemical reactions. And about biology.

  8. “DNA is a physical structure, and it is not ‘information’ in the sense used by communications or computation theories.”

    This is the confusion that Shannon tried to eliminate. If you look at the pits in a CD then you do not see music either, but that misses the point: if the pits are correctly interpreted, then the music appears. Information carried by a message is the reduction of uncertainty by the receiver about which of the possible messages the transmitter has chosen to transmit. If there are, e.g., 8 possible messages, then three bits are needed to indicate which one is transmitted; see Shannon. Does 0100011 constitute a message? Maybe; that depends on whether transmitter and receiver have a common understanding of what this code points to. For another pair of transmitter and receiver the same code may point to something completely different. The message is not the information. It is only an index to a commonly agreed semantic system. That is all.

    Thus DNA is not information, it is a message. And the message is physical, just like the pits on the surface of a CD.
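    The arithmetic behind the “8 messages, three bits” claim above can be checked directly: for N equally likely messages, ⌈log₂ N⌉ bits suffice to say which one was sent. A minimal sketch (the function name is ours, not Shannon’s):

```python
import math

def bits_needed(n_messages):
    """Bits required to index one of n equally likely messages
    (Shannon's measure for the uniform case: ceil(log2 n))."""
    return math.ceil(math.log2(n_messages))

print(bits_needed(8))   # 8 possible messages -> 3 bits, as in the comment
print(bits_needed(2))   # a yes/no choice -> 1 bit
```

    Note that the count says nothing about *what* any message means; it measures only the selection among alternatives, which is exactly the sender/receiver point being made.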

  9. I’m a bit confused by all this. If I use these intertubes and my laptop to create a sequence of characters

    AUG UAC AAU GCC AUG GAG,

    have I created some information? If I synthesize an oligonucleotide of RNA corresponding to that sequence, have I stored that information in a storage device? If I send that oligonucleotide to an analytical biochemist who can elucidate the structure of the oligonucleotide, have I sent her the information in the sequence above?
    Ultimately, there are many types of information present everywhere. There is sequence information present in all linear polymers. But I don’t know about the difference between containing information and being information.
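    The “sequence information” in question can be made concrete: under the standard genetic code, each codon in that sequence maps to an amino acid. A sketch, using only the six codons in the example (the full table has 64 entries; the mapping shown is the standard one):

```python
# Partial codon table: just the codons in the example sequence above.
# The full standard genetic code maps all 64 codons.
CODON_TABLE = {"AUG": "Met", "UAC": "Tyr", "AAU": "Asn",
               "GCC": "Ala", "GAG": "Glu"}

def translate(rna):
    """Map a space-separated RNA codon string to amino-acid names."""
    return [CODON_TABLE[codon] for codon in rna.split()]

print(translate("AUG UAC AAU GCC AUG GAG"))
# -> ['Met', 'Tyr', 'Asn', 'Ala', 'Met', 'Glu']
```

    Whether running this mapping in ASCII, in RNA, or in a biochemist’s head counts as the *same* information is, of course, exactly what the thread below disputes.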

    1. @Roger: if you send the message AUG UAC AAU GCC AUG GAG to me and I know what this sequence refers to (e.g. the sequence might mean: today I am out of office) then I have received information, if I did not know that today you would be out of office before I received the message. That is all. A message does not contain information, it is merely a pointer to a semantic context that both of us have agreed on. Any chemical may be such a pointer, that depends on the context. If a CO2 detector goes off, then information is obtained, with the triggering CO2 molecule being the message. That is Shannon’s insight: information does not reside in the message itself. This is confusing, as the semantic context is often only implicit, which gives the erroneous impression that the message contains the information. If I read a newspaper, then I assume that the words have their usual meaning, although I cannot be 100% sure of this. Therefore I say that the newspaper contains information, although formally this is not a correct statement. An oligonucleotide neither is nor contains information. It is a message because we know in which context it performs its function.

      1. I confess I disagree. The notion that information requires a sender is flawed in the same way that the argument that the second law of thermodynamics applies only to closed systems is flawed.
        Information exists in everything. Usually too much to be dealt with. And it changes dynamically (some interpretations of quantum mechanics suggest less specific information is present until committed by the interactions of observation but let’s bypass that for now).
        Sequence information exists, but the contexts that render that information into a message differ. A great deal more information exists in an ensemble of actual DNA polymers, but we generally ignore that information. It is less coherent (terminology chosen with malice aforethought) and _contextually_ less significant.
        The information in the message I sent you, in the form of an ensemble of chemical structures, exists independently of your ability to recognize it. Useful information, and contextual information, are distinct from raw information. But information exists independently of ability or skill to recognize or capitalize upon it.

        1. There are several different and distinct ideas of “information” in play here. Shannon information requires a sender. Dretske information doesn’t. Fisher information doesn’t even require a receiver (just a measurement device). Chaitin/Kolmogorov requires both a sender and receiver and a compressor.

          My point is that one cannot simply use a word – information – and leave it at that. What you offered up is Crick information: the structure of an RNA sequence (I infer). But the symbols on your computer are not RNA. They are ASCII. For it to be interpretable as RNA requires exactly what you say it doesn’t – a receiver (an informed reader). For it to be RNA requires a molecular synthesiser like this: http://www.bioautomation.com/.

          So a reader/receiver is not only implied in your use of the symbols, it is required. You don’t see that because you can’t disambiguate the symbols A, U, C and G from the things, which is exactly the philosophical error I have been arguing against here and elsewhere.
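          The point that “information” must name a measure can be put numerically. Shannon’s entropy, for instance, assigns bits per symbol to an empirical distribution; a sketch (the helper name is ours):

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Shannon entropy in bits/symbol of the empirical distribution
    of the given sequence: H = -sum(p * log2 p)."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(shannon_entropy("AUCG"))   # four equiprobable symbols -> 2.0 bits
print(shannon_entropy("AAAA"))   # a single repeated symbol -> 0.0 bits
```

          Fisher information, Kolmogorov complexity and the rest would assign different numbers to the same string, which is precisely why the bare word “information” settles nothing.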

          1. I agree that for “it to be interpretable as RNA” requires a receiver, but that does not mean that for it to exist as information requires an interpreter.

            I still ask, when did the information first _exist_?
            Does its existence require an intent to create it?
            Does its existence require an ability to read it?

            It is certainly possible to model a thing that is information that exists independently of intent or comprehension. It seems to me that this is a superior notion of information as a foundation. Every other model of information can be built upon this physical concept of information, even if it is less intuitive.

              1. “is called … structure” … Why?
                My framework is chemistry and, in particular, a statistical-mechanics/quantum-mechanics model of reality. No doubt there is information in structure, and information beyond structure (bond connectivity, stereochemistry), including vibrational, rotational and electronic excitation states (electron densities and nuclear densities subsumed). That we typically ignore this added information, or otherwise find it beyond our interests or ken, is entirely beside the point.

                This is a reproducible aspect of information. A vocal message includes much more information than the transcript of the words, but we typically ignore that added information. Similarly, an actual molecule, or ensemble of molecules, contains much more information than the canonical polymer sequence, but it does contain the canonical polymer sequence irrespective of sender or receiver. This model works.

              2. However words are used in a given science, the fact is that the information to which you refer is just the structure of the things referred to. It doesn’t acquire the status of “information” in any meaningful sense until a communication or cognitive system is involved.

              3. Information without a receiver is an oxymoron. However, in any scientific activity there’s *always* a receiver – namely, the scientist.

                We don’t need to find another term on the grounds that there’s no observer – because there *is* an observer. There’s always an observer in all the cases that anyone has ever cared about.

              4. Conventionally, “structure” isn’t a kind of observer-free information. Information is quantified in bits, while “structure” has no standard units. If you want observer-independent information, the easiest thing to do is to specify a reference observer.

              5. Objective structure exists in the world (ex hypothesi), but there is too much of it to describe entirely; any description is observer and scale and measurement relative. One presumes God has the full specification, but we aren’t God and so partial descriptions of structure are the best we can hope for.

                Hence we are both right. An observer is required for any description, but the structure is objectively there, in the absence of an observer (other than God – we might be Berkeleyan idealists about this 🙂 ).

  10. Reboot. ‘it does not acquire the status of information until a cognitive or communicative sense exists’. This is the fundamental disagreement.

    There’s a concept of information where it exists in a book even if the book sits on a library shelf and is never opened. It exists independent of the language used to encode the message. The actual information includes the font and obscure typesetting artifacts, as well as arcane factoids about the paper, ink, and myriad other factors. Most of these are beyond our ken.

    There is also an aspect of the information that exists regardless of whether or not the message is rot13 encoded. The typically recognized/relevant information content is quantitatively many orders of magnitude less than the actual information content in my sense of information.

    Information content is not “acquired”; it is recognized, or not. Relevance and utility are distinct issues. Anchoring _information_ in physical states has profound advantages. But we need to recognize that the information we trade in is a very small level of signal on top of a huge reservoir of noise. And the fascinating thing is the mechanisms we have for distinguishing certain types of informational signal on top of what is contextually informational noise.

    I promise not to continue this beyond this post so grant you the last word should you be so inclined.
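    The rot13 point above is easy to demonstrate with Python’s standard-library codec: the encoded text is character-for-character different, yet the original content is fully recoverable, so on this view the underlying information survives the encoding untouched.

```python
import codecs

plain = "It is raining DNA outside."
encoded = codecs.encode(plain, "rot_13")   # stdlib rot13 text transform

print(encoded)                             # gibberish to a naive reader
print(codecs.decode(encoded, "rot_13"))    # applying rot13 again restores it

assert encoded != plain
assert codecs.decode(encoded, "rot_13") == plain
```

    Whether the encoded string *contains* the information or merely *points to* it, given a receiver who knows the transform, is the very disagreement this thread has been circling.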
