Testing universal common ancestry

A long time ago, a young graduate student wandered into the festering cesspool of creationists and evolutionists known as talk.origins and offered to write a FAQ (Frequently Asked Questions page) on whether or not macroevolution and common descent were supported by evidence. I had previously published a philosophical treatment of Macroevolution for that site’s Archive, but this guy, Douglas Theobald, being a scientist, had to attend to actual evidence, and so he wrote the excellent “29 Evidences for Macroevolution” FAQ. It’s a pretty solid piece of work.

Now a biochemist at Brandeis University, Doug has published a full test of the hypothesis of Universal Common Ancestry (UCA) in no less than Nature. And I get thanked in the Acknowledgements. A philosopher has arrived when he gets mentioned in Nature, but for a scientist to be published there is the Holy Grail. I am glad to have had a (very) minor role in this. Nick Matzke, who also commented on the draft, has a roundup here at Panda’s Thumb.

It might be thought that the target here is creationism, and so it is taken by at least one “baraminologist” (a made-up term for creationist “taxonomy”), but actually it is a test of competing hypotheses in actual science, such as the claim made a lot lately, for example by Carl Woese and Mark Ragan among others, that the treelike structure of evolution is broken by lateral genetic transfer. I’m not competent to evaluate his method, although Mike Steel and David Penny certainly are, but I am pleased to see that he analysed his data sets without presuming genealogical relationships are implied by similarity of genetic sequences. Why?

This is a matter of epistemology. If we presume that we already know that similarity implies genealogical relationship, then we will require some foundation for that claim, and if we simply assert that it is part of evolutionary theory, then we really will have done what the creationists accuse us of, and circularly defined our evidence. Doug’s FAQ is an attempt to show the open-minded that this is not a matter of defining the solution into existence. This paper aims to show that UCA is attested by the data without circular arguments. I mention this because in Elliott Sober’s recent book he reprises his “modus Darwin” claim that similarity implies common ancestry, which even on orthodox evolutionary theory it does not (due to convergence). Homology implies common ancestry, not mere similarity.

Molecular sequences are especially problematic with respect to convergence because the number of states a sequence can take is much lower than the number of, say, protein states, cell states, and especially developmental states, so convergent similarity is harder to distinguish from homological similarity just on sequence alone. Terms like orthology and paralogy indicate the special difficulties in molecular sequences, let alone the host of other terms invented just for that field. You may be able to identify, say, a bone in a skull as being homologous between a bird and a human because of when and how it develops; this is not possible in molecular sequencing.
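One rough way to put numbers on how sequence length bears on chance similarity is sketched below. It assumes, purely for illustration, that sites are independent and that all four nucleotides are equally likely, and it asks how probable it is that an unrelated sequence matches a given one at some minimum number of positions. The function name and the uniform model are assumptions of this sketch, not anything taken from Doug's paper.

```python
import math

def log10_chance_similarity(length, min_matches, alphabet=4):
    """log10 of the probability that an unrelated random sequence matches
    a fixed sequence at `min_matches` or more of `length` sites.

    Assumes independent sites and a uniform alphabet (illustrative only).
    """
    # Count the sequences that agree with the fixed one at exactly j sites:
    # choose the j matching sites, then pick any of the (alphabet - 1)
    # differing letters at each of the remaining sites.
    favourable = sum(
        math.comb(length, j) * (alphabet - 1) ** (length - j)
        for j in range(min_matches, length + 1)
    )
    # Work in logs so that long sequences do not overflow a float.
    return math.log10(favourable) - length * math.log10(alphabet)

if __name__ == "__main__":
    for length, min_matches in [(10, 8), (100, 80), (1000, 800)]:
        value = log10_chance_similarity(length, min_matches)
        print(f"{length} sites, >= {min_matches} matches: log10(P) = {value:.1f}")
```

Under these assumptions, 80% identity over a handful of sites is unremarkable, while the same identity over a gene-length stretch is extremely improbable by chance. Real sequences violate both assumptions, so the figures are only indicative.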

So I greatly appreciate that Doug has taken a “neutral” stance here. I think I like his likelihoodist approach, but I’m not au fait with that stuff enough to know. I do however think that he has taken the right epistemic approach.

The nice thing is that the treelikeness of phylogeny is not swamped by lateral transfer in his results. Reticulation (or networking) in the trees is not enough to show that there was no tree. In fact I think this is going to be the case almost inevitably, because if lateral transfer did swamp the trees, we would still see treelike structure at different levels, since a tree is simply a representation of data structure – if you can differentiate taxa in the data, then you automatically have a tree (and a set of nested Venn Diagrams, and a list of indented taxon names, etc.). A data set that showed almost no structure would not have a treelike representation. Moreover, in order to identify actual lateral transfer, you need first to have a tree to compare it to.

But there are presumably statistical tests that can be done with minimal assumptions, and this looks like what Doug has tried to do, to avoid begging the question. It’s a good paper. In particular it is a useful tonic to the stream of scientific claims that the tree is actually unrooted, or that lateral transfer has killed off tree thinking, etc. Darwin remains triumphal…

11 Responses to Testing universal common ancestry

  1. Nice! Although I disagree with most of this bit ;-)…

    ===============
    This paper aims to show that UCA is attested by the data without circular arguments. I mention this because in Elliott Sober’s recent book he reprises his “modus Darwin” claim that similarity implies common ancestry, which even on orthodox evolutionary theory it does not (due to convergence). Homology implies common ancestry, not mere similarity.

    Molecular sequences are especially problematic with respect to convergence because the number of states a sequence can take is much lower than the number of, say, protein states, cell states, and especially developmental states, so convergent similarity is harder to distinguish from homological similarity just on sequence alone. Terms like orthology and paralogy indicate the special difficulties in molecular sequences, let alone the host of other terms invented just for that field. You may be able to identify, say, a bone in a skull as being homologous between a bird and a human because of when and how it develops; this is not possible in molecular sequencing.
    ===============

    My objections:

    1. I don’t think Doug contradicts Sober here, rather he makes the same argument.

    Sober would say we should compare:

    likelihood of some data (i.e. a sequence alignment) given common ancestry vs. separate ancestry

    i.e.

    A = prob(data|common ancestry)
    vs.
    B = prob(data|separate ancestry)

    …and this is just what Doug did, for reals with maths. And Sober says that “modus Darwin” is just the argument that A >>> B, for any reasonably complex set of character states and any reasonably high degree of similarity.

    2. “Homology implies common ancestry, not mere similarity.”

    Depends on definitions. Some (many) would say homology is defined as similarity due to shared ancestry. Thus you’ve already concluded in favor of common ancestry when you say two things are homologous.

    Others would say that you could define “homologies” as characters where your initial assessment is that it is more likely that the character is shared due to common ancestry than due to separate ancestry. Then you revise your assessment after your phylogenetic analysis.

    Or there are other ways. Yours: “You may be able to identify, say, a bone in a skull as being homologous between a bird and a human because of when and how it develops; this is not possible in molecular sequencing.”

    …is a classical one and not always satisfactory (there are apparently organs like eyes that develop differently but are definitionally homologous because the common ancestor had eyes; developmental characters can change in evolution, just like everything else), although it often is. To translate it into likelihood, basically considering development etc. makes the character more complex, meaning it is less likely to have its shared features originate independently, as opposed to having a common source (common ancestry).

    3. “this is not possible in molecular sequencing.”

    Well, one can look at the homology of gene order, gene networks, protein-protein interactions, etc…

    4. “Molecular sequences are especially problematic with respect to convergence because the number of states a sequence can take is much lower than the number of, say, protein states,”

    This only applies at the level of one or a few nucleotides. At the level of a gene or genome the number of possible states is enormous and the probability of getting strong similarity independently is vanishingly low.

    5. “Terms like orthology and paralogy”

    Paralogy is no more problematic than serial homology in morphology…they are basically the same thing in fact (within-organism duplicates)…

    Apologies, I just spent a semester listening to Brent Mishler…again…
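The comparison in objection 1, A = prob(data|common ancestry) against B = prob(data|separate ancestry), can be made concrete with a toy model. The sketch below is not the method of Doug's paper. It assumes two aligned nucleotide sequences, a Jukes-Cantor substitution model with a known divergence `d` for the common ancestry hypothesis, and independent uniformly random sequences for the separate ancestry hypothesis. All names and parameters are hypothetical.

```python
import math

def jc_match_prob(d):
    """Probability that two homologous sites are identical under the
    Jukes-Cantor model, where d is the expected number of substitutions
    per site separating the two sequences."""
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)

def log_likelihood_ratio(n_sites, n_matches, d):
    """log(A / B) for a pairwise alignment.

    A = P(data | common ancestry), Jukes-Cantor at divergence d.
    B = P(data | separate ancestry), two independent uniform sequences.
    Both are toy models chosen for illustration.
    """
    p_same = jc_match_prob(d)
    n_mismatch = n_sites - n_matches
    # Common ancestry: first base uniform (1/4), second base conditional on it.
    log_a = (n_matches * math.log(0.25 * p_same)
             + n_mismatch * math.log(0.25 * (1.0 - p_same) / 3.0))
    # Separate ancestry: every pair of bases has probability 1/16.
    log_b = n_sites * math.log(1.0 / 16.0)
    return log_a - log_b

if __name__ == "__main__":
    # 80% identity over 300 sites versus chance-level 25% identity.
    print(log_likelihood_ratio(300, 240, d=0.25))
    print(log_likelihood_ratio(300, 75, d=0.25))
```

A positive value favours common ancestry and a negative value favours separate ancestry under these particular models. The point of the sketch is only that the "modus Darwin" argument is a claim about the size and sign of this ratio. Whether the models are fair to either hypothesis is exactly what the thread is arguing about.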

    • John S. Wilkins

      So I need to commit argumentum ad Mishler, I see… Brent will possibly never buy me coffee again.

      So I think that mere similarity is insufficient to establish common ancestry without a host of other ancillary assumptions, such as the likelihood that the similarity could have arisen by convergence. This is enough to test between two competing common ancestry histories, perhaps, but not to test the existence of common ancestry simpliciter. Nor is it enough to test a history in the absence of prior knowledge of the process of evolution, which is why it is circular.

      However, I think that homology is not a theoretical construct; we do not identify homology because we know it is inherited similarity; we identify homology and then explain it by shared ancestry. Owen had no hypothesis of common ancestry when he identified homology as a category, nor did Belon when he listed homologous bones between birds and humans in the 16th century.

      To assert that homology is “similarity due to shared ancestry” is both historically, and I think heuristically, false. It conflates the explanandum with the explanans. Homology is also the diagnostic criterion for shared ancestry, so if you define it the way you (and Brent, and many others) just did, you are being entirely question begging.

      How do you classically identify homology? By sequence in development, position in the organism, and by morphology. To identify homologies in molecular biology is not nearly so atheoretical a process, and requires a certain amount of prior knowledge, but I grant this is a matter of degree not kind. And as the sequence increases, you do get a more improbable outcome, but you also get a less unitary sequence. The genome, by the way, is not a homolog; only small segments of it are. At some point you have to tell whether the similarity is a gene, a pseudogene, or a random stretch of convergent or accidental similarities.

      I am not saying it can’t be done, but that the issue in molecular systematics is not so clear cut as often asserted. Moreover, molecular sequences are just another kind of identifiable trait; the main difference is that phylogenetics done with molecules is a few orders of magnitude larger than the character states used in “traditional” phylogenetics, that’s all. That introduces complexities of reasoning that I think are more than merely statistical.

      • I have to agree with Nick here. I think Sober’s argument, at least his latest exposition in his book _Evidence and Evolution_ (Ch 4), is much more nuanced than John is making out. Sober doesn’t claim that similarity implies common ancestry across the board — but he recognizes that there *is* something about similarity that is evidence (maybe strong, maybe very weak, depending on the specifics) for common ancestry. All homology inferences boil down, in the end, to looking at biological similarities of some sort or another. But Sober considers the fact, from a likelihoodist POV, that similar structures can be generated from processes other than common ancestry. He warns that we should compare the likelihoods of competing hypotheses/mechanisms. He doesn’t really go into the fact that, say, selection can result in similarities too, so maybe John’s criticism is one of emphasis — Sober should explore the other competing options better.

        There is a sentence I originally had in my paper (in the conclusion), that I ended up omitting because it didn’t really fit in the flow anywhere (and seemed so obvious that I didn’t think it warranted being in the conclusion):

        “While the models described here do not assume *a priori* that significant sequence similarity implies homology, the strong results from these tests provide a firm logical basis for relying on this inference as a general principle.”

        Nick wrote: “Depends on definitions. Some (many) would say homology is defined as similarity due to shared ancestry. ”

        Actually, I think most would define homology as characters shared by common ancestry, really regardless of the level of similarity. Consider two modern proteins, that have no detectable sequence similarity, and perhaps have even different structural folds (no obvious structural similarity), yet are in fact descendants from an original single protein precursor. Different mutations in different lineages have gradually erased all sequence memory and altered the conformations of the proteins bit by bit until they are not detectably similar. These two proteins are nevertheless homologous. The problem is epistemological — we will be hard pressed to ever be able to tell that they are truly homologous, but that’s different from the fact-of-the-matter.

        So, IMV, homology *is* common ancestry of a sort. And homology does not necessarily imply similarity, though it may make it more probable, esp. for short time frames and low evolutionary rates.

        I am of course using the modern, evolutionary re-definition of homology. I’m not using Owen’s definition, or Geoffroy’s (who should really have priority over Owen here), which are interesting for historical/philosophical reasons but have been supplanted. Homology means something different now.

      • John S. Wilkins

        No, it doesn’t, I think. It is explained differently now, but it remains a phenomenon that is observed. I refer you to

        Brigandt, Ingo. 2003. Homology in comparative, molecular, and evolutionary developmental biology: The radiation of a concept. Journal of Experimental Zoology Part B: Molecular and Developmental Evolution 299B (1):9-17.

        Brigandt, Ingo. 2007. Typology now: homology and developmental constraints explain evolvability. Biology and Philosophy 22 (5):709-725.

        Griffiths, Paul E. 2006. Function, Homology, and Character Individuation. Philosophy of Science 73 (1):1-25.

        Griffiths, Paul E. 2007. The phenomena of homology. Biology and Philosophy 22 (5):643-658.

        Laubichler, Manfred D. 2000. Homology in Development and the Development of the Homology Concept. Amer. Zool. 40 (5):777-788.

        Keller, Roberto A., Richard N. Boyd, and Quentin D. Wheeler. 2003. The Illogical Basis of Phylogenetic Nomenclature. The Botanical Review 69 (1):93–110.

        Wagner, Günter P., and Peter F. Stadler. 2003. Quasi-Independence, Homology and the Unity of Type: A Topological Theory of Characters. Journal of Theoretical Biology 220 (4):505-527.

        There is a distinction made by some of “taxic homology” as opposed to developmental or phenomenal homology, but I think that is an empty distinction made solely to defend the view that homologies “are” inherited similarities. There are simply homologs, and they get their explanations from shared ancestry, phylogenetic conservation and monophyly.

        But this is an epistemic matter, and we either can identify homologs without theory, as Geoffroy (who called them analogues, by the way) and Owen did, or we can identify them using theory, in which case they cannot support the theory itself without circularity, either virtuous or vicious, or we have something like Hennig’s “reciprocal illumination”. I plump for the last option – phenomenal homologs set up the explananda for the evolutionary explanans, which in turn causes us to refine our evolutionary story to account for anomalies, etc. It’s a dance, because we never start off ignorant and theory free, and we never attain pure theory either.

        So I think that assuming in your models that a similarity is due to homology and not homoplasy is question begging if that homolog is the support for a claim of monophyly and ancestry; you first have to independently, or as independently as it is possible to do, test that hypothesis before it is used as evidence to test another hypothesis (ancestry). Typically this means doing what Darwin noted – remove all useful traits if you can, as they are of no service to classification. Then you have set up the problem to be explained by evolution, and not before. And modern evolutionary theory doesn’t escape that necessity by simply defining the phenomenon of homology.

  2. John says,

    …actually it is a test of competing hypotheses in actual science, such as the claim made a lot lately, for example by Carl Woese and Mark Ragan among others, that the treelike structure of evolution is broken by lateral genetic transfer.

    I’m not competent to understand Doug’s work but I didn’t think it was addressing the “web of life” issue.

    Neither did Steel and Penny ’cause they say,

    Theobald’s analysis is definitely not an argument for a ‘tree of life’ in place of a reticulate network that shows extensive lateral gene transfer, particularly in early life and in bacteria and archaea. Indeed, Theobald considers networks and 9 of the 22 proteins he analyses are thought to have undergone horizontal transfer early in evolution

    Perhaps Doug could clarify this point? Does his analysis refute Doolittle’s idea of a web of life?

    • Steel and Penny are correct. My analyses don’t really directly address whether a tree of life (TOL) or web of life (WOL) is the best model. In fact, my class II UCA hypothesis (universal common ancestry + allowing a different tree for each protein) is the best of the lot, by quite a bit. You could, perhaps, interpret that as supporting a web, but you have to be careful, since different gene trees can result from things other than HGT and fusion events.

      There’s also the issue of what is meant when somebody claims that the TOL hypothesis is invalid/false. What tree do they mean? A tree of species? Tree of organisms? Tree of genomes? Tree of genes? Do conflicting gene phylogenies actually contradict the first two? This has not yet been worked out satisfactorily in the lit.

      The way I see it, I’m not testing TOL vs WOL, but rather whether having a WOL diminishes or eliminates the support for common ancestry, as some well-known biologists have suggested.

  3. I think homology is yet another of those words whose meaning has evolved (albeit subtly) over time.

    You’re quite right in saying that Owen had no hypothesis of common ancestry when he identified homology – and that shared ancestry explained [note my change to past tense] homology. But, now that we have overwhelming evidence that evolution happens, I don’t think it’s unreasonable to redefine homology as similarity due to shared ancestry. Indeed, I would argue that it is useful to define it that way – although I appreciate that it leaves us open to the tired, old circular argument accusation.

    In an ideal world, we might invent a new word meaning ‘corresponding features in organisms sharing common descent’ (or something like that), but why bother when we already have ‘homology’?

  4. I am not hugely committed on the definitions issue. In my opinion, it is usually better to just find out what various people/schools of thought mean by words like “homology”, than to battle to the end to say that everyone has to mean the same thing.

    I’ve got an essay in me on this topic, but basically, a lot of the differences in how different people/groups use the term “homology” depend on their research goals.

    1. If you are a cladist/phylogeneticist, your goal is to get as much good data (i.e. characters with states coded as 0s, 1s, etc.) as possible, for input into a phylogenetic analysis. This requires you to make an initial hypothesis of homology, i.e. characters that you think are more likely to be shared because of common ancestry than not. You may be wrong 40% of the time, but this doesn’t matter for the phylogeny, because if you have enough characters, and the majority have phylogenetic signal, the tree best supported by the characters will be close to the true phylogeny.

    Phylogeneticists don’t like defining homology as “shared position/development/whatever”, because in their experience these criteria sometimes (not usually, but sometimes) apply to what turn out to be homoplasies in the phylogeny.

    (An aside: it is not true as a general matter that it is a good idea to “remove all useful traits if you can, as they are of no service to classification.” As Sober points out, selected traits can be good evidence for common ancestry (good homologies) as long as there are many possible organizations of the character. E.g., there are many ways to build camera eyes, so the unique organization of the vertebrate eye is a good shared, homologous character indicating common ancestry of the vertebrates).

    2. If you are an anatomist like Owen, or another descriptive biologist, or developmental biologist, you are mostly interested in figuring out the organizational patterns underlying organismal form. Homology basically becomes a way of organizing your description, and of applying the same terminology to different organisms, instead of inventing terminology anew for each new critter. And it helps in experiments — e.g., if you know that hox gene X triggers developmental cascade Y to form homologous structure Z across bilaterians, then you have a research target when you study some new species (and eventually you find exceptions, where Y is triggered by W instead of X, or whatever).

    Anyway, for these purposes, the positional/developmental criterion may well be adequate.

    3. If you are Darwin, or a modern internet warrior battling the creationists, you are not really trying to build a detailed phylogeny as in #1. So you don’t care about every last little bump and knob of bone that a phylogeneticist might want to code up and put in their data matrix; simple traits like these might well have some statistical signal useful to the phylogeneticist, even though they have a good chance of exhibiting homoplasy.

    So if instead of building a detailed phylogeny, your main interest is in making a general argument for the mere fact of common ancestry of a few big obvious groups, e.g. mammals or vertebrates, then you ignore the bumps on bones and just focus on big complex characters that share a whole bunch of details between organisms (structure, position, development, etc.), and where these shared similarities seem to be functionally unnecessary (which is particularly clear when the two homologous structures have very different functions, e.g. vertebrate forelimbs). Then you argue, convincingly, that getting all these details the same is wildly unlikely through chance similarity or through functional necessity, and thus there has to be some other explanation, namely common ancestry.

    So a lot of the differences boil down to goals and the level of detail. For some things — the vertebrate forelimb, for instance — there are so many detailed similarities across so many divergent functions that all the schools will conclude the same thing. But if, like a cladist, you atomize forelimbs into 150 binary 1/0 characters, you will (a) make it possible to build a detailed phylogeny, but (b) also inevitably discover that some portion of the atomized characters (or more likely, character states) will evolve multiple times on the phylogeny, i.e. be “homoplasies” instead of “homologies”. This doesn’t contradict the fact that the forelimb in general is overwhelming evidence for common ancestry. And it doesn’t contradict the fact that we can say that we think forelimbs are homologous because we have good evidence they derive from a common ancestor. All it means is that people with different research interests are using the concept at slightly different levels of analysis, and at different parts of the chain between raw observations (characters) and confirmed theory (common ancestry).

    IMHO at least… ;-)
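The claim in point 1, that a phylogeny survives a large share of mistaken homology calls so long as most characters carry signal, can be illustrated with a very small simulation. The sketch assumes four taxa (so only three unrooted trees are possible), treats each informative binary character as supporting exactly one of those trees, and assumes that misleading characters point at a wrong tree at random. That last assumption is the generous one, since real homoplasy can be correlated. All names are hypothetical.

```python
import random
from collections import Counter

# The three possible unrooted trees (splits) for four taxa A, B, C, D.
SPLITS = ("AB|CD", "AC|BD", "AD|BC")

def best_supported_split(n_characters, p_true=0.6, true_split="AB|CD", seed=0):
    """Simulate characters and return the split with the most support.

    Each character supports the true split with probability p_true;
    otherwise it is homoplastic and supports one of the two wrong
    splits chosen uniformly at random (an illustrative assumption).
    """
    rng = random.Random(seed)
    wrong = [s for s in SPLITS if s != true_split]
    support = Counter({s: 0 for s in SPLITS})
    for _ in range(n_characters):
        if rng.random() < p_true:
            support[true_split] += 1
        else:
            support[rng.choice(wrong)] += 1
    # For four taxa, the most parsimonious tree is the split backed by
    # the largest number of informative characters.
    winner = max(SPLITS, key=lambda s: support[s])
    return winner, dict(support)

if __name__ == "__main__":
    winner, support = best_supported_split(200)
    print(winner, support)
```

With 40% of characters misleading, the true split still collects the most support because the errors are divided among the alternatives. If the misleading characters all favoured the same wrong tree, the margin would be much thinner.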

    • John S. Wilkins

      I’m preparing to travel, so I can’t respond in detail now, but Nick wrote this:

      An aside: it is not true as a general matter that it is a good idea to “remove all useful traits if you can, as they are of no service to classification.” As Sober points out, selected traits can be good evidence for common ancestry (good homologies) as long as there are many possible organizations of the character. E.g., there are many ways to build camera eyes, so the unique organization of the vertebrate eye is a good shared, homologous character indicating common ancestry of the vertebrates.

      In so doing you are eliminating the useful aspects of the eye (i.e., having a lens is not the character you use) and using the contingent aspects (the crystallin proteins differ between cephalopods and verts, but that makes no overall difference to the utility of the lens) to diagnose phylogeny. This is exactly what Darwin meant: not that the organs these characters are included in are to be inutile, since there are few of those if any, but that the characters are.

  5. Michael Fugate

    I agree with John on Hennig’s reciprocal illumination. We should look at putative
