Phylogeny and the history of language and culture
26 Aug 2012

Increasingly, work is being done using the methods of phylogenetic systematics to uncover cultural and linguistic evolution. A leading lab in this work is Russell Gray’s lab at the University of Auckland in New Zealand. He and his collaborators have looked at the evolution of language, particularly Pacific languages, and other cultural trends (like canoe decoration) in evolutionary terms. Now they have published a paper in Science (Bouckaert et al. 2012), “Mapping the Origins and Expansion of the Indo-European Language Family”. The media of course published this under headlines like “English language originated in Turkey”, thereby demonstrating that journalists understand evolutionary thinking about as well as they understand economics, which is to say not at all.

Basically, by applying a Bayesian analysis to word forms from many extant and extinct Indo-European (IE) languages, they concluded that IE originated in Anatolia, or the central regions of modern Turkey. This is not unlikely, for certain values of “originated”. They locate the origination event around 9500 years ago, which is not long after agriculture began in more or less the same region. Previous hypotheses were that IE began around 5000 years ago in the central Asian steppes, along with the domestication of horses and the adoption of wheeled vehicles.

However, how good is this thesis? The BBC article with the silly headline is actually pretty well sourced and written. They quote Prof. Petri Kallio from the University of Helsinki as saying that “Unlike archaeological radiocarbon dating based on the fixed rate of decay of the carbon-14 isotope, there is simply no fixed rate of decay of basic vocabulary, which would allow us to date ancestral proto-languages.” He remains skeptical. This is a matter of methodology and epistemology, and it goes to the very foundation of phylogenetic method itself.
At best a sample of an organism or artefact from within C14 radioisotopic dating ranges only shows that an instance of that type was around at that time and place. It does not show whether it was the earliest or latest; it merely sets up a single anchor point that all hypotheses must account for. Likewise, a document or monument with written language shows only that a language type was there at a time. And since writing per se did not arise until around 5000 years ago, even that cannot help. All we know is where recorded languages are or were found. They are anchor points, but not fixed ones. These anchors can and do move. And Kallio is right: there are no fixed decay rates, or molecular clocks. In fact there aren’t such things in biology either. Molecular rates of change are not universal or constant, and inferences based upon them are at best hypotheses based on hypotheses.

Phylogeny is a tool, but what does it show? In my last post I noted that phylogenetic reconstructions show only relatedness. What they do not show, without some extensive ancillary assumptions, is how that relatedness arose. The increasing awareness of lateral transfer and hybridisation between taxonomic lineages indicates that there can be some complex histories even if the taxonomic relationships are treelike.

NB: to head off the most common error about this, lateral transfer does not undercut the treelike structure either of evolution or of phylogenetic diagrams. It makes that structure harder to detect, but a certain admixture can be accommodated in a standard tree classification. Of course, if the rate of lateral transfer approaches the rate of transmission within each lineage, then you no longer have separate taxa, and so that would count as a single lineage that temporarily separated, like populations either side of a geographic barrier that are brought back into contact.

In the case of sociocultural evolution, such lateral transfer is often assumed to be rife.
This is thought by some to undercut the importance of phylogenetic method in cultural contexts. I want to argue that it doesn’t, but that the inferences from phylogeny are not so obviously historical as some seem to think.

First of all, if some lateral transfer is possible in biological contexts between “good species” (the term used by biologists when they know it’s a species but it doesn’t follow some set of strictures they think species must follow), then some must be permissible in sociocultural contexts too. A loan word in French from English doesn’t make English and French the same language, no matter what the Académie française might think. In biology it is the entire shared developmental system, including the genome, that makes a species a species. In culture a tradition has more than just a few elemental objects; it has a functional structure, and in language a grammar.

So phylogenetics can apply nicely in contexts where traditions (or species) are well behaved. If they are relatively stable (i.e., not so transitory that they cannot be tracked), and distinct (i.e., the rate of lateral transfer is not so high they aren’t recognisable traditions any more), then you can do a phylogenetic analysis of them. It’s not so surprising really. Willi Hennig, whose Phylogenetic Systematics (1966) set up modern phylogenetics, took some of his ideas out of the discipline of stemmatics, or tracking manuscripts by differences in transcription (Platnick and Cameron 1977, Atkinson and Gray 2005).

However, there are limitations when using this to reconstruct history. For a start, suppose you have two manuscripts that differ. You cannot reconstruct the last version they share historically. Suppose you have three, and two agree mostly. Can you reconstruct the last version from that? Well, the two copies that agree might be from a large copy centre, but the one that disagrees might be from a minor monastery that actually had better copying procedures and a more original version, and so on.
These issues are well known to historians and biblical scholars, for example.

Now consider the argument put forward by Bouckaert et al. They look at the frequencies of cognate words and conclude from their analysis that IE began in a particular location at a particular time. Such reconstructions rely on assumptions, like a relatively constant rate of diffusion in all directions. What if the language was blocked by a cohesive language and culture in one direction? What if one population into which it diffused was more conservatively structured? What if a small military power managed to spread through large territories? Each of those shifts the “weight” of the diffusion pattern and means we might reach a conclusion other than that Anatolia was the centre of origin. There are many contingencies and possibilities allowed just by a phylogeny, in culture and language as in biology.

I am not denying the conclusions reached here. I think it likely (the use of Bayesian analysis here is significant) that Anatolia was indeed a centre of many cultural novelties. We certainly think that agriculture arose near or around there. But it doesn’t follow that because Anatolia is novel in one respect (farming) it is novel in another (language). We should avoid confirmation bias in science.

In more general terms, what counts as evidence in any historical and evolutionary process? Can we say that passerine birds first evolved in Australasia? Can we say whether writing began once and then diffused, or whether there were many independent inventions? Where did the Etruscans come from? Can we make any origin claims at all? We certainly would like to. The trouble is that information gets lost over time, and the best we can do is anchor events based on actual data.
All process hypotheses based on these anchoring events are at best consistent with the data, not proven or even necessarily made more likely by them (to avoid confirmation bias and unwarranted inferences that affirm the consequent).

It may sound like I am being contrarian here. I am not. This is the standard view in palaeontology, for example (see Smith 1994). History is hard to find, and we never have much confidence in our extensions beyond the data. It might be that we can reasonably think IE arose in Anatolia; knowing that is a lot harder.

References

Atkinson, Quentin D., and Russell D. Gray. 2005. Curious Parallels and Curious Connections: Phylogenetic Thinking in Biology and Historical Linguistics. Systematic Biology 54 (4):513-526.

Bouckaert, Remco, Philippe Lemey, Michael Dunn, Simon J. Greenhill, Alexander V. Alekseyenko, Alexei J. Drummond, Russell D. Gray, Marc A. Suchard, and Quentin D. Atkinson. 2012. Mapping the Origins and Expansion of the Indo-European Language Family. Science 337 (6097):957-960.

Hennig, Willi. 1966. Phylogenetic systematics. Translated by D. D. Davis and R. Zangerl. Urbana: University of Illinois Press.

Platnick, Norman I., and H. Don Cameron. 1977. Cladistic Methods in Textual, Linguistic, and Phylogenetic Analysis. Systematic Zoology 26 (4):380-385.

Smith, Andrew B. 1994. Systematics and the fossil record: documenting evolutionary patterns. Oxford, OX; Cambridge, Mass., USA: Blackwell Science.
Philosophy and evolution
19 May 2009

Over the past 50 years or so, there have been many attempts to give a general metaphysics of evolution, ranging from axiomatisation (by Mary Williams, at the height of the “theories are axiomatic systems” period*), to “logical necessity” cases (such as Lewontin’s three conditions for natural selection), to “units of selection” arguments, most closely associated with George Williams and Richard Dawkins. In each of these, and other, attempts, there has always been the presumption that there is a fixed hierarchy of ranks and units in biology. These are the “forms” of biology: replicators, interactors, species, genes, cells, and so on.

The odd thing about this is that as people were asserting that essentialism is dead (see the article on species linked above), they were being essentialists about concepts and units and ranks. Ernst Mayr, for example, who asserted that species individually (the species taxon, as he put it) have no essences, nevertheless asserted that the concept of species (the species category) did so. He was an essentialist about the species concept. Likewise, the gene centrism of a Dawkins is essentialist about the replicator concept. And so on.

Now one of the reasons why people adopted the hard and fast categories is that they usually were specialists in groups, such as mammals, birds or insects, where these categories had a real purchase. This is often referred to, mostly by botanists, as the “fur and feathers” or “vertebrate” or just “animal” bias. But another is just that they were seeking what used to be called the Characteristica Universalis, or the most general universal and formal language for the domain in question. It is a general disposition of those in the west to do this (and despite suggestions to the contrary, I cannot see how one might apply the Eastern metaphysics fruitfully in the domain of science).
It is a constant temptation to try to ground ideas in unchanging and agential categories. We like species because they do something. We like replicators because they are the ultimate doers. These categories apply in ways that make sense of both the world, and our need for constancy. Coherence is not gone. Until you stop focussing on the “obvious” cases, and start paying attention to as many as you can find. I have what I call the “esoteric method”: look for cases that don’t fit the current categories and then go look and see if that is more general than you might think. For example, in his 1942, Ernst Mayr referred to nonsexual organisms as “aberrant” when discussing the adequacy of his “new” “biological” species concept (122, 129). Today we know that not only are most organisms not sexual, which would mean most of them are not arrayed in species, but that the sexuality of species even in the small twig of the phylogenetic tree that is metazoans is not constant: many groups have either got hybridisation, or asexuality, or both. Nor is gene exchange confined to sexual species – between species gene flow is common, and even among asexuals lateral transfer is frequent. In fact the sort of species Mayr expected to exist are rare, except among some groups of vertebrates (oddly, the group Mayr studied, birds, often hybridise). Over the past 50 years these essentialistic categories have become harder and harder to support empirically, as we have learned of more and more exceptions. Some, such as John Dupré, have argued for a pluralism of conceptions in biology due to the polytypic nature of the instances to which these categories are applied. It’s just a brute fact of biology that none of these categories are universal, and so biologists must avail themselves of whatever conception works in a particular case (to make this more concrete: species are sexual isolates when that works, but in, say, bacteria, they are phenetic clusters or something else). 
Some years ago, I published an idea that I think might be the resolution to this (2003), in which I argued that species is like any other property of organisms, something that has evolved in its own way. The reason there is no universal notion of species is the same reason there is no universal notion of leg: species, like legs, are the outcome of evolution. In other words, these kinds themselves evolve. This applies also to other apparently universal aspects of biology: genes, or rather replicators, cells, individuals, and so on. It is not the case, as Dupré thinks, that anything goes, but that there are evolved modalities, as I called them: ways of being whatever it is that we are trying to understand. This applies not only to the organisms and their traits, but to the kinds of organisms, and even to the kinds of kinds. Taxa, units, ranks, entities, systems: all these are evolved, and so to understand what it means to be, say, a bird species or a eukaryote gene, you need to understand the evolutionary relations of that group.

Last year, Peter Godfrey-Smith published an interesting book that argues that the sole precondition for a Darwinian perspective on the world is that there are populations. Because we are disposed to see biology in terms of agency, we want agents, but that is, PGS holds, a remnant of the oldthink of teleology that Darwinism replaced. I think he’s well on the right track, although he still thinks that this means we cannot have types or classes. I think that classes are merely local and evolved. We are in a reading group covering his book right now, so as we work through it, I’ll probably add some more.

One thing I do want to say now, though, is that there is a prior problem knowing what a population is.
For instance, to know that an ensemble of individuals form a population, you need, minimally, to show they are of the same species because you don’t get a population that spreads across two or more species, unless they are causally connected reproductively (in which case they might be classed as the same species anyway). Moreover, you already need to know the sort of object/organism that counts as an individual for that group in order to identify it as a population. This is not always so easy, in the case of colony organisms. While PGS is rightly arguing that there are no ranks or special units, only populations (which comprise individuals that have heredity and ecological differences, leading to evolution**), it seems to me that he still requires there to be some sort of types or equivalence classes, even if there are no universal kinds of types. In part, this is something that comes out of the death of the essentialism story: it is often assumed that if one abandons essentialism, one loses access to any kind of equivalence class in biology (i.e., natural kinds; we aren’t worried about conventional classes or functionally defined classes), and that is what PGS assumes too. But it is my view that biology always uses types, which are defined or rather ostended by identifying an exemplar and then looking for clusters of properties. This is what PGS says we should be doing, but he does not see these as types. I do. By finding these clusters of properties (and even more the underlying developmental traits and heredity), we are then able to determine what a population is, and what individuals are, by a process of iterative induction (start with a case that is presumably exemplary and then make inductive generalisations from that until they fail). 
What bothers people who think in terms, not of binaries as Chris Schoen suggested, but of absolute levels or entities that do not change, is that evolution leaves us gasping and dealing with vague boundaries, shifting kinds and so on. I feel for them, but it is really biology that does this, and always has. What really is novel about evolutionary thinking is that we know not only that the appearances change, but that the forms, and the forms of forms, also change. However hard this is to come to grips with, we must.

And the solution to this vagueness is phylogenetic thinking. If you know where a species or an organism is placed on an evolutionary network (allowing for the moment that the tree topology sometimes fails), then you know what sorts of sorts it will fall into, or if you find that it doesn’t, that sets up an interesting research project.

More as it occurs to me.

* Williams was a student of the originator of the Axiomatic Method for the sciences, Joseph H. Woodger.

** Evolution includes a lack of change by stabilising selection or developmental entrenchment (which I think may be a subset of the former). We need not presume that selection always causes change (but if there is a lack of change, I think we should presume that is due to selection).
How high does the posterior probability have to get before we switch from “reasonably think” to “know”? 🙂
Oh, sure, ask a simple question, why don’t you? 95.67% precisely. When do grains of sand become a pile? How many hairs on one’s head make you not bald? Seriously, as you know but readers may not, there is a massive literature on this. It doesn’t materially affect the argument, though, because whatever counts as “knowledge” in a given discipline or community, phylogenies are not yet knowledge apart from the data points as measured. They summarise what is known and act as straight rules for inductive generalisations from those data points. They test hypotheses in whatever way data tests hypotheses. They are not hypotheses (in my view).
The origins of Proto-Indo-European have undergone just about every kind of analysis linguists over the last two centuries could think of, and perhaps this study will have some merit. Though I am only interested in this as a layman, to me the appeal of the Kurgan Hypothesis for the location of the PIE Urheimat is the conservation of so many PIE features in the languages of this region; in particular Lithuanian but also to a lesser degree the Slavic branch. While, under the Kurgan model, Hittite is the earliest family to branch out from PIE, it is still less conservative than Lithuanian is.

The real problem here is just how far you can take the biological genetic model and fit it into linguistics. They are both most certainly evolutionary processes, but there are always dangers of taking things too far. But if we do consider the highest number of conserved PIE traits as being a good indicator of the closest language to the original “proto” language, then Lithuanian and its close (but extinct) relatives seem to fit best. That’s not to say that the earliest PIE speakers couldn’t have arisen in Anatolia and that there was a split from that point as some group moved northward, the more northerly group retaining a more archaic and anachronistic version of PIE (much as, for instance, Quebec French tends to have far more archaisms than Parisian French, despite Parisian French being in the French “Urheimat”).

Still, that doesn’t explain other aspects that make the Kurgan hypothesis popular. The PIE roots tend much more towards a Pontic-Caspian/Eastern European environment than an Anatolian one. There is also at least some degree of affinity with the Uralic languages, with at least a few roots from the two proto-languages seeming to be of common origin.
Of course the data is scant, and it is quite possible that these common roots may be borrowings in one direction or another, but still there seems to be at least some reason to formulate a more distant Indo-European-Uralic mother tongue further back in time, and considering the distribution of Uralic languages, if such a hypothesis gains strength, it makes an Anatolian origin for PIE seem less likely.
There are several weaknesses in the article by Bouckaert et al that together might very well invalidate their conclusion.

One weakness is a rather common reasoning error: equating a basal branch with origin. Basal branching does not directly translate into origin, as any biologist should know: the platypus is not the origin of the placentals. Moreover, the point about basal and origin can be seen in Bouckaert et al’s supplementary information that gives the large phylogenetic tree. We know Latin is the origin of the Romance languages, and Latin clusters basal to Romance languages. OK. But Gothic is not the origin of the other Germanic languages; it just clusters basal. And Romani, the Gypsy language, clusters basal to all other extant Indian languages; it won’t be the origin of those languages. The phylogenetic tree is fully compatible with Hittite being the earliest to branch out from PIE (as Aaron Clausen said above), rather than Hittite representing the origin of PIE.

Moreover, the shape of the phylogenetic tree in fig 2 of the article makes no sense if one considers an Anatolian homeland. Apart from a few non-significant early branches with the known problem languages, the main split in Bouckaert et al is between Indo-Iranian and a Western group. Look at the map in fig 2. That map presents two possibilities: Indo-Iranian departed first from Anatolia, or the Western branch departed first from Anatolia. In both cases the branch that first departed should cluster more basal, and the branch that left last should cluster with Anatolian! The phylogenetic tree (as presented in fig 2) says IE should depart from Anatolia and after that split (where? Kurgan area?) into Indo-Iranian and a Western language group.

Another weakness is the reliance on written languages. We know the historic Skythians spoke an IE language (of the Iranian branch, it is thought), but as we only know Skythian from a few words recorded by Greek authors, Skythian is absent from Bouckaert’s database.
The area of Skythian is represented in Bouckaert’s figure S4 only by the late invasion of Slavonic languages. Given that Skythian is in the Kurgan area, the absence of early IE branches in the Kurgan area in Bouckaert’s modeling is not a result but a clear artifact.