Saturday, March 17, 2012

Gorilla my dreams, I adore you

What to do about geneticists?  On one hand, they are so smart that we should accept whatever they say, no matter how absurd, inaccurate, or even racist it may be.  (See Nicholas Wade’s Before the Dawn).[1]   On the other hand, they’re ignorant and arrogant assholes and they should be thrown in jail (See Trofim Lysenko).[2]  There has got to be a middle ground.

The gorilla genome is now out, and when combined with human, chimpanzee, and orangutan, it allows us to do a phylogenetic comparison.[3]  We have known since the 1980s that the human-chimp-gorilla relationship is genetically a very close call, with DNA tending to place humans and chimps a little closer, but only with a lot of discordance or statistical noise.  (That is in fact exactly what the ill-fated DNA hybridization studies showed, although their results were infamously misrepresented.)  When the mtDNA data first came out,[4] they linked human to chimp pairwise, but only if you ignored the fact that over half of the phylogenetically informative DNA sites did not in fact show human-chimp.  Those sites instead grouped chimp with gorilla, or human with gorilla.  The only way to extract human-chimp from those data was to treat the question like a Republican primary, where whoever gets the plurality of the votes wins the state.  So human-chimp was Mitt Romney, winning the nomination, but with barely 45% of the phylogenetically informative sites.
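The "primary" logic can be made concrete.  Below is a minimal sketch, assuming a four-taxon alignment in which an outgroup (labeled O here, e.g. orangutan) is taken to carry the ancestral state at every site.  The function names and the toy columns are hypothetical, invented only to reproduce a 45% plurality; they are not the data from any of the studies cited.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Hypothetical labels: H = human, C = chimp, G = gorilla, O = outgroup
# (e.g. orangutan), whose state at each site is ASSUMED to be ancestral.
PAIRINGS = {
    frozenset("HC"): "human-chimp",
    frozenset("HG"): "human-gorilla",
    frozenset("CG"): "chimp-gorilla",
}


def classify_site(h: str, c: str, g: str, o: str) -> Optional[str]:
    """Return the pairing an informative site supports, else None.

    A site counts only if two ingroup taxa share a state that differs from
    the outgroup while the third ingroup taxon matches the outgroup, i.e. a
    shared derived state for exactly one pair.
    """
    states = {"H": h, "C": c, "G": g}
    for pair, label in PAIRINGS.items():
        a, b = sorted(pair)
        (third,) = set("HCG") - pair
        if states[a] == states[b] and states[a] != o and states[third] == o:
            return label
    return None


def tally(alignment: Dict[str, str]) -> Counter:
    """Count informative sites supporting each pairing in an H/C/G/O alignment."""
    counts: Counter = Counter()
    for h, c, g, o in zip(alignment["H"], alignment["C"], alignment["G"], alignment["O"]):
        label = classify_site(h, c, g, o)
        if label is not None:
            counts[label] += 1
    return counts


def plurality(counts: Counter) -> Tuple[str, float]:
    """The 'primary winner': the best-supported pairing and its share of votes."""
    label, n = counts.most_common(1)[0]
    return label, n / sum(counts.values())


# Invented toy columns (h, c, g, o), chosen only to reproduce a 45% plurality.
columns = (
    [("A", "A", "G", "G")] * 9     # human and chimp share a derived A
    + [("T", "C", "T", "C")] * 6   # human and gorilla share a derived T
    + [("C", "G", "G", "C")] * 5   # chimp and gorilla share a derived G
    + [("A", "A", "A", "A")] * 10  # invariant, so uninformative
)
alignment = {taxon: "".join(col[i] for col in columns) for i, taxon in enumerate("HCGO")}

if __name__ == "__main__":
    counts = tally(alignment)
    winner, share = plurality(counts)
    print(dict(counts), winner, f"{share:.0%}")
```

On the toy data this picks human-chimp with 45% of the 20 informative sites, leaving 55% of them in disagreement.  With three ingroup taxa, each informative site costs one extra step on either tree it does not support, so picking the plurality pairing is the same as picking the most parsimonious tree.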

It then becomes a trivial task to explain away the discordant data, that is to say, the 55% of your data that you have decided is giving you the “wrong” answers.   You say it is “incomplete lineage sorting” or the result of ancestral polymorphisms, which have segregated into descendant taxa in a pattern different from the sequence of speciation.  Geneticists illustrate this with images that always seem to remind me of maps of the London Underground, with chimpanzees being Bakerloo and humans Victoria Station.

But I digress. It might also be parallel mutation or even backcrossing.  The problem, though, is that you have a lot of homoplasy, and one of the assumptions of cladistic/phylogenetic analysis is that homoplasy (observed as discordance) is very, very low compared to synapomorphy (the shared derived characters that you think are tracking the actual branching history of the species).

This is the equivalent of simply choosing the most parsimonious solution to the phylogenetic problem.  More of the data that resolve the trio into a pair favor this pairing than any other, therefore it must be the right one.  But there is an inherent contradiction in this logic.  You are choosing the most parsimonious solution in a system that is not obviously very parsimonious.  In other words, if you are willing to accept the possibility that 55% of your phylogenetically informative sites are homoplasies (that is to say, are giving you the “wrong” answer), then how can you reject the idea that 70% of your sites might be giving you the “wrong” answer?  I talked about this many years ago in the American Journal of Physical Anthropology.[5] 

The model that fits the data best is not a model of two successive bifurcations, but what we called at the time a “trichotomy” and now would call “reticulate” or even “rhizotic” evolution.[6] [7]

The geneticists working on this problem have been hampered by the cladistic necessity of regarding speciation as events, rather than as processes – when their ape data are showing speciation as processes, not as events.  The new paper on the gorilla genome reports that 30% of its phylogenetically informative sites are discordant.  This is how the new paper imagines the genomic relationships of humans, chimps, and gorillas – as indicating two temporally isolated speciation “events” and whatever the hell is going on in the middle there.
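How much discordance would a history of two clean splits predict?  The standard coalescent result for three species gives a rough answer: if the human-chimp ancestral lineage persists for t coalescent units (generations divided by twice the diploid effective population size), two sampled lineages fail to coalesce along it with probability e^(-t), and the three lineages then join in random order, so the expected discordance is (2/3)e^(-t).  This is a minimal sketch, assuming panmictic ancestral populations, no gene flow, and that the 30% figure can be read as a gene-tree frequency (site-pattern frequencies only approximate this).  The function names are hypothetical.

```python
import math


def discordance_from_branch(t: float) -> float:
    """Expected fraction of gene trees disagreeing with ((H, C), G).

    t is the length of the human-chimp ancestral branch in coalescent units
    (generations / (2 * N_e), diploid N_e). Two lineages fail to coalesce on
    that branch with probability exp(-t); the three lineages then coalesce in
    random order, so two of the three possible trees are discordant.
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    return (2.0 / 3.0) * math.exp(-t)


def branch_from_discordance(d: float) -> float:
    """Invert the formula: the branch length that would yield discordance d."""
    if not 0.0 < d <= 2.0 / 3.0:
        raise ValueError("discordance must lie in (0, 2/3]")
    return -math.log(1.5 * d)


if __name__ == "__main__":
    t = branch_from_discordance(0.30)
    print(f"30% discordance -> t = {t:.2f} coalescent units")
```

Under these assumptions, 30% discordance corresponds to an ancestral human-chimp branch of roughly 0.8 coalescent units.  Converting that into years requires values for effective population size and generation time, which is where adjusting parameters moves the estimated dates around.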

The creationists jumped all over this inconsistency, and it really is just the result of sloppy thinking by the scientists.

In trying to plug the genomic data into sequential speciation events, we are committing the square-peg-round-hole fallacy. There are historical and ideological reasons for depicting it as two successive, temporally distinct “events,” but that certainly misrepresents the evidence, and most likely misrepresents the biological history.  One of the most bizarre illustrations was in a recent introductory textbook, which showed this to students:

It’s trying to say that there were two speciation events, 7 mya and 6 mya, but it has located the 7 mya event incorrectly.  If you look at the scale, you’ll see that it’s actually drawn at 8 mya, to put a separation between them that shouldn’t be there.  The same text draws it this way a bit later, with very little (vertical) time separating the two “events” at 7-8 mya and 5-7 mya, but a lot of (horizontal) space.  That ought to learn ‘em!

Obviously, that’s not the text I use. 

The new paper on the gorilla genome, I might add, sets the “speciation events” at 6.0 and 3.7 mya.  The 3.7 mya date for the divergence of human and chimpanzee is simply, to the extent that anything can be falsified in the fossil record, false, although it is oddly congruent with some of Vince Sarich and Allan Wilson’s early writings on the subject in the 1960s.[8]  The (myriad) authors of the new paper go on to argue that they can juggle some of the parameters in their computer program to make the dates come out to about 6 and 10 million years ago – as if that is supposed to give us confidence!

For the Alternative Introduction, I drew this figure to illustrate the problem.

Rather than prurient talk about cross-species buggery on the part of early hominids, how about treating speciation here as a temporal process, and populations through time as anastomosing capillary systems (Earnest Hooton’s metaphor, expressing the same point as rhizomatic and reticulate evolution)?  It is also noteworthy that we tend to model and depict the gene pools of all three species as equivalent, when we’ve known for years that chimps and gorillas, even as relict populations, have gene pools that are considerably more diverse than that of our own species.  That is to say, Homo sapiens is relatively depauperate in genetic diversity.  The only study to try to incorporate that information into a phylogenetic analysis, many years ago, found that it completely obscured the phylogenetic “signal”, and that it was therefore a fool’s errand to try to extract two successive bifurcations from a genomic analysis of human, chimpanzee, and gorilla.[9] 
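One standard way to put a number on how diverse a gene pool is: nucleotide diversity (π), the average proportion of sites at which two randomly chosen sequences from the same population differ.  This is a minimal sketch using invented toy sequences rather than any real human or ape data.  The function name is hypothetical.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Mean pairwise differences per site (pi) over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatched positions for every pair, then average per pair per site.
    diffs = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * length)


# Invented toy samples: one nearly uniform population, one more variable.
low_diversity = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"]
high_diversity = ["ACGTACGTAC", "ACCTACGAAC", "TCGTAGGTAT"]

if __name__ == "__main__":
    print(f"{nucleotide_diversity(low_diversity):.3f}")
    print(f"{nucleotide_diversity(high_diversity):.3f}")
```

The toy samples give π of about 0.067 and 0.333.  In coalescent terms, a more diverse ancestral population implies a larger effective size, which leaves more shared polymorphism available to sort in a discordant pattern.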

Interestingly, the new paper actually did look at diversity in gorilla genomes, but didn’t incorporate that into their phylogenetic analysis.  Bottom line:  Human evolution is probably more interesting than the geneticists realize.

  1. I'm the lead author on the gorilla paper; thanks for blogging about our research. This comment is late, but someone only today pointed me at your post.

    I once heard an anthropologist complain that the stuff geneticists come up with is either completely irrelevant or else obviously wrong! I'm sure that is often true. Actually I think the two fields do complement each other well, but they are different, and there has been a lot of misunderstanding between them in the past, particularly with regard to the timescale of human evolution. One of our goals in this paper was to try and write a discussion which was accessible to anthropologists as well as geneticists, and I consulted with quite a few anthropologists in developing the figures and text, with that goal in mind. But clearly there will always be some misunderstanding.

    There are some issues in particular which I think are worth clarifying. Firstly, the issue of what is meant by a speciation 'event'. One of the ways we study genetic data is to interpret it in light of a model of how individuals in different species are related, such as the one illustrated in Fig. 1a. In order to be tractable, these models are quite likely to be simplifications of what actually happened. That doesn't make them irrelevant - or 'sloppy thinking'! It just means that in interpreting the results, we then also have to consider what they represent in the context of more realistic ancestral demography, e.g. when speciation is an extended process, or populations have substructure. That's why we talk about things like effective population size and effective speciation time. We are of course aware that speciation is not necessarily a single clean event, and actually our paper considers that explicitly. Not just in the supplement (which I appreciate you probably didn't have time to read - although I think you might find it useful to look at the relevant discussion there), but also in the main text, particularly the last two paragraphs of the section on great ape speciation. I think your bottom line, in this instance at least, is unwarranted.

    Having said all that, it is important to realise that there is in fact no clear evidence that the human-chimp and human-chimp-gorilla speciations were not clean allopatric splits! In particular, the observation of incomplete lineage sorting (ILS), a well understood population genetic phenomenon, is not evidence for reticulate evolution. Even if the history of these species was as clean and simple as fig 1a, we would still see ILS at a level dependent on the relative timescales of speciation and genetic coalescence.

    There may sooner or later be other genetic and/or paleoanthropological analyses which do rule out the simple 'sequential speciation event' model, but for the time being it is simply incorrect to say that it misrepresents the evidence. It may be at odds with our strongly held intuitions about speciation, but as scientists we should follow where the data takes us, not vice versa.

  3. Many people skeptical of evolution actually do understand the mechanism of incomplete lineage sorting and how it is invoked to explain incongruent data.

    What evolutionists don't understand is that they are using ILS as a perpetual rescue device that will save them from any potential falsification.

    Any time the data does not go your way, you can invoke ILS, and defer to mysterious, untestable, imaginary past speciation and coalescence events.

    Granted, there are many evolution-skeptics who do not yet understand you're doing this. But eventually it will be rightly revealed as yet another untestable, ad-hoc hypothesis for the purpose of constantly avoiding falsifiability.

