Phylogeny simulation 101


Species across a landscape, as simulated by Boucher et al. It’s this pretty stained-glass-esque diagram that inspired me to do this!

Lots of people have written code to simulate phylogenies, and yet more have written code that simulates traits across phylogenies. I’m not claiming any great novelty in what I’ve just done, but I’ve written one piece of code that simulates a phylogeny under given speciation and extinction rates, and another that simulates a phylogeny together with a trait that affects speciation and extinction rates.
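If you fancy giving this a go, here’s a minimal sketch of the first kind of simulation in Python. It isn’t my original code: it assumes constant rates, the parameter names `lam` (speciation), `mu` (extinction) and `t_max` (run length) are my own choice, and the `Lineage` record is a hypothetical structure for illustration. It runs forward in time from a single lineage: the wait until the next event is exponential with rate (lam + mu) multiplied by the number of living lineages, a random living lineage is picked, and that event is a speciation with probability lam / (lam + mu) and an extinction otherwise.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Lineage:
    parent: Optional[int]          # index of the parent lineage (None for the root)
    start: float                   # time at which the lineage arose
    end: Optional[float] = None    # time it speciated or went extinct
    fate: str = "alive"            # "alive", "speciated" or "extinct"

def simulate_bd(lam: float, mu: float, t_max: float, seed: Optional[int] = None,
                max_alive: int = 10_000) -> Tuple[List[Lineage], List[int]]:
    """Forward-time, constant-rate birth-death simulation from one root lineage.
    Returns every lineage ever born and the indices of those alive at the end."""
    if lam < 0 or mu < 0 or lam + mu == 0:
        raise ValueError("need lam >= 0, mu >= 0 and lam + mu > 0")
    rng = random.Random(seed)
    lineages = [Lineage(parent=None, start=0.0)]
    alive = [0]
    t = 0.0
    # Stops at t_max, when everything is extinct, or early if the clade exceeds max_alive
    while alive and len(alive) <= max_alive:
        # Waiting time to the next event across all extant lineages
        t += rng.expovariate((lam + mu) * len(alive))
        if t >= t_max:
            break
        pos = rng.randrange(len(alive))  # each extant lineage is equally likely
        focal = alive[pos]
        lineages[focal].end = t
        if rng.random() < lam / (lam + mu):
            # Speciation: the focal lineage is replaced by two daughters
            lineages[focal].fate = "speciated"
            lineages.append(Lineage(parent=focal, start=t))
            lineages.append(Lineage(parent=focal, start=t))
            alive[pos] = len(lineages) - 2
            alive.append(len(lineages) - 1)
        else:
            # Extinction: swap-and-pop removal from the list of extant lineages
            lineages[focal].fate = "extinct"
            alive[pos] = alive[-1]
            alive.pop()
    return lineages, alive

lineages, alive = simulate_bd(lam=1.0, mu=0.5, t_max=5.0, seed=1)
print(len(alive), "extant species;", len(lineages), "lineages in the full tree")
```

Following the `parent` indices back from the lineages in `alive` gives the tree of surviving species; extinct lineages stay in `lineages`, so the complete tree is there too.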

What I like about this is that the dependence on trait values is actually a dependence on what values of that trait other species have – in other words, it’s a niche-packing kind of model. Again, I’m not claiming any great novelty in these sorts of models (read this excellent paper, for example), but this was my first stab at what I hope will be a long stream of work. This is very raw code (a quick skim of it makes me realise a refactor would more than halve its length), but it does get the job done (I hope!).
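To make the niche-packing idea concrete, here’s a hypothetical sketch (again, not my original code) where each species’ speciation rate drops as more of the other living species have similar trait values. The crowding rule `lam0 / (1 + crowding)`, the neighbourhood `width`, the Gaussian trait change `sigma` at each speciation and the constant extinction rate `mu` are all assumptions for illustration. Because traits only change when species split, every rate stays fixed between events, so recomputing them after each event keeps the simulation exact. To stay short, it tracks only the traits of living species, not the tree itself.

```python
import random

def speciation_rates(traits, lam0, width):
    """Speciation rate of each species, damped by trait-space crowding (assumed form)."""
    rates = []
    for i, z in enumerate(traits):
        # Number of *other* extant species whose trait lies within `width` of this one
        crowding = sum(1 for j, other in enumerate(traits)
                       if j != i and abs(other - z) < width)
        rates.append(lam0 / (1 + crowding))
    return rates

def simulate_niche(lam0, mu, width, sigma, t_max, seed=None, max_alive=500):
    """Return the trait values of species alive at t_max (or when the cap is hit)."""
    rng = random.Random(seed)
    traits = [0.0]  # the root species starts with trait value 0
    t = 0.0
    while traits and len(traits) < max_alive:
        lams = speciation_rates(traits, lam0, width)
        total = sum(lams) + mu * len(traits)
        if total == 0:
            break  # no event can ever happen again
        t += rng.expovariate(total)
        if t >= t_max:
            break
        # Choose the next event (which species, which kind) in proportion to its rate
        events = [(i, kind) for i in range(len(traits)) for kind in ("speciate", "extinct")]
        weights = [w for lam in lams for w in (lam, mu)]
        i, kind = rng.choices(events, weights=weights)[0]
        if kind == "speciate":
            # Both daughters inherit the parent's trait plus independent Gaussian noise
            parent = traits[i]
            traits[i] = parent + rng.gauss(0.0, sigma)
            traits.append(parent + rng.gauss(0.0, sigma))
        else:
            traits.pop(i)
    return traits

traits = simulate_niche(lam0=1.0, mu=0.2, width=0.5, sigma=0.3, t_max=10.0, seed=1)
print(len(traits), "extant species")
print(sorted(round(z, 2) for z in traits))
```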

My first impression is that this stuff isn’t very hard, so if you have any interest you should definitely give it a go. Moreover, the insight I gained from it was quite important: the shape and size of a phylogeny change a great deal between simulations, and while on some level I knew this, it was only while checking whether I’d messed something up that I truly appreciated it.
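You can see that variability without drawing a single tree. This sketch (my own, assuming constant rates) only counts lineages under a birth-death process and repeats the run 1000 times with identical rates, so every difference between runs is down to chance. As a sanity check, the expected number of lineages at time t, starting from one, is exp((lam - mu) t), but individual runs scatter widely around it and many go extinct.

```python
import math
import random
import statistics

def final_size(lam, mu, t_max, rng, cap=100_000):
    """Number of lineages alive at t_max under a constant-rate birth-death process.
    Only the count is tracked, so no tree is stored."""
    n, t = 1, 0.0
    while 0 < n < cap:
        t += rng.expovariate((lam + mu) * n)
        if t >= t_max:
            break
        # Speciation adds a lineage; extinction removes one
        n += 1 if rng.random() < lam / (lam + mu) else -1
    return n

rng = random.Random(42)
lam, mu, t_max = 1.0, 0.5, 5.0
sizes = [final_size(lam, mu, t_max, rng) for _ in range(1000)]
print("expected mean:", round(math.exp((lam - mu) * t_max), 1))
print("observed mean:", round(statistics.mean(sizes), 1))
print("extinct runs:", sum(s == 0 for s in sizes))
print("median and max of surviving runs:",
      statistics.median([s for s in sizes if s > 0]), max(sizes))
```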

Next post… either a model incorporating communities/biogeography (leading to some model of allopatric speciation), or a vague attempt at estimating the parameters I set from the simulated phylogeny/traits. I have a feeling those two will be much harder!


Fibrous phylogenies – fiber.plot


Prototype ‘fiber plot’ of the phylogeny of all (well, most) mammals. Taken from Bininda-Emonds et al. (2007).

Visualising phylogenies is really, really hard. There’s been remarkably little progress since Darwin’s first printed figure, although in my opinion OneZoom is one of the greatest advances yet. I’ve always been skeptical of anyone who claims they can fully comprehend phylogenies of that size, let alone larger ones. The blur of coloured splodges at the top of this post is a prototype of how I think large phylogenies can be visualised; I call it a ‘fiber plot’ because I view it as a series of cross-sections through a ‘fiber’ of species evolving along a phylogeny. It’s a one-megabyte animated GIF, so it may take a few moments to load.

Each grid cell represents a species, and closely related species are near one another in the grid. At the start of the animation every grid cell is the same colour, because the phylogeny starts at the root: the most recent common ancestor of the more than 5000 mammals in this animation. With each frame we move closer to the present (each frame is a million years, and the time is shown at the top of the plot), and when a particular clade branches off, all the species within that clade get a colour that represents that year (it’s just the ‘rainbow’ function applied to all the years). Those species keep that colour until there’s another branching event. Try it with your own phylogeny: the function is fiber.plot.
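The colouring rule is simple enough to sketch in Python. This isn’t `fiber.plot` itself: the nested-tuple tree format (an internal node is `(time_since_root, [children])`, a tip is a name string), the function names and the hex colours are assumptions for illustration, and `rainbow` here only loosely imitates a rainbow palette. For each frame time, every tip takes the colour of the most recent branching event on its lineage that has happened by then, so before the first split all tips share the root’s colour.

```python
import colorsys

def tip_histories(node, path=()):
    """Yield (tip, branching times from root to tip) for every tip.
    Assumed format: internal node = (time_since_root, [children]); tip = name string."""
    if isinstance(node, str):
        yield node, path
        return
    time, children = node
    for child in children:
        yield from tip_histories(child, path + (time,))

def rainbow(n):
    """n hex colours with evenly spaced hues, loosely mimicking a 'rainbow' palette."""
    return ["#%02X%02X%02X" % tuple(round(255 * c) for c in colorsys.hsv_to_rgb(i / n, 1.0, 1.0))
            for i in range(n)]

def fiber_frames(tree, frame_times):
    """For each frame time, map every tip to the colour of the latest branching
    event on its lineage that has happened by then (None before the root)."""
    histories = list(tip_histories(tree))
    # One colour per distinct branching time, in chronological order
    all_times = sorted({t for _, path in histories for t in path})
    palette = dict(zip(all_times, rainbow(len(all_times))))
    frames = []
    for now in frame_times:
        frame = {}
        for tip, path in histories:
            past = [t for t in path if t <= now]  # path runs root to tip, so times increase
            frame[tip] = palette[past[-1]] if past else None
        frames.append(frame)
    return frames

example = (0.0, [(2.0, ["a", "b"]), (5.0, ["c", (7.0, ["d", "e"])])])
for now, frame in zip([1, 3, 6, 8], fiber_frames(example, [1, 3, 6, 8])):
    print(now, frame)
```

For an animation like the one above, `frame_times` would simply be 0, 1, 2 and so on, in millions of years since the root, and each frame’s dictionary would then be drawn onto the grid.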

The problem with viewing phylogenies is one of information content and space: there’s a lot of wasted space in traditional drawings, and it’s hard to process all the information properly. One approach to this is to buy lots of big screens, another is to plot in 3D, and OneZoom uses fractals to help you zoom through the information. I don’t think fiber plots are the best thing since sliced bread, but I do think they’re information-dense. There is very little wasted space in the plot above, which packs in 5000 species, and by animating the image we can see the timing and magnitude of speciation across the whole tree quite easily.

A few obvious things to add spring to mind:

  • support for species/clades going extinct
  • better ways of positioning the species in space (I’ve hackily commented out some PCA and clustering-based approaches in the function)
  • better layouts of species
  • the ability to select clades of interest
  • outlines around individual species (again, spot the hacky commented-out call to contour)
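On positioning, one simple baseline (not one of the PCA or clustering approaches in the function) is to list the tips in depth-first tree order, so every clade forms a contiguous run, and then fill a near-square grid row by row, reversing every other row so that tips next to each other in the list stay next to each other on the grid. The tree format here, an internal node as `(time_since_root, [children])` and a tip as a name string, is an assumption for illustration, as are the function names. It keeps clades contiguous in the ordering, though a clade can still be split across a row boundary.

```python
import math

def tips_in_order(node):
    """Tips in depth-first order, so every clade forms a contiguous run.
    Assumed format: internal node = (time_since_root, [children]); tip = name string."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [tip for child in children for tip in tips_in_order(child)]

def snake_grid(tips):
    """Map each tip to (row, col) on a near-square grid, filling rows in order
    and reversing every other row so list neighbours stay grid neighbours."""
    ncol = math.ceil(math.sqrt(len(tips)))
    positions = {}
    for k, tip in enumerate(tips):
        row, col = divmod(k, ncol)
        if row % 2 == 1:
            col = ncol - 1 - col  # odd rows run right to left
        positions[tip] = (row, col)
    return positions

example = (0.0, [(2.0, ["a", "b"]), (5.0, ["c", (7.0, ["d", "e"])])])
print(snake_grid(tips_in_order(example)))
# {'a': (0, 0), 'b': (0, 1), 'c': (0, 2), 'd': (1, 2), 'e': (1, 1)}
```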

…but, in what I sense will be a common thread in this blog, why optimise when you can just move on to something else? :p Please let me know what you think!