Visualising phylogenies is really, really hard. There’s been remarkably little progress since Darwin’s first printed figure, although in my opinion OneZoom is one of the greatest advances yet. I’ve always been skeptical of anyone who claims they can fully comprehend phylogenies of the size OneZoom displays, let alone larger ones. The blur of coloured splodges at the top of this post is a prototype of how I think large phylogenies can be visualised; I call it a ‘fiber plot’ because I view it as a series of cross-sections through a ‘fiber’ of species evolving along a phylogeny. It’s an animated GIF of about a megabyte, so it may take a few moments to load.
Each grid cell represents a species, and closely related species sit near one another in the grid. At the start of the animation every grid cell is the same colour, because the phylogeny starts at the root: the most recent common ancestor of the >5000 mammals in this animation. Each frame moves a million years closer to the present (the time is shown at the top of the plot), and when a clade branches off, all the species within that clade get a colour representing that point in time (it’s just the ‘rainbow’ function applied across all the time steps). Those species keep that colour until another branching event happens within their lineage. Try it with your own phylogeny; the function is fiber.plot.
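To make the colouring rule concrete, here is a minimal Python sketch of it. It is not the code inside fiber.plot; it assumes a hypothetical `Node` class whose `time` is millions of years after the root, and it uses a simple red-to-violet hue ramp from `colorsys` as a stand-in for the ‘rainbow’ palette. At each frame, every species takes the time of the latest split on its root-to-tip path that the animation has already reached.

```python
from __future__ import annotations

import colorsys
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node: `time` is millions of years after the root."""
    time: float
    children: list[Node] = field(default_factory=list)
    name: Optional[str] = None  # only tips (species) carry names


def frame_colours(root: Node, t: float) -> dict[str, float]:
    """Map each species to the time of the latest split on its path reached by time t."""
    out: dict[str, float] = {}

    def walk(node: Node, last: float) -> None:
        # A split only recolours its descendants once the animation reaches it;
        # tips are not splits, so they never change the colour themselves.
        if node.children and node.time <= t:
            last = node.time
        if not node.children:
            out[node.name] = last
        for child in node.children:
            walk(child, last)

    walk(root, root.time)
    return out


def rainbow_hex(value: float, lo: float, hi: float) -> str:
    """Map value in [lo, hi] onto a red-to-violet hue ramp (stand-in for 'rainbow')."""
    frac = 0.0 if hi == lo else (value - lo) / (hi - lo)
    r, g, b = colorsys.hsv_to_rgb(0.8 * frac, 1.0, 1.0)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


def frames(root: Node, present: float, step: float = 1.0):
    """Yield (time, {species: hex colour}) for each frame from the root to the present."""
    n_frames = int((present - root.time) // step) + 1
    for i in range(n_frames):
        t = root.time + i * step
        times = frame_colours(root, t)
        yield t, {sp: rainbow_hex(v, root.time, present) for sp, v in times.items()}


# Toy tree: clade A splits 10 Myr after the root; species b branched off at the root.
a = Node(10.0, [Node(30.0, name="a1"), Node(30.0, name="a2")])
root = Node(0.0, [a, Node(30.0, name="b")])
print(frame_colours(root, 15.0))  # {'a1': 10.0, 'a2': 10.0, 'b': 0.0}
```

Drawing each frame is then a matter of filling each species’ grid cell with its colour; the grid positions themselves are a separate problem.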
The problem with viewing phylogenies is information content and space: traditional drawings waste a lot of space, and it’s hard to process all the information properly. One approach is to buy lots of big screens, another is to plot in 3D, and OneZoom uses fractals to help you zoom through the information. I don’t think fiber plots are the best thing since sliced bread, but I do think they’re information-dense. There’s very little wasted space in the plot above, despite its 5000 species, and by animating the image we can see the timing and magnitude of speciation across the whole tree quite easily.
A few obvious things to add spring to mind:
- support for species/clades going extinct
- better ways of positioning the species in space (I’ve hackily commented out some PCA and clustering-based approaches in the function)
- better layouts of species
- the ability to select clades of interest
- outlines around individual species (again, spot the hacky commented-out call to contour)
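As a cheap baseline for positioning species, here is a Python sketch that is not necessarily what fiber.plot does. It assumes a hypothetical tree format (nested tuples of species names), walks the tree depth-first so sister species come out next to each other, then fills a near-square grid in ‘snake’ order, reversing every other row so consecutive species always share an edge.

```python
import math


def tips_in_order(tree):
    """Species names in depth-first order; `tree` is a hypothetical nested tuple of names."""
    if isinstance(tree, str):
        return [tree]
    ordered = []
    for child in tree:
        ordered.extend(tips_in_order(child))
    return ordered


def snake_grid(tree):
    """Place species on a near-square grid, reversing every other row so that
    species adjacent in tree order are also adjacent in the grid."""
    tips = tips_in_order(tree)
    ncol = math.ceil(math.sqrt(len(tips)))
    positions = {}
    for i, name in enumerate(tips):
        row, col = divmod(i, ncol)
        if row % 2 == 1:  # odd rows run right-to-left
            col = ncol - 1 - col
        positions[name] = (row, col)
    return positions


print(snake_grid(((("a", "b"), "c"), ("d", "e"))))
# {'a': (0, 0), 'b': (0, 1), 'c': (0, 2), 'd': (1, 2), 'e': (1, 1)}
```

The catch is that a clade can still be stretched along a row or split across two rows, so grid distance only roughly tracks relatedness; a space-filling curve such as a Hilbert curve keeps runs of consecutive species more compact.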
…but, in what I sense will be a common thread in this blog, why optimise when you can just move onto something else? :p Please let me know what you think!