Example meta-community phylogeny. Circles are proportional to the number of individuals at the end of the run. Look! The species that died out isn’t anywhere at the end! I used rainbow to make this look slightly more appealing; I admit it’s not the prettiest figure in history.
Following on from the last post, I wrote a quick function that (sort of) simulates individuals of species moving through a meta-community (sim.meta.comm), and then added something on top that simulates the evolution of those species (including generating a phylogeny; sim.meta.phy.comm). I also neatened up the creation of a phylogeny from what I’ve been calling an ‘edge matrix’ in my code (edge2phylo). The meta-community has different environmental conditions throughout it, and individuals can migrate at each time-step. There’s no competition in the model yet.
I expected this to be a nightmare, but as long as you use the steps that monitor the ecology of the species as a check for whether you need to do anything with the phylogeny, it actually turns out that simulating the phylogeny is easier when you’ve got ecological information than when you’re just doing the phylogeny alone. In this version, I don’t have any effect of species traits – that’s for next time.
The main conceptual point I took from this is the difficulty in deciding how to model stochasticity. It’s hard to get the environmental parameters on the same scale as the stochasticity and species abundance parameters, and as such I ended up doing everything with Poisson distributions which seemed rather strange to me. It’s quite worrying how easy it was to make environments where species all headed to extinction, apart from in very, very small patches of the environment before I got my head around it all.
Next time – competition and trait evolution within the same model! Then, and only then, will I move on to actually trying to estimate parameters of interest from all this…
Species across a landscape, as simulate by Boucher et al. It’s this pretty stained-glass-esque diagram that inspired me to do this!
Lots of people have written code to simulate phylogenies, and yet more have written code that simulate traits across phylogenies. I’m not claiming any great novelty in what I’ve just done, but sim.bd.tree simulates a phylogeny under given speciation and extinction rates, and sim.bd.tr.tree simulates a phylogeny, and a trait that affects speciation and extinction rate.
What I like about this is that the dependence on trait values is actually a dependence on what values of that trait other species have – in other words, it’s a niche-packing kind of model. Again, I’m not claiming any great novelty in these sorts of models (read this excellent paper, for example), but this was my first stab in what I hope will be a long stream of work. This is very raw code (a quick skim of it makes me realise a refactor would more than halve its length) but it does get the job done (I hope!).
My first impression is that this stuff isn’t very hard, so if you have any interest you should definitely give it a go. Moreover, the insight I gained from it was quite important – the shape and size of a phylogeny changes a great deal over different simulations, and while on some level I knew this it was only while checking to see if I’d messed something up that I really gathered a true appreciation for it.
Next post… either a model incorporating communities/biogeography (–> some model of allopatric speciation), or a vague attempt at fitting estimting the parameters I set once the phylogeny/traits are calculated. I have a feeling those two will be much harder!…
It’s very hard to find pictures of evolutionary transitions! Taken from transitional-fossil.com, which made me smile.
I’ve been thinking a little about transitions between discrete characters, in part because of the excellent corHMM package. corHMM is great for measuring the rate at which discrete characters transition from one to the other, but you can’t tell whether certain transitions tend to happen under certain conditions. For example, it’s one thing to know how frequently plants transition between dormant and non-dormant seeds, but another to know under what global temperatures that tends to happen.
In theory, that’s where transition.calc comes in. It takes a discrete (e.g., seed dormancy) and continuous (e.g., cold tolerance) character, reconstructs ancestral states and transitions, and then plots one against the other. I prefix all of this with “in theory” because:
- It’s a very, very dirty hack, and I’m certain there’s an analytical solution for this. Moreover, I’m not convinced that taking the modal transition that happens along a branch and working with that will actually work.
- I pulled the code together from some other scripts and I’m not convinced that it matches up the discrete and continuous reconstructions properly (seems important, right?…)
- I haven’t validated the script with simulated data. To do that, I need to write a flexible way of defining models of evolution across a phylogeny (hopefully, that’s next fortnight’s post…)
…however, I do have some hope that this works because of the scripts I pilfered it from (…which have proven remarkably hard to generalise, thank you very much Past Will). I found what appear to be significant results in a seed dataset that weren’t there when I randomly evolved two separate traits across the phylogeny. That’s not to say I haven’t introduced another error somewhere, but it’s a good sign. I’m currently having a debate with co-authors as to whether there’s a difference between this and just regressing the continuous variable against the discrete; I’d be grateful for feedback!
…and now for something completely different. The Monty Python foot, taken from ecogirlcosmoboy
When I first read about Julia, I dismissed it as a nice idea that was coming too late into an already crowded niche space of programming languages. Fast forward almost two years and Doug Bates (of mixed effects models fame) is using it, and even Wired is is talking about it. So I decided to give it a try by making a phylogenetic library; don’t get too excited because it only sort-of loads a Newick phylogeny and not much else.
My first impressions were very good. It is fast – my code is dreadful (I wrote it with a few beers) and even with recursion it’s usable. It’s also very readable; despite a few quirks, it’s probably easier to read than R. The features that impressed me (in vague order of increasing nerdiness) are:
- Easy to use multiple processors
- Package management is all linked into GitHub, so there’s no messing around with CRAN-like central repositories (…yes, that could also be bad)
- The typing system is well-done and saved me time. If I say a function only takes a phylogeny, it only takes a phylogeny, and is very vocal about it without me writing checking code
- Types are defined explicitly, and there aren’t three kinds of class (R…!) to mess around with
- Emacs (through ESS) already gives me a good coding environment. I couldn’t get JuliaStudio to work on my Ubuntu computer, but I’m not a fan of RStudio anyway so it was no biggie
That said, there are kinks, and while it’s only at version 0.2 some are surprising gives it’s over two years old. You’re going to find yourself on developer discussion pages quite quickly because the help files are still being built. You can’t delete or redefine variables or types, which is a nightmare when you’re experimenting. I’ve had a rather vocal falling out with Julia’s regular expression matching (matchall, not match, is your friend), the debugger hasn’t been touched in a while (…but it works), and there’s no standardisation of graphics yet. More fundamentally, Julia needs to take statistical formulae seriously; the expression notation is sort of alright, but not every function listens to it. I want to be able to plot(y ~ x), damnit!
Bottom line: it’s not ready yet, it made me grow a few grey hairs, and The Queen isn’t dead so long live R. However, take a quick look at any package up on GitHub, and you’ll find something you won’t see on any R packages – there’s no C code. I’m tired of people talking about how great it is that you can easily use C++ in R – I learnt R specifically because I wanted to move away from C++. If you have to solve your problem using another language, maybe you weren’t using the right language to begin with. I don’t think Julia will supplant R any time soon, but I am going to keep plugging away at phylogenies in it; I want to do some big simulations and I really don’t want to return to C++. I doubt I’m the only one.
Visualising phylogenies is really, really hard. There’s been remarkably little progress since Darwin’s first printed figure, although in my opinion OneZoom is one of the greatest advances yet. I’ve always been skeptical of anyone who claims they can fully comprehend phylogenies of that size, let along larger. The blur of coloured splodges at the top of this post is a prototype of how I think large phylogenies can be visualised; I call it a ‘fiber plot’ because I view it as a series of cross-sections through a ‘fiber’ of species evolving along a phylogeny . It’s an animated GIF and it’s a megabyte, so it may take a few moments to load.
Each grid cell represents a species, and closely-related species are nearby one another in the grid. At the start of the animation each grid cell is the same colour, because the phylogeny starts along the root – the most recent common ancestor of the >5000 mammals in this animation. With each frame we move closer to the present (each frame is a million years as recorded at the top of the plot), and when a particular clade branches off all the species within that clade get a colour that represents that year (it’s just the ‘rainbow’ function applied to all the years). Those species keep that colour until there’s another branching event. Try it with your own phylogeny – the function is fiber.plot.
The problem with viewing phylogenies is information content and space: there’s a lot of wasted space in traditional drawings, and it’s hard to process all the information properly. One approach to this is to buy lots of big screens, another is to plot in 3D, and OneZoom uses fractals to help zoom through the information. I don’t think fiber plots are the best thing since sliced bread, but I do think they’re information-dense. There is very little wasted space in the plot (5000 species above), and by animating the image we can see the timing and magnitude of speciation across the whole tree quite easily.
A few obvious things to add spring to mind:
- support for species/clades going extinct
- better ways of positioning the species in space (I’ve hackily commented-out some PCA and clustering-based approaches in the function)
- better layouts of species
- the ability to select clades of interest
- outlines around individual species (again, spot the hacky commenting of contour)
…but, in what I sense will be a common thread in this blog, why optimise when you can just move onto something else? :p Please let me know what you think!