Modelling Communities as DNA (in C++, sorry again!)

Inachis io © Eddie John (from UKBMS website)

A while ago I was lucky enough to spend a few months in the Exelixis lab in Heidelberg, with rather grand hopes of modelling turnover in species composition by treating species as if they were nucleotides in a DNA molecule. C++ (sorry, not R, I know…) code implementing this method is up online here (to build it you will need the Boost library, and ‘make install’ should build it all for you).

You can read a more detailed description of the method here (I’m about to put it on arXiv), but the general principle is that, at each time-step, an individual can reproduce, be replaced by an individual of another species, or die and not be replaced. New individuals can also join a community, allowing the overall abundance to grow. The first two of these events (reproduction and replacement) are analogous to models of DNA substitution in phylogenetics, where such models help us infer what happened in the past from extant DNA.
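To make those events concrete, here is a minimal C++ sketch of one time-step. It is not the code from the repository: the function names (`step`, `abundances`), the per-individual probabilities `p_replace` and `p_die`, the Poisson-distributed number of newcomers, and the reading of "reproduce" as a conspecific offspring keeping its parent's slot are all assumptions for illustration. Each individual is stored as a species index, so the community looks a bit like a sequence in which replacement plays the role of a substitution.

```cpp
#include <cassert>
#include <random>
#include <vector>

// One time-step of a HYPOTHETICAL individual-based community model.
// Each entry of `community` is one individual, stored as a species index
// in [0, n_species). Probabilities are per individual per time-step.
std::vector<int> step(const std::vector<int>& community, int n_species,
                      double p_replace, double p_die,
                      double immigration_rate, std::mt19937& rng) {
    assert(n_species >= 2 && p_replace + p_die <= 1.0);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    std::uniform_int_distribution<int> other(0, n_species - 2);
    std::uniform_int_distribution<int> any(0, n_species - 1);

    std::vector<int> next;
    next.reserve(community.size());
    for (int sp : community) {
        const double u = unif(rng);
        if (u < p_die) {
            continue;  // dies and is not replaced
        } else if (u < p_die + p_replace) {
            // Replaced by a uniformly chosen *different* species
            // (the substitution-like event): skip over our own index.
            const int j = other(rng);
            next.push_back(j >= sp ? j + 1 : j);
        } else {
            next.push_back(sp);  // reproduces: a conspecific offspring keeps the slot
        }
    }
    // New individuals join (assumed Poisson count, uniformly random species).
    // std::poisson_distribution needs a strictly positive mean.
    if (immigration_rate > 0.0) {
        std::poisson_distribution<int> n_new(immigration_rate);
        const int k = n_new(rng);
        for (int i = 0; i < k; ++i) next.push_back(any(rng));
    }
    return next;
}

// Tally the abundance of each species in the community.
std::vector<int> abundances(const std::vector<int>& community, int n_species) {
    std::vector<int> counts(n_species, 0);
    for (int sp : community) ++counts[sp];
    return counts;
}
```

Calling `step` repeatedly and tallying `abundances` after each call produces a simulated abundance time series; the estimation problem the method tackles runs the other way, from such a series back to the rates.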

In this method we need this model to estimate the historical events that took place in a community, and there is a circularity: you need the model to estimate the events, and the events to estimate the model (we never actually see one individual replacing another, right?). While the method doesn’t work so well on simulated data (albeit data I simulated specifically to expose the method’s failings), there is some hope that the program detects signal in real, biological data. I’ve decided to put this up online because I don’t really have time to do anything more with it, but if you’re interested, here are a few things I’d like to try:
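Substitution models cope with never seeing individual events by summing over every unseen sequence of replacements. As an illustration only (an assumption, not necessarily the model in the released code), take the simplest equal-rates model: each of the K-1 other species replaces the occupant of a position at rate q. The probability that a position held by species i at time 0 is held by species j at time t is then P_ii(t) = 1/K + (K-1)/K · e^(-Kqt) and P_ij(t) = (1 - e^(-Kqt))/K for i ≠ j; with K = 4 this is the Jukes-Cantor model. Inverting the expected fraction of changed positions gives an estimate of q. The functions `transition_prob` and `estimate_rate` below are hypothetical. The estimate assumes positions can be matched up between two censuses, which community data do not give us; that is exactly the circularity described above.

```cpp
#include <cassert>
#include <cmath>

// HYPOTHETICAL equal-rates (Jukes-Cantor-style) model over K species.
// Probability that a position held by species i at time 0 is held by
// species j at time t, summed over all unobserved intermediate replacements.
double transition_prob(int i, int j, int K, double q, double t) {
    assert(K >= 2 && q >= 0.0 && t >= 0.0);
    const double decay = std::exp(-K * q * t);
    if (i == j) return 1.0 / K + (K - 1.0) / K * decay;
    return (1.0 - decay) / K;
}

// Estimate q from the observed fraction p of matched positions whose species
// changed over time t: invert p = (K-1)/K * (1 - exp(-K q t)).
// Only defined for p < (K-1)/K; beyond that the signal is saturated.
double estimate_rate(double p, int K, double t) {
    assert(K >= 2 && t > 0.0);
    assert(p >= 0.0 && p < (K - 1.0) / K);
    return -std::log(1.0 - K * p / (K - 1.0)) / (K * t);
}
```

Each row of probabilities sums to one, and as t grows every species becomes equally likely (1/K), mirroring how old DNA substitutions become saturated.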

  • Re-write the whole thing to use MCMC; I did some experiments with Filzbach a while ago, and it looks like integrating out the uncertainty about the events, as well as having a search procedure that’s less likely to get caught in local optima, works wonders.
  • Stop treating the individuals in this model as real individuals, and instead view them as positions in niche space that are being fought for among species. This niche space battle can then be a ‘hidden model’ that generates observed abundances.
  • Apply this to some real data in simpler systems. I’m convinced that species like butterflies, where (generally) one generation follows another with little overlap, have a lot of fluctuations that are modelled quite nicely by something like this, where abundances are essentially a random draw from a slightly biased bag.
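The "random draw from a slightly biased bag" idea can be sketched as a Wright-Fisher-style resampling step with selection: every individual of the next generation is drawn independently, with species i picked in proportion to its current abundance times a bias weight. This is an illustrative sketch, not something in the released code; `next_generation`, the `bias` vector and the fixed population size `N` are hypothetical. With all biases equal it reduces to neutral drift, and biases slightly away from 1 give the "slightly biased" bag.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// HYPOTHETICAL "biased bag" step for species with non-overlapping generations.
// Each of the N individuals in the next generation is drawn independently,
// choosing species i with probability proportional to counts[i] * bias[i].
// With all biases equal this is neutral Wright-Fisher resampling.
std::vector<int> next_generation(const std::vector<int>& counts,
                                 const std::vector<double>& bias,
                                 int N, std::mt19937& rng) {
    assert(counts.size() == bias.size() && N >= 0);
    std::vector<double> weights(counts.size());
    double total = 0.0;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        weights[i] = counts[i] * bias[i];  // abundance times a small bias
        total += weights[i];
    }
    assert(total > 0.0);  // at least one species must be able to contribute

    // discrete_distribution normalises the weights itself.
    std::discrete_distribution<int> pick(weights.begin(), weights.end());
    std::vector<int> next(counts.size(), 0);
    for (int k = 0; k < N; ++k) ++next[static_cast<std::size_t>(pick(rng))];
    return next;
}
```

Keeping N fixed is the simplest choice; a species that reaches zero stays extinct unless some separate immigration term is added.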

Let me know what you think! If you’re interested in doing something with this… let me know!
