Taxonomic cleaning and building phylogenies

The Uma Thurman wasp, courtesy of crazy taxonomist Donald Quicke (well, he's not that crazy, but there we go). Via geekosystem.

The Uma Thurman wasp, courtesy of taxonomist Donald Quicke. Have you any idea how hard it is to find a picture to accompany a post on taxonomy cleaning? Via geekosystem.

I spend most of my time cleaning up species names, and using them to build phylogenies (have I screamed the word phyloGenerator at you recently?). However, most of the time if you’ve got a pretty small species set and you’re studying something well-known, you can get away without building a tree at all if you’re smart with how you match all the names up.

A number of people have been emailing me asking for my taxonomy scripts, and so I’ve now pushed some phylogeny making and taxonomic cleaning scripts (based heavily on the code that ships with phyloGenerator, so please cite that if you use it) into willeerd. It takes a list of plant names, cleans them up, and then sticks them into a phylogeny you specify (I’d recommend this plant phylogeny). You can also use the functions congeneric.merge and make.composite.with.polytomies to put missing species into any kind of phylogeny, either based on genus names or based on binding groups (intended to be genera) in to replace something else in a phylogeny. They all, by default, follow the bladj algorithm, which is a very fancy way of saying that inserted nodes are evenly spaced wthin a clade.

If you’re going to use these functions, please be careful. Taxonomy != phylogeny, and there is no guarantee that using taxonomy to fill in for phylogeny won’t cause major, major problems. Indeed, if taxonomy were sufficient to fill in for phylogeny, you should ask yourself why you’re even trying to use a phylogeny in the first place. The dating of nodes within a phylogeny is really hard, and while there are excellent methods to try and correct for a lack of information, it’s still really hard. I’d strongly recommend using a program that incorporates DNA data (like phyloG– OK, I’ll shut up), and even then be careful. This stuff is hard for a reason…!

As with the plotting functions I put up a few posts back, I use this code quite a lot myself, so please don’t be surprised if it changes in the coming days/weeks/months. Let me know if there’s anything in particular you’d like to see up there – I take requests (this post is one), so just let me know.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s