pez: Phylogenetics for the Environmental Sciences

You’re not famous until they put your head on a pez dispenser. Or something. Image from pez.com; hopefully advertisement isn’t copyright infringement…

I have a new R package up on CRAN now (pez: Phylogenetics for the Environmental Sciences). We worked really hard to make sure the vignette was informative, but briefly, if you're interested in:

  • a combined data structure for phylogeny, community ecology, trait, and environmental data (comparative.comm)
  • phylogenetic structure (e.g., shape, dispersion, and their traitgram options; read my review paper)
  • simulating phylogenies and ecological assembly (e.g., scape, sim.meta, ConDivSim)
  • building a phylogeny (phy.build)
  • applying regression methods based on work by Jeannine Cavender-Bares and colleagues (eco.xxx.regression, fingerprint.regression)

…there's a function in there for you. For what it's worth, I'm already using the package a lot myself, so there's at least one happy user. I had a lot of fun working on this, mostly because all the co-authors are such lovely people. That goes for the people who run CRAN too – I was expecting a hazing for every minor mistake, but they were all lovely!

I learnt a few things while getting this ready to go which might be of interest if, like me, you're very naive about how to do collaborative projects well. I don't think much of this is R-specific, but here are some things whose importance surprised me…

  • People are sometimes too nice.
    • So you have to needle them a bit to be constructively nasty. Some (I'm looking at you, Steve Walker) are so nice that they feel mean making important suggestions for improvements. Some feel things get lost when they're written down and prefer talking over Skype (e.g., me); others are quicker over email.
    • Everyone has different skills, and you have to use them. Some write lots of code, others write small but vital pieces, some check methods, others document them, and some will do all of that but be paranoid they’ve done nothing. Everyone wants to feel good about themselves, and if you don’t tell them what you want from them, they won’t be happy!
  • Be consistent about methods.
    • I love using GitHub issues, but that meant that the 5% of the time I did something without opening an issue for it… someone else was doing the same thing at the same time. Be explicit!
    • If you’re going to use unit tests make sure everyone knows what kind of tests to be writing (checking values of simulations? checking return types?…), and that they always run them before pushing code. Otherwise pain will ensue…
    • Whatever you do, make sure everyone has the same version of every dependency. I imagine at least one person has made some very, very loud noises about my having an older version of roxygen2 installed…
  • Have a plan.
    • There will never be enough features, because there will never be an end to science. Start tagging things for ‘the next version’; you’ll be glad of it later.
    • Don’t be afraid to say no. Some things are ‘important’, but if no one cares enough to write the code and documentation for it, it will never get done. So just don’t do it!
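To make the "what kind of tests?" question a bit more concrete, here's a minimal sketch in base R of the two kinds of test I mean: one checking return types, and one checking the values a simulation produces. `sim.bm.trait`, `check.types`, and `check.values` are hypothetical names made up for illustration (nothing to do with pez), and in a real package you'd probably use testthat rather than bare `stopifnot` calls.

```r
# Hypothetical toy simulation (not part of pez): a trait evolving by
# Brownian motion for 'n.steps' steps, starting from 'start'.
sim.bm.trait <- function(n.steps, sigma = 1, start = 0) {
  if (n.steps < 1) stop("'n.steps' must be at least 1")
  # Each step adds an independent normal increment with sd 'sigma'
  start + cumsum(rnorm(n.steps, mean = 0, sd = sigma))
}

# Kind 1: checking return types (cheap and deterministic)
check.types <- function() {
  out <- sim.bm.trait(10)
  stopifnot(is.numeric(out), length(out) == 10)
  invisible(TRUE)
}

# Kind 2: checking the values the simulation produces
check.values <- function() {
  # With sigma = 0 nothing moves, so the answer is known exactly
  stopifnot(all(sim.bm.trait(5, sigma = 0, start = 2) == 2))
  # With the same seed, two runs must agree exactly
  set.seed(42); a <- sim.bm.trait(100)
  set.seed(42); b <- sim.bm.trait(100)
  stopifnot(identical(a, b))
  # Statistical check: the end point's variance across replicates should
  # be close to n.steps * sigma^2 (here 20 * 1 = 20)
  set.seed(1)
  ends <- replicate(5000, sim.bm.trait(20)[20])
  stopifnot(abs(var(ends) - 20) < 2)
  invisible(TRUE)
}

check.types()
check.values()
```

The value checks are the ones worth agreeing on up front: a statistical check with too tight a tolerance will fail at random, which is why the seed is fixed before it runs.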

Load that package! Etc.

I’m travelling back to the UK this weekend (yaaaaay!) and so, while I might write some (buggy) code on the plane, I thought it would be a push to get something new up this week. So, instead, I’ve “checked over” the willeerd package, and you can now install it like so:

library(devtools)  # install.packages("devtools") first if you don't have it
install_github("willpearse/willeerd")

Tah-dah! My hope, in the next few weeks, is to have a few more posts with actual code within the page (like the above, but slightly less trivial). I might even veer off into posting about the usual sorts of R/ecology/evolution questions I get asked a lot, so if you have any preferences please do let me know!

Phylogenetics in Julia! Not R, sorry…

…and now for something completely different. The Monty Python foot, taken from ecogirlcosmoboy

When I first read about Julia, I dismissed it as a nice idea that was coming too late into an already crowded niche space of programming languages. Fast-forward almost two years, and Doug Bates (of mixed effects models fame) is using it, and even Wired is talking about it. So I decided to give it a try by making a phylogenetic library; don't get too excited, because it only sort-of loads a Newick phylogeny and not much else.
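For anyone who hasn't had the pleasure, "loading a Newick phylogeny" mostly means recursively parsing nested parentheses like `((A:1,B:2):0.5,C:3);`. Below is a rough sketch of that recursion, written in R rather than Julia and not taken from my library. It ignores comments, quoted labels and other corners of the format, and the nested-list representation (and the names `parse.newick`, `tip.labels`, `tree.length`) are hypothetical; for real work in R, ape's `read.tree` does this properly.

```r
# Minimal recursive-descent Newick parser (sketch only): handles nested
# clades, tip/node labels and branch lengths, but not comments or quoted labels.
parse.newick <- function(text) {
  chars <- strsplit(gsub("[[:space:]]", "", text), "")[[1]]
  pos <- 1
  structural <- c("(", ")", ",", ":", ";")

  # Current character, or "" once we've run off the end
  peek <- function() if (pos <= length(chars)) chars[pos] else ""

  # Read a label or number up to the next structural character
  read.token <- function() {
    start <- pos
    while (pos <= length(chars) && !(chars[pos] %in% structural))
      pos <<- pos + 1
    if (pos > start) paste(chars[start:(pos - 1)], collapse = "") else ""
  }

  # A node is an optional "(child,child,...)" then a label then ":length"
  parse.node <- function() {
    children <- list()
    if (peek() == "(") {
      repeat {
        pos <<- pos + 1                       # skip "(" or ","
        children[[length(children) + 1]] <- parse.node()
        if (peek() == ")") break
        if (peek() != ",") stop("Malformed Newick string near position ", pos)
      }
      pos <<- pos + 1                         # skip ")"
    }
    label <- read.token()
    edge <- NA_real_
    if (peek() == ":") {
      pos <<- pos + 1
      edge <- as.numeric(read.token())
    }
    list(label = label, length = edge, children = children)
  }

  tree <- parse.node()
  if (peek() != ";") stop("Newick string must end with ';'")
  tree
}

# Tips are the nodes with no children
tip.labels <- function(node) {
  if (length(node$children) == 0) return(node$label)
  unlist(lapply(node$children, tip.labels))
}

# Sum of all branch lengths (a missing root length counts as zero)
tree.length <- function(node) {
  own <- if (is.na(node$length)) 0 else node$length
  own + sum(vapply(node$children, tree.length, numeric(1)))
}

tree <- parse.newick("((A:1,B:2):0.5,C:3);")
tip.labels(tree)   # "A" "B" "C"
tree.length(tree)  # 6.5
```

The recursion mirrors the nesting of the parentheses, which is also why even slow code for this stays usable: each character is visited once.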

My first impressions were very good. It is fast – my code is dreadful (I wrote it with a few beers) and even with recursion it’s usable. It’s also very readable; despite a few quirks, it’s probably easier to read than R. The features that impressed me (in vague order of increasing nerdiness) are:

  • Easy to use multiple processors
  • Package management is all linked into GitHub, so there’s no messing around with CRAN-like central repositories (…yes, that could also be bad)
  • The typing system is well-done and saved me time. If I say a function only takes a phylogeny, it only takes a phylogeny, and is very vocal about it without me writing checking code
  • Types are defined explicitly, and there aren't three kinds of class (S3, S4, and Reference Classes in R…!) to mess around with
  • Emacs (through ESS) already gives me a good coding environment. I couldn’t get JuliaStudio to work on my Ubuntu computer, but I’m not a fan of RStudio anyway so it was no biggie
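To show what the last two points are up against, here is the same tiny "phylogeny" type defined with each of R's three class systems, plus the hand-written check that S3 code needs (most R phylogenetic code, e.g. ape's `phylo` objects, is S3). The class and function names are hypothetical, for illustration only. To be fair to R, S4 method signatures do give you some of Julia's checking for free, but only once you've opted into S4.

```r
library(methods)  # S4 classes and Reference Classes live here

# 1. S3: a list with a class attribute; nothing stops you making a broken one...
make.phy.s3 <- function(tips) structure(list(tips = tips), class = "myphy")

# ...so every function has to check its own inputs
n.tips.s3 <- function(phy) {
  if (!inherits(phy, "myphy")) stop("'phy' must be a 'myphy' object")
  length(phy$tips)
}

# 2. S4: a formal class with typed slots; dispatch refuses other types
setClass("MyPhyS4", slots = c(tips = "character"))
setGeneric("nTips", function(phy) standardGeneric("nTips"))
setMethod("nTips", "MyPhyS4", function(phy) length(phy@tips))

# 3. Reference Classes: mutable objects with their methods attached
MyPhyRC <- setRefClass("MyPhyRC",
                       fields = list(tips = "character"),
                       methods = list(nTips = function() length(tips)))

p3 <- make.phy.s3(c("A", "B", "C"))
p4 <- new("MyPhyS4", tips = c("A", "B", "C"))
prc <- MyPhyRC$new(tips = c("A", "B", "C"))

n.tips.s3(p3)  # 3
nTips(p4)      # 3
prc$nTips()    # 3

# Passing a plain list fails in both cases, but only S4 did it without
# any checking code of our own
try(n.tips.s3(list(tips = "A")))
try(nTips(list(tips = "A")))
```

Three ways of saying the same thing, each with its own syntax for slots, fields and methods, is exactly the kind of mess I was glad to leave behind.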

That said, there are kinks, and while it's only at version 0.2, some are surprising given it's over two years old. You're going to find yourself on developer discussion pages quite quickly because the help files are still being built. You can't delete or redefine variables or types, which is a nightmare when you're experimenting. I've had a rather vocal falling out with Julia's regular expression matching (matchall, not match, is your friend), the debugger hasn't been touched in a while (…but it works), and there's no standardisation of graphics yet. More fundamentally, Julia needs to take statistical formulae seriously; the expression notation is sort of alright, but not every function listens to it. I want to be able to plot(y ~ x), damnit!

Bottom line: it's not ready yet, it made me grow a few grey hairs, and The Queen isn't dead, so long live R. However, take a quick look at any Julia package up on GitHub, and you'll find something you won't see in many R packages – there's no C code. I'm tired of people talking about how great it is that you can easily use C++ in R – I learnt R specifically because I wanted to move away from C++. If you have to solve your problem using another language, maybe you weren't using the right language to begin with. I don't think Julia will supplant R any time soon, but I am going to keep plugging away at phylogenies in it; I want to do some big simulations and I really don't want to return to C++. I doubt I'm the only one.