Will’s Library: how I store my papers

There is no cake. Who knows, maybe in 100 years’ time my library will develop sentience. Maybe it won’t. This is a picture of GLaDOS, from the game Portal 2 – go play it, it’s fun. There we go, now this isn’t copyright infringement, it’s advertising.

Like every other scientist I’ve ever met, I constantly complain that I’m not reading enough. When I was a PhD student, I used DevonThink Pro (which I got for free as a student – smart move Devon!) to keep tabs on everything I’d read/needed to read, but when I switched to Linux I couldn’t use it anymore.

This started a roughly two-year search for a good way of handling all my papers. Before you all scream ‘Mendeley’ at me, I was an early adopter (2009) and it deleted all my notes, and I have yet to find something else that I’m certain I could copy all my PDFs and notes out of with ease (which DevonThink let me do).

So, in a fit of irritation, I wrote my own program to store all my papers and notes. It’s very, very simple (which is the only reason I use it) – put all your PDFs in one folder (and maybe sub-folders within it), keep notes on them in a separate file (one paper per line), name your PDFs sensibly, and you’re done. I had very grand plans of making this an even bigger program with a web interface, etc., etc., but after arsing around with Sinatra (which, by the way, is great) I decided I didn’t need any of it.

So, if, like about 90% of the scientists I’ve met, you’ve just got a big folder where you keep everything, or if, like me, you’re paranoid about being trapped in a program and want a structure that means you never will be, give it a go. On Linux, press ‘ctrl-alt-t’, then type ‘wl -p pearse’ to see anything I’ve written that you bothered to keep, and ‘wl -p pearse -o 1’ to open the first/only of those papers. And yes, I know this is a very simple program, but this is probably the most useful thing I’ve written all year 😦
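If you’re wondering how little machinery that layout needs, here is a rough sketch of the idea in Python. To be clear, this is not the code behind ‘wl’: the function names, the ‘.pdf’ filter, and the tab-separated notes format (file name, tab, note) are all my assumptions for illustration.

```python
from pathlib import Path


def find_papers(library, term):
    """Return PDFs under `library` (sub-folders included) whose file name
    contains `term`, ignoring case. Hypothetical helper, not part of 'wl'."""
    term = term.lower()
    return sorted(p for p in Path(library).rglob("*.pdf") if term in p.name.lower())


def load_notes(notes_file):
    """Read a notes file with one paper per line.

    Assumed format (my invention): '<file name><TAB><note>'.
    Lines without a tab are skipped.
    """
    notes = {}
    for line in Path(notes_file).read_text(encoding="utf-8").splitlines():
        name, sep, note = line.partition("\t")
        if sep:
            notes[name.strip()] = note.strip()
    return notes
```

Something like find_papers("papers", "pearse") would then list every sensibly named PDF with ‘pearse’ in its name, which is all the searching a plain folder really needs.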


r2 of an r2 of r2s: progress in ecology

Like many, I read the recent paper about decreasing explanatory power in ecology with skeptical interest; it’s a cool paper and I guarantee it will make you think. The authors scraped a load of r2 values and p-values from ecology papers over the last few decades, and plotted the average r2 and count of p-values through time. They find that the average r2 (the explanatory power) of ecological studies is declining through time. I’m a bit of a fan of Bayesian stats, so I find the idea that p-values are a measure of hypothesis testing a bit galling, but I decided to take a look at the data for myself.

Below is their figure 3, which shows the trend in mean r2 through time, and has an r2 of 62%.


…which seems fine, until you open up the data it’s from and plot the data underlying those mean yearly numbers:


…which, to me, contains an awful lot of scatter that isn’t otherwise apparent. Let’s ignore the data before 1969 (although including them changes nothing substantial). Take a look at a density plot of the same data with a best-fit line through it:


Good news! The regression is still significant (it should be, there are 10,759 points in this plot…), but the r2 is 4%. 4% is quite a lot less than the 62% the authors obtained when they averaged out their variation at the year-level. The within-year variation is so large that I don’t think this decline, while statistically real, is something we could use to make predictions. The authors tried to control for this (I sense) in their regression by weighting according to how many values made up each average. I don’t think that goes far enough, because we have the original data to work with (why average?), and sample size is not the same as confidence – it would be better to weight by the means’ standard errors. I’m also not convinced a mean (or linear regression) describes this kind of bounded data very well, but I could be convinced otherwise.
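If the jump from 4% to 62% seems too big to come from averaging alone, a quick simulation shows the same effect. Everything here is made up for illustration (the slope, the noise, 250 values per year from 1970 to 2010) and none of it is the paper’s data. It just fits the same straight line to raw values and to their yearly means.

```python
import random
from statistics import mean


def r_squared(xs, ys):
    """Squared Pearson correlation, i.e. the r2 of a simple linear regression of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


rng = random.Random(1)  # fixed seed so the run is repeatable
years = list(range(1970, 2011))
raw_x, raw_y = [], []
for year in years:
    for _ in range(250):
        # Invented numbers: a slow decline plus large within-year noise,
        # clamped to [0, 1] because an r2 cannot leave that range.
        value = 0.6 - 0.003 * (year - 1970) + rng.gauss(0, 0.25)
        raw_x.append(year)
        raw_y.append(min(1.0, max(0.0, value)))

# Collapse each year to a single mean, as in a plot of yearly averages.
yearly_means = [mean(y for x, y in zip(raw_x, raw_y) if x == year) for year in years]

r2_raw = r_squared(raw_x, raw_y)
r2_means = r_squared(years, yearly_means)
print(f"r2 on raw values:   {r2_raw:.2f}")
print(f"r2 on yearly means: {r2_means:.2f}")
```

The trend is identical in both fits; the only thing that changes is how much scatter is left for it to explain. Averaging throws away the within-year variation, so the r2 on the means comes out far higher than the r2 on the raw values.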

In summary, there is a decline in explanatory power in ecology, but the explanatory power of that decline (…) is small and so I don’t think we should get too worked up about it. By all means talk about what this decline means, but if the r2 of the r2s is 4%… do we need to freak out?