I think the most important thing I’ve taken away (or am in the process of taking away) from graduate school is the absolute centrality of proper analysis as the cornerstone of the scientific enterprise. More specifically, statistics, though much maligned, is truly essential to the scientific endeavor because it occupies the liminal space between empirics and theory. It’s the link between the collection of random facts that you’ve gathered in the ‘real world’ and the abstract mental meanderings that the armchair ecologist (or insert whatever field here) has made up during a night of heavy drinking. Science only works when both come together, and they come together through statistical hypothesis testing.

In the course of a project I’m working on, I’ve had to embroil myself in the heart of the Bayesian versus frequentist debate. The gist is this. Frequentist methods are the norm in statistics, and ask what the probability of the data you’ve collected is, given a hypothesis about the way the world works (generally this takes the form of a null hypothesis, namely that the parameter you’re interested in has no effect on the data). Bayesian statistics turns this on its head and asks (what is arguably a more sensible question) what the probability of a hypothesis about the ‘real world’ is, given the set of data that we’ve collected. The hitch is that in order to conduct Bayesian statistics you have to include a prior that indicates your prior belief about how the parameter of interest affects the data. And by including this prior, even if you try to be objective, you may still be influencing the results you get out the other end.
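To make the contrast concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the data (14 "successes" in 20 trials), the null value of 0.5, the two Beta priors, and the function names. The frequentist side computes a one-sided tail probability of the data under the null; the Bayesian side uses the standard Beta-binomial conjugate update, so the posterior is again a Beta distribution.

```python
from math import comb


def binomial_tail_p_value(k, n, p_null=0.5):
    """Frequentist question: P(X >= k | p = p_null) for X ~ Binomial(n, p_null)."""
    return sum(
        comb(n, i) * p_null**i * (1 - p_null) ** (n - i)
        for i in range(k, n + 1)
    )


def beta_posterior(k, n, prior_a, prior_b):
    """Bayesian update: a Beta(prior_a, prior_b) prior plus k successes in
    n trials gives a Beta(prior_a + k, prior_b + n - k) posterior."""
    return prior_a + k, prior_b + (n - k)


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical data, invented for this example.
k, n = 14, 20

# Probability of data at least this extreme if the null (p = 0.5) were true.
p_value = binomial_tail_p_value(k, n)

# Same data, two different priors (both are illustrative choices).
flat = beta_posterior(k, n, prior_a=1, prior_b=1)          # uniform prior
skeptical = beta_posterior(k, n, prior_a=20, prior_b=20)   # concentrated near 0.5

print("one-sided p-value under the null:", p_value)
print("posterior mean, flat prior:", beta_mean(*flat))
print("posterior mean, skeptical prior:", beta_mean(*skeptical))
```

The two posterior means differ even though the data are identical, which is the hitch described above: the prior leaves a fingerprint on the answer, and it matters most when the data set is small.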

So this is a central philosophical difference between the two statistical approaches. And I’m told I should care [http://dynamicecology.wordpress.com/2011/10/11/frequentist-vs-bayesian-statistics-resources-to-help-you-choose/]. The problem is I really don’t. It’s not so much that I don’t think the difference is important. It’s just that there are so many more important considerations that are absolutely crucial when thinking about data before you get to these philosophical differences.

At least in my thinking, these more basic considerations are the low-hanging fruit. Even with mixed-effects models becoming more and more used (and possibly overused?) in ecology, crucial correlation structures in the data are often ignored. Some of this is perhaps statistical machismo (sensu http://dynamicecology.wordpress.com/2012/09/11/statistical-machismo/), but accounting for and estimating the effects of phylogenetic covariance, spatial covariance, or detection probability (and correlates thereof) are interesting and important ecological questions in their own right. Properly accounting for, and thinking about the meaning of, these estimates represents a crucial part of the expanding frontier of human knowledge.

I’ve come to the (perhaps temporary) opinion that analyses should not be lauded or vilified for being Bayesian or frequentist. I’ve even read papers that I like that take the rather surrealist approach of running a Bayesian model and then doing frequentist post-hoc tests on subsets of that model’s output. Sure, whatever.

But I’ve also read papers in which the models the authors fit to their data don’t have parameters that allow them to test what they think they’re testing. For example, I remember a particularly egregious case in which the author claimed that their model showed that all birds responded negatively to forest clearing. But the model they fit forced all birds to respond to forest clearing in the same way (a single global mean response, rather than a species-specific response), so it could never have shown anything else. The larger point here is that models should be constructed in an intelligent manner, with every parameter doing something that it is specifically intended to do. If I were forced to make an enlivened plea (which I would never do in polite company) it would be something like: “Forget statistical philosophy! Pay attention to what the damn parameters are doing!”
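A toy sketch of the bird example, in plain Python, shows how this goes wrong. The numbers are invented for illustration (two hypothetical species, five sites, abundance as an exact linear function of the fraction of forest cleared), and separate least-squares fits per species stand in for what would more realistically be species-specific slopes in a mixed model. It is not the analysis from the paper I’m remembering.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x for a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Hypothetical data: fraction of forest cleared at five sites.
cleared = [0.0, 0.25, 0.5, 0.75, 1.0]

# Made-up abundances: species A declines with clearing, species B increases.
abundance = {
    "species_A": [10.0, 8.0, 6.0, 4.0, 2.0],
    "species_B": [3.0, 3.5, 4.0, 4.5, 5.0],
}

# Model 1: a single global response. All species share one slope,
# so the fit cannot tell us whether any species behaves differently.
pooled_x = cleared * len(abundance)
pooled_y = [y for ys in abundance.values() for y in ys]
global_slope = ols_slope(pooled_x, pooled_y)

# Model 2: a species-specific response. Each species gets its own slope.
species_slopes = {name: ols_slope(cleared, ys) for name, ys in abundance.items()}

print("global slope:", global_slope)
print("species-specific slopes:", species_slopes)
```

The pooled slope comes out negative, and it would be tempting to read that as "birds respond negatively to clearing". Only the second model has parameters that can reveal that one of the species actually increases.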