The Evils of Null Hypothesis Significance Testing

April 30, 2010

The standard inferential statistics normally taught in psychology, sociology, and related fields is "null hypothesis significance testing" (NHST). This is a strange and oddly inconsistent hybrid of the ideas of Ronald Fisher and those of his theoretical enemies Jerzy Neyman and Egon Pearson. Many very smart methodologists and statisticians have warned that this statistical framework, at least as it is normally taught and used in publications, is deeply flawed. Although their arguments have never been convincingly countered, their message has been largely ignored. Many psychologists and cognitive scientists have a vague feeling that something about NHST is not entirely kosher, but they often don't know why, and don't know of any alternative. And as long as we are judged by the quantity of our publications, and journals and reviewers keep requiring us to make our case using NHST, we seem to have little choice but to keep performing this mechanical ritual.

I will briefly discuss the history of inferential statistics in the empirical social sciences and explain what is wrong with NHST. I will give some real examples of how it can lead to absurd situations and bad science. I will also briefly present a good (or at least internally consistent) alternative, Bayesian statistics, along with its pros and cons. Finally, I will argue that even though we have little real choice but to keep reporting the statistics that journals and reviewers demand, the least we can do is show in our reporting that we know what we are (and are not) doing when we are forced to use NHST.