September 14, 2007
I have recently become aware, through a couple of posts in the Cognitive Daily blog (1, 2), of the “p-rep statistic”, which is presented by its inventor, Peter Killeen of the Department of Psychology at Arizona State University, in his paper “An Alternative to Null-Hypothesis Significance Tests”, as follows:
The statistic prep estimates the probability of replicating an effect. It captures traditional publication criteria for signal-to-noise ratio, while avoiding parametric inference and the resulting Bayesian dilemma. In concert with effect size and replication intervals, prep provides all of the information now used in evaluating research, while avoiding many of the pitfalls of traditional statistical inference.
The author of the Cognitive Daily blog entries, Dave Munger, describes p-rep as follows:
Very roughly, a prep gives an approximation of the probability that a particular result, repeated on a new sample, would be observed again.
This [prep = .953] means, roughly, that we’re about 95 percent certain that repeating this experiment will give the same results,
along with other comments to the same effect.
I intend to read Killeen’s paper, but I have not done so yet. However, without any hesitation I can already say that Munger’s understanding of what p-rep means is utterly wrong. I’ll exercise some prudence by reserving final judgment on Killeen’s position until after I have read his paper, but it certainly appears that Killeen is wrong as well, or, at the very least, misleading the readers of his abstract.
Munger reports that “the Association for Psychological Science (the APS — not to be confused with the APA, the American Psychological Association) has adopted this new measure of significance [p-rep] in its highly-respected journals”. This seems highly premature on the part of the APS, and one has to wonder whether any professional statisticians were consulted before this move was made.
The reason one can say without any hesitation that Munger’s understanding is wrong is that, without specifying extra information, namely a prior distribution over the unknown parameter being estimated, no “probability of replication” can be calculated, estimated, or even consistently defined.
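To make the dependence on the unknown parameter concrete, here is a small illustration of my own (not taken from Killeen’s paper). Suppose observations are drawn from a normal distribution with unknown standardized mean δ and unit variance. The probability that a replication with n observations yields a sample mean in the positive direction is Φ(δ√n). The same observed data set is compatible with many values of δ, and the actual replication probability varies with δ over the whole range from .5 upward, so no single number can be extracted from the data alone:

```python
import math

def std_normal_cdf(x):
    # Phi(x), the standard normal CDF, via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_same_sign_replication(true_delta, n):
    """P(replicate sample mean > 0) when observations are N(true_delta, 1)
    and the replicate uses n observations: Phi(true_delta * sqrt(n))."""
    return std_normal_cdf(true_delta * math.sqrt(n))

# The actual probability of a same-direction replication depends entirely
# on the unknown true effect delta, which the data do not pin down:
for delta in [0.0, 0.1, 0.3, 0.5]:
    print(delta, round(prob_same_sign_replication(delta, n=20), 3))
```

At δ = 0 the replication probability is exactly .5, and it climbs toward 1 as δ grows; without a prior over δ there is no way to average these into a single “probability of replication”.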
It also seems, from Munger’s first post and from the Wikipedia entry, that there exists some formula converting p-values to p-rep values. If this is true, which I do not take for granted, then, of course, it would show that the p-rep number encodes no extra information over the p-value number. Whether the p-rep value is “more intuitive” or “more meaningful” than the p-value would be a matter of taste, naturally, but even if that somehow turned out to be true (which, again, I do not take for granted) it would make a very weak case for replacing an eight-decade-old standard with a new one.
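For what it is worth, the conversion given in the Wikipedia entry (attributed to Killeen’s normal approximation; I have not yet verified it against the paper itself) is p_rep = [1 + (p/(1 − p))^(2/3)]^(−1), a deterministic function of p alone. A minimal sketch, assuming that formula:

```python
def p_rep(p):
    """Approximate p-to-p_rep conversion as given in the Wikipedia
    entry on Killeen's statistic: p_rep = [1 + (p/(1-p))^(2/3)]^(-1).
    Note it is a pure function of p, using no other information."""
    return 1.0 / (1.0 + (p / (1.0 - p)) ** (2.0 / 3.0))

# The conventional p = .05 threshold maps to roughly p_rep = .878,
# which matches the ".95-ish" values quoted for marginally
# significant results, yet adds nothing beyond p itself.
print(round(p_rep(0.05), 3))  # 0.877
```

If this is indeed the formula used, then p-rep is a monotone relabeling of the p-value, which is precisely the point above: no new information, only a new scale.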