Where Kane went wrong, and an associated issue

August 19, 2007

Referring back to the seminal post of this blog, this post is aimed at pointing out where Kane’s analysis goes wrong. I have tried to make this point several times in the comments to the “Deltoid” post relating to Kane’s paper, but for completeness, here it is again.

Kane implicitly attributes posterior distributions to the parameters being estimated in the Lancet paper – CMRpre, CMRpost, and RR. He then interprets the 95% confidence intervals stated for those parameters as covering 95% of the mass of those distributions. This reflects a fundamental misunderstanding of frequentist statistics, and of CIs in particular.

Since the parameters are not random variables, not only are statements like “CMRpre ~ N(μpre, sigpre)” (p. 7) false, but statements such as “P(CMRpre > 3.7)” (p. 11) are meaningless – CMRpre is either greater than 3.7 or it is not, and no probabilities are involved.
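The frequentist reading can be illustrated with a small simulation. This is only a sketch: the true mean, standard deviation, sample size and number of trials below are arbitrary assumptions and have nothing to do with the Lancet data, and the interval assumes a known standard deviation to keep the arithmetic simple. The point is that the parameter stays fixed while the interval changes from sample to sample, so "95%" describes the procedure, not the parameter.

```python
import random
import statistics


def coverage_of_nominal_95_ci(true_mean=5.0, sigma=2.0, n=30, trials=2000, seed=0):
    """Fraction of simulated intervals that contain the fixed true mean.

    All default values are illustrative assumptions, not estimates
    taken from any study.
    """
    rng = random.Random(seed)
    z = 1.96  # normal quantile for a two-sided 95% interval, sigma known
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        centre = statistics.fmean(sample)
        # true_mean never changes; only the interval moves between trials.
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(coverage_of_nominal_95_ci())
```

The returned fraction should be close to 0.95. For any single interval, though, the fixed parameter is either inside it or not, which is why a statement like “P(CMRpre > 3.7)” has no frequentist meaning.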

Dispensing with Kane’s paper is easy, but in a way he does have a point – not about the Lancet paper, but about accepted statistical analysis. It is surprising that the available, or at least the widely applied, tools of statistics do not provide a standard, fool-proof way to handle data sets such as that of the Lancet study.

The Lancet study data set contains samples from two unknown distributions over the positive numbers. One of those samples – the one for CMRpost – contains a positive outlier, i.e., a data point that is much higher than all the other points. The authors of the Lancet paper dropped that point in their analysis, implying (reasonably, but without making a rigorous argument) that this, if anything, biases the estimate downward.

There seems to be a clear need for a method of analysis that would be applicable to data sets with outliers. Current practice, which relies on a set of ad-hoc methods (normal assumptions, dropping outliers, bootstrapping, robust statistics), is unable to handle such data sets in a rigorously convincing manner. In fact, in the absence of a rigorous definition of what an outlier is, standard practice is not rigorous for any data set.
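A rough sketch of how much a single large point can move the usual summaries is given below. The numbers are invented purely for illustration and are NOT the Lancet cluster data; the function names are hypothetical helpers. The sketch compares a normal-approximation interval with and without the outlier, and a simple percentile bootstrap interval on the full sample.

```python
import random
import statistics

# Invented values for illustration only; these are NOT the Lancet data.
rates = [4.1, 5.3, 6.0, 6.8, 7.2, 7.9, 8.4, 9.1, 10.5, 60.0]


def normal_interval(data):
    """Nominal 95% interval for the mean under a normal approximation."""
    m = statistics.fmean(data)
    se = statistics.stdev(data) / len(data) ** 0.5
    return (m - 1.96 * se, m + 1.96 * se)


def bootstrap_interval(data, resamples=5000, seed=0):
    """Percentile bootstrap interval (2.5% to 97.5%) for the mean."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data)))
        for _ in range(resamples)
    )
    return (means[int(0.025 * resamples)], means[int(0.975 * resamples) - 1])


if __name__ == "__main__":
    trimmed = rates[:-1]  # same sample with the largest point dropped
    print("mean, all points:      ", statistics.fmean(rates))
    print("mean, outlier dropped: ", statistics.fmean(trimmed))
    print("median, all points:    ", statistics.median(rates))
    print("normal CI, all points: ", normal_interval(rates))
    print("normal CI, dropped:    ", normal_interval(trimmed))
    print("bootstrap CI, all:     ", bootstrap_interval(rates))
```

The three intervals disagree substantially, and nothing in the data says which of them, if any, should be trusted. That is the gap described above: each method is defensible, but the choice among them is ad hoc.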


2 Responses to “Where Kane went wrong, and an associated issue”

David Kane Says:

September 14, 2007 at 1:16 pm
Note that I make clear in the paper that I prefer to approach the problem as a Bayesian. Just because the original paper is frequentist does not mean that I, or anyone else, need be frequentist in criticism of it. Note also that in every public comment (that I have seen) by the authors, they present a Bayesian interpretation of their results, i.e., less than a 2.5% chance of fewer than 8,000 deaths. If the authors themselves can be Bayesian, then why can’t I?

I agree with the rest of this post. As best I can tell, there is no “standard” method for dealing with this sort of data. How about a post on how other (smart) people have dealt with it?

yoramgat Says:

September 15, 2007 at 2:37 am
Kane:

Your paper is neither Bayesian nor frequentist. For example, confidence intervals, which you purportedly deal with, are a frequentist notion.

Furthermore, even if, venturing far into hypothetical territory, you were to present a coherent Bayesian analysis, any conclusions you would be able to draw would depend on your priors, and therefore could not be a proof of any flaws in the original analysis. Statements such as “The lower bound of the confidence interval for the relative risk can not be 1.6, as reported in L1,” (p. 13 of your paper) could not possibly be valid even under this unlikely scenario.

Pro Bono Statistics