5 problems with the science of the IFHS study

January 17, 2008

Reviewing the IFHS study, I found 5 problems with the science of the study. I believe that taken together (but particularly the first three points, regarding the crucial role extrapolation plays in arriving at the estimates in the study, and regarding the ratio of under-reporting) those problems should be seen as grave. At the very least, they should be seen as putting the findings of the IFHS on equal or inferior footing to those of Burnham et al., rather than on superior footing due to the nominally large size of the sample in the IFHS.

Below I give brief abstracts of the five problems with links to more detailed posts regarding each. Unless explicitly stated otherwise, death rates and counts below refer to violent deaths as defined by the IFHS authors.

1. Missing clusters and extrapolation using IBC numbers. The IFHS surveyors did not visit all of the clusters in their sample: areas judged to be dangerous went unsurveyed. A minority of those gaps (in Nineveh and Wasit) seems to have been simply ignored, introducing potential bias. To fill in the rest of the gaps, the IFHS authors extrapolated from other areas. The extrapolation method was to calculate the mortality rate in all of Baghdad as a fixed factor times the mortality rate in some reference area, where the fixed factor was calculated using Iraq Body Count (IBC) data. The same method (with a different factor) was applied to all of Anbar as well.

It is important to note that these extrapolations determined the total number of deaths estimated for Baghdad and Anbar. Any data collected within those areas was in effect ignored in calculating the death estimates. Thus the death counts for Baghdad and Anbar, which together account for over 60% of the estimated total, are purely a matter of extrapolation and depend directly on the IBC extrapolation factors. To illustrate: the extrapolation factor used for Baghdad was 3.08. If the factor were instead 6, that would have added about 80,000 deaths to the estimate.

The reliability of the IFHS estimate thus depends directly and substantially on certain properties holding for the IBC data (namely, coverage rates that are constant across space and across political characteristics). We have no reason to assume that those properties hold, and some reason to assume they don’t. The IFHS authors have apparently made no attempt to account for those issues – not even so much as factoring the resulting uncertainty into the size of the confidence interval.

In addition, the extrapolation method is the reason for the close resemblance, emphasized by the IFHS authors, between the IBC and IFHS breakdown of deaths by area. This resemblance is an artifact rather than a feature of the raw data and should not be seen as showing coherence between IFHS and IBC.
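To make the sensitivity concrete, the extrapolation step can be sketched as below. The reference death rate and the person-years of exposure in Baghdad are illustrative assumptions, chosen only to reproduce the rough magnitudes quoted above; they are not the actual IFHS inputs.

```python
# Sketch of the IBC-factor extrapolation and its sensitivity. The reference
# rate and person-years are assumptions for illustration, not IFHS inputs.

def extrapolated_deaths(factor, reference_rate, person_years):
    """Deaths = extrapolation factor x reference mortality rate x exposure."""
    return factor * reference_rate * person_years

ref_rate = 1.3 / 1000   # assumed violent-death rate in the reference area, per person-year
baghdad_py = 21e6       # assumed person-years of exposure in Baghdad over the study period

used = extrapolated_deaths(3.08, ref_rate, baghdad_py)  # factor actually used for Baghdad
alt = extrapolated_deaths(6.0, ref_rate, baghdad_py)    # hypothetical alternative factor

print(round(alt - used))  # roughly 80,000 extra deaths from changing the factor alone
```

Whatever the true inputs, the structure is the same: the Baghdad total is a linear function of the IBC-derived factor, so any error in that factor passes straight through to the estimate.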

2. The extrapolation procedure is problematic even if the IBC extrapolation factors are assumed accurate. The extrapolation basis is the death rate in 3 reference governorates (the paper does not say exactly which, describing them only as the “three provinces that contributed more than 4% each to the total number of deaths reported for the period from March 2003 through June 2006”). Most governorates were sampled with 3 x 18 = 54 clusters each; Nineveh was sampled with 72 clusters. Thus the estimate of deaths for Baghdad and Anbar (which, again, account for over 60% of the total) relies on at most 2 x 54 + 72 = 180 clusters. This number, much smaller than the nominal sample size of 971 clusters, is the dominant factor in determining the uncertainty of the estimate of the total (again, even if the extrapolation factor is assumed to be correct and known precisely).

This is why the width of the confidence interval for the IFHS study (about 120,000 deaths) is not much smaller than that of Burnham et al. (about 370,000), despite the fact that Burnham et al. used only 47 clusters.
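A back-of-envelope calculation illustrates the point, under the rough approximation that the standard error scales like 1/sqrt(number of clusters), ignoring design effects and unequal cluster weights:

```python
import math

def relative_ci_width(clusters, reference_clusters=971):
    """CI width relative to a survey of `reference_clusters`, under 1/sqrt(k) scaling."""
    return math.sqrt(reference_clusters / clusters)

# 180 effective clusters behind over 60% of the estimate: an interval roughly
# 2.3 times wider than the nominal 971 clusters would suggest.
print(round(relative_ci_width(180), 2))

# Burnham et al.'s 47 clusters: roughly 4.5 times wider than 971 would give.
print(round(relative_ci_width(47), 2))
```

On this crude scaling, the gap between 180 effective clusters and 47 clusters is only about a factor of two in interval width, which is consistent with the two studies' intervals being of comparable order.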

3. The IFHS does not account properly for uncertainty in under-reporting. In the same way that the IFHS estimate depends on the extrapolation factor, it depends on the assumed under-reporting factor. The justification for the factor used seems slim (I have not attempted to follow the reference given). Even accepting their assumptions – i.e., treating the proportion of deaths being reported as a normal variable with mean 0.65 and standard deviation of about 0.075 – the authors fail to properly account for the uncertainty in the under-reporting in their calculation of the confidence interval of the estimate of the death rate. A proper accounting would increase the size of the confidence interval by about 25%.
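A small Monte Carlo sketch of the effect: dividing the survey estimate by an uncertain reporting proportion widens the interval. The reporting proportion follows the assumptions stated above; the relative standard error assumed for the raw survey estimate is illustrative.

```python
import random
import statistics

random.seed(0)
N = 100_000
REL_SE_SURVEY = 0.15   # assumed relative standard error of the raw survey estimate

fixed, uncertain = [], []
for _ in range(N):
    survey = random.gauss(1.0, REL_SE_SURVEY)   # raw estimate, arbitrary units
    fixed.append(survey / 0.65)                 # under-reporting proportion treated as known
    p = random.gauss(0.65, 0.075)               # ... or treated as uncertain
    uncertain.append(survey / p)

inflation = statistics.stdev(uncertain) / statistics.stdev(fixed)
print(round(inflation, 2))  # spread inflation from under-reporting uncertainty alone
```

With these illustrative numbers the inflation comes out in the vicinity of the 25% figure; the exact value depends on the relative uncertainty assumed for the raw estimate.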

4. In the IFHS paper, the heading “violent deaths” does not include certain types of injuries. I could not find this mentioned in the paper itself, but table 3 in the supplementary material and a statement by a WHO official indicate that car accidents and “unintentional injuries” are not included in the estimate. This may seem reasonable a priori for car accidents, and to a lesser extent for unintentional injuries. However, contrary to the statement, those two categories account for more than a third of the deaths by injury in the survey. There has also been a dramatic increase in both categories compared to pre-war rates. Under those circumstances, excluding these categories from the estimate appears unjustified. Including them would increase the estimate by more than 50%.
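The arithmetic behind that last figure, for concreteness: if the excluded categories make up a share s of all deaths by injury, adding them back multiplies the injury-death total by 1/(1-s). Taking an illustrative share just over a third:

```python
s = 0.35                   # assumed share of injury deaths in the excluded categories
increase = s / (1 - s)     # relative increase from including them in the estimate
print(round(increase, 2))  # -> 0.54, i.e. an increase of more than 50%
```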

5. The last point is more of an indication of trouble (either in the methodology of the survey or in the way it is described in the paper) than a specific problem with the estimation. According to the description of the sampling method, 10 households were surveyed in each cluster, and there were (with few exceptions) 3 x 18 = 54 clusters per governorate. In such a set-up there should be no correlation between the number of people surveyed in each governorate and the size of the population of the governorate. However, looking at table 2 in the supplementary material of the paper, there appears to be a strong correlation between those two figures. It seems that the only way such a correlation could arise is the unlikely situation in which the size of the population of a governorate is strongly correlated with the average household size in that governorate.


19 Responses to “5 problems with the science of the IFHS study”

  1. […] significant problematic points manifest themselves – I will enumerate them in an upcoming post (here). The fact that such problems exist is interesting for several […]

  2. […] check out Gat’s explicit critique of the WHO study (the IFHS study, in his telling).  His analysis leads him to conclude that the new study is not […]

  3. Hellion Says:

    “I believe that taken together (but particularly the first three points, regarding the crucial role extrapolation plays in arriving at the estimates in the study, and regarding the ratio of under-reporting) those problems should be seen as grave. At the very least, they should be seen as putting the findings of the IFHS on equal or inferior footing to those of Burnham et al.,”

    How, considering the “crucial role” extrapolation plays in arriving at the only meaningful output of the study, could this estimate be on any worse footing than the Lancet2 study?

    I’ve seen the Lancet study authors criticise the IFHS and claim superiority to it, as quoted on Deltoid, the same as you have. Among other things they point to their crosschecking of death certificates as a bonus.

    What have you seen in this study which, in terms of the evident discrepancies in its extrapolation, gives it less credibility than the study which claimed its surveyed deaths matched official death certificates to a 90% degree, from which they extrapolated an estimate which matched to a 10% degree, among other anomalies…

    Wouldn’t any comparison which favoured Lancet2 in terms of credibility go to some lengths to avoid any examination of extrapolation, given the obvious, dubious and now long-term-unexplained implications of those extrapolations?

  4. yoramgat Says:

    Hi Hellion.

    I don’t see your point. Do you think that 10% of the deaths counted in Lancet 2 were attributed to the wrong cause? Do you think that 10% of the deaths counted in Lancet 2 did not occur at all?

    I would not consider either of those cases (even if true) as being a significant weakness of Lancet 2. A 10% error in the total count or in attribution would make very little difference for an estimate with an uncertainty factor of 2x.

    Anyway, collecting 90% of the death certificates is much better than collecting no death certificates – as is the case for IFHS. I did not even bring this issue up since I think that there are much more serious flaws in the IFHS study.

  5. Stephen Says:

    The elephant in the room is the Lancet 2 sampling methodology. After the Main Street Bias critique came out, Burnham responded by claiming that the *published* Lancet 2 sampling methodology was not used in the majority of areas in Iraq. Another (so far unpublished) methodology was used to avoid “busy street selection bias”, Burnham claimed:

    “As far as selection of the start houses, in areas where there were residential streets that did not cross the main avenues in the area selected, these were included in the random street selection process, in an effort to reduce the selection bias that more busy streets would have.” (Gilbert Burnham)

    Burnham et al have never published details of how this “random street selection process” worked. Their published sampling methodology (by Burnham’s own admission) applies only “in areas where there were NO residential streets that did not cross the main avenues”.

    Given the importance of randomness of sampling to such surveys, I suggest that the absence of a published sampling methodology (the one that was actually used) overrides most of the other concerns over Lancet 2, and urgently needs to be addressed.

  6. yoramgat Says:

    Hi Stephen.

    I agree that sampling methodology is an important issue. In a situation like post-Invasion Iraq, it is hard to obtain random samples with uniform probability over the entire population of a large area, or even of a single city.

    It is not clear to me what the description of the sampling method given in the IFHS paper and supplementary material boils down to. It is highly unlikely that they had a comprehensive list of households from which they could sample. Some kind of spatial sampling was very probably involved here as well. I have no specific reason to assume that the sampling of the IFHS was better than that of Burnham et al.

    The sampling issue is a good reason to take a skeptical attitude toward any survey taken in post-invasion Iraq (or any other war ravaged zone), but I don’t see this as a weakness of Burnham et al. in particular.

  7. Stephen Says:

    Well, it’s over a year since Lancet 2 was published. The authors have had enough time to release the all-important details of their sampling methodology. So why haven’t they?

    Their original reason for not publishing these details was “lack of space”. Not very convincing, as it turns out (given the almost infinite amount of space available on the web for them to publish the details if they so chose).

    It’s difficult to take a scientific study seriously when its authors won’t release crucial information about the methodology used (in an area of central importance – randomness of sampling).

  8. yoramgat Says:

    I am all for full disclosure. Unfortunately this is a standard which is rarely met. In many cases there is little for the authors to gain and a lot to lose by disclosing all that is of interest. (If nothing else, disclosure involves a lot of work.) If disclosure was required by the publication venue as a condition for publication things would be very different – and we would have better science.

    Singling out Burnham et al. for following this ubiquitous practice seems unreasonable. As I wrote in my previous comment, I don’t think that the IFHS study is better in this respect – it is completely unclear to me how they generated their sample. Is it clear to you? Even the partial description of the sampling procedure that they did give seems to be inconsistent with the data they provide.

  9. Stephen Says:

    I’m not really talking about “full disclosure”. I’m talking about clarifying the confused position which was adopted by the Lancet authors after the Main Street Bias criticism came out. They essentially disowned the published sampling methodology in favor of another, unspecified, one (thereby side-stepping Main Street Bias, which addressed their published sampling procedures).

    Given the stakes (it’s one of the most politically and scientifically important studies of recent years), one would hope this crucial matter could be cleared up. It’s been over a year now. Which sampling methodology was peer-reviewed? The published one, or the one they actually used?

    We’re not talking about some obscure details – we’re talking about something upon which the claimed validity of the whole study rests.

    The IFHS study has only just been published. Give them a chance to respond to questions and demands for more substance to back up their paper. And if they suddenly change their minds about the sampling methodology they used (or whatever), I’m sure they won’t be allowed to remain silent about it. And if, in a year’s time, they still haven’t released something crucial to the assessment of their study, then you’ll hear the same criticisms from me. I’m not unreasonably singling Burnham et al out for criticism on this matter.

    To repeat: I’m pointing out that something CENTRAL and CRUCIAL to the scientific assessment of the Lancet study is still *missing*, over a year after the study’s release.

  10. yoramgat Says:

    Coming up with various hypotheses of how bias could be introduced into random sampling is an easy game to play, and can be played in the context of almost any paper which employs sampling. If you find the “main street bias” hypothesis very convincing – that is fine. To me this is an issue, but a minor one – spatial sampling of some kind seems unavoidable in the context of Iraq, and any such sampling would be suspect. I find the multiple weaknesses in the IFHS study much more weighty than the potential for “main street bias”. To some extent, the fact that this is the best Lancet critics could come up with is an indication that the study is pretty sound.

    I believe you have an unrealistic view of how rigorously papers are examined – either in peer review or in the scientific community in general. Do you examine scientific papers carefully and critically on a regular basis? For better or for worse, there are a lot of crucial open issues associated with many, if not most, if not all, empirical papers.

    Again, singling out a paper and insinuating that its authors are hiding something because you find its findings unpleasant (or even unlikely) is more of a political maneuver than a scientific inquiry. It is perfectly legitimate to have doubts about the findings of a paper due to various perceived weaknesses and because of prior beliefs, but unless those weaknesses are unusual, implying that the fact that doubts exist discredits the authors is a different matter.

  11. Stephen Says:

    I think you still miss the point. My concern is not with the merits of the Main Street Bias paper, per se, but with the *response* of the Lancet authors to it.

    That response was to effectively disown the published sampling methodology and assert that some other (unspecified) sampling procedures ensured that “all households” were given equal chance of selection.

    Let’s give them the benefit of the doubt and assume this is true. In that case, Main Street Bias is irrelevant to the Lancet study.

    But in that case, the burden of evidence, not to mention a basic, minimum level of scientific responsibility, is on the Lancet authors to at least give an indication of what those (so far unspecified) sampling procedures were.

    With respect, I don’t think it’s “unrealistic” for the wider scientific community to expect more from the Lancet authors on this very important and specific point. Science requires more than assertion and good faith.

  12. yoramgat Says:

    I doubt that the Lancet authors “disowned” the description published in their paper. What you see as them changing their story, they probably see as clarifying some details. (If you have a direct quote from the authors, I’ll take a look and see if the term “disowned” is justified).

    I just find this entire line of criticism unconvincing. The claim that any discrepancy on this issue discredits the entire paper seems politically motivated. As I point out, several significant problems present in the IFHS study did not draw similar responses, for obvious reasons.

    If the scientific community finds that the issue is very important then it should direct more resources into investigating the matter, rather than raise the bar of evidence for those few who did make the effort and take the risk of gathering data.

  13. Stephen Says:

    For the direct quote on sampling see comment 5, above. If you can show me how the *published* sampling methodology might achieve what Burnham claims in this quote simply by “clarifying some details” (as opposed to describing a fundamentally different, separate set of sampling procedures), please go ahead.

    Like it or not, a fundamental change in accounts of sampling methodology, in response to a criticism (which originally arose in an exchange mediated by Science journal) is an important, central issue in studies of this type conducted in conflict zones. One doesn’t have to be “politically motivated” to realise this.

    And no amount of increased resources from the wider scientific community is, in itself, going to cause Burnham et al to be more forthcoming on this point (although it may yield more studies such as Main Street Bias, “political motivation” notwithstanding).

  14. yoramgat Says:

    I don’t see that quote as anything close to describing a new method. It is a somewhat puzzling comment, but nothing of major significance – little puzzles like that are all over many papers.

    Increased resources investment by the scientific community would produce alternative sources of information. By doing that, rather than by scrutinizing every little detail of existing papers, we will gain a better understanding of the real situation in Iraq.

  15. Robert Says:

    There’s (at least) another problem: the allocation of deaths to violent and non-violent causes. You’ve already noted the “unintentional injuries” question; there’s also a “don’t know” category. I can’t quite figure out how or whether a re-allocation of some of the DKs into “violent” would reverberate through the estimates.

  16. yoramgat Says:

    Hi Robert.

    I agree. I will discuss this issue in the last remaining installment, which will expand on point number 4 above.

  17. Robert Says:

    I look forward to it.

    The allocation of deaths into causes is a different problem than the counting of all-cause deaths, and it’s exacerbated by the way they “defaulted” causes into the non-violence group. A side-effect is that it also makes it harder to go from violent deaths to an equivalent estimate of total all-cause excess mortality — IBC doesn’t count non-violent deaths so even if the IBC inflation for Baghdad and Anbar was reasonable (and that’s arguable, of course) there’s nothing comparable on the non-violent side.

  18. yoramgat Says:

    Personally, I find the IBC-based extrapolation method more or less equally unconvincing whether it is used to account for violent deaths or for overall deaths.

  19. […] by the WHO and Iraqi Health Ministry is more ‘authoritative’ than the Lancet, as if the former is without its problems. If anything the NEJM study can be used to support the Lancet 2006 study by noting the pre- and […]
