IFHS – Missing clusters and extrapolation using IBC data
January 19, 2008
The IFHS surveyors did not visit all of the clusters in their sample: clusters in areas judged to be dangerous went unsurveyed. Most of the unsurveyed clusters were in Baghdad (31 out of 96) and in Anbar governorate (71 out of 108). A smaller number went unsurveyed in Nineveh (12 out of 72) and in Wasit (1 out of 54).
It appears that the missing clusters in Nineveh and Wasit were simply ignored. This can introduce significant bias into the mortality estimates for those governorates. Removing the clusters in the dangerous areas of a governorate turns the estimator into an estimator of mortality in the non-dangerous areas only, and mortality in those areas is likely to be lower than in the governorate as a whole. If the dangerous areas accounted for a significant share of the governorate's deaths, removing them from the sample biases the estimator significantly downward, and the more dangerous the governorate, the larger the bias.
Indeed, as one would expect from such a differential bias, the death rates in the various governorates in the IFHS sample (before any adjustments) vary less than the death rates in the two sources with which the IFHS authors compare their results: Burnham et al. and Iraq Body Count (IBC).
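A toy calculation can make both effects concrete. It is a minimal sketch with entirely made-up governorates and rates (not IFHS data), and it assumes that all clusters represent equal populations and that the survey simply drops the dangerous clusters.

```python
# Toy illustration with made-up numbers (not IFHS data): dropping the
# dangerous clusters biases each governorate's estimate downward, more so
# in more violent governorates, and compresses the spread across them.

# Hypothetical governorates: (share of clusters judged dangerous,
# death rate in safe clusters, death rate in dangerous clusters).
# Rates are in arbitrary but consistent units.
GOVERNORATES = {
    "calm": (0.00, 1.0, 1.0),
    "moderate": (0.10, 2.0, 6.0),
    "violent": (0.40, 3.0, 12.0),
}


def true_rate(p_danger, r_safe, r_danger):
    """Governorate-wide rate, assuming all clusters represent equal populations."""
    return (1 - p_danger) * r_safe + p_danger * r_danger


def surveyed_rate(p_danger, r_safe, r_danger):
    """Expected estimate when the dangerous clusters are simply dropped:
    only the safe clusters remain, so the estimate tracks r_safe."""
    return r_safe


def spread(rates):
    """Ratio of the highest to the lowest rate across governorates."""
    return max(rates.values()) / min(rates.values())


true_rates = {g: true_rate(*v) for g, v in GOVERNORATES.items()}
surveyed_rates = {g: surveyed_rate(*v) for g, v in GOVERNORATES.items()}

for g in GOVERNORATES:
    bias = surveyed_rates[g] / true_rates[g] - 1
    print(f"{g:9s} true={true_rates[g]:.2f} surveyed={surveyed_rates[g]:.2f} bias={bias:+.0%}")
print(f"spread: true={spread(true_rates):.1f}x, surveyed={spread(surveyed_rates):.1f}x")
```

With these hypothetical numbers the bias is 0% in the calm governorate, about -17% in the moderate one and about -55% in the violent one, and the ratio between the highest and lowest rates shrinks from 6.6x to 3.0x.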
Extrapolation using IBC data
For Baghdad and Anbar, the two governorates with the largest numbers of missing clusters, which together account for about 60% of the deaths in the IFHS estimate, the IFHS authors attempt to account for the missing data by extrapolating from a different area. To do so, they ignore the data collected in those governorates (i.e., in the 65 clusters in Baghdad and the 37 clusters in Anbar that were considered safe enough to be surveyed). Instead, they extrapolate from a reference area, which they define as the “three provinces that contributed more than 4% each to the total number of deaths reported for the period from March 2003 through June 2006” (these are 3 of the following: Babylon, Basra, Diyala, Nineveh, and Salahuddin). The death rate estimated in the reference area, rRef, is taken as a baseline. The estimated death rates in Baghdad and in Anbar are calculated as this baseline multiplied by extrapolation factors.
The factor used for each governorate is determined from statistics gathered from the IBC database. For each of the three areas (Baghdad, Anbar and the reference area), the IFHS authors calculated the IBC death rate, i.e., the number of deaths in the IBC database occurring in the area divided by the area's population. Let these be rIBCBaghdad, rIBCAnbar, and rIBCRef. The extrapolation factors are the ratios rIBCBaghdad/rIBCRef and rIBCAnbar/rIBCRef. For Baghdad, for example,
rBaghdad = rRef · rIBCBaghdad / rIBCRef,
where rBaghdad is the IFHS estimate of the death rate in Baghdad.
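The extrapolation formula is simple enough to write out directly. This is a minimal sketch; the function names and all input numbers are hypothetical placeholders, not the IFHS or IBC figures.

```python
# Minimal sketch of the extrapolation described above. All inputs are
# hypothetical placeholders, not the IFHS or IBC figures.


def ibc_rate(ibc_deaths, population):
    """IBC death rate: deaths recorded by IBC in an area / the area's population."""
    return ibc_deaths / population


def extrapolate_rate(r_ref, ibc_rate_area, ibc_rate_ref):
    """r_area = r_ref * (IBC rate in area / IBC rate in reference area).

    Note that no survey data from the target area itself enters the formula."""
    return r_ref * ibc_rate_area / ibc_rate_ref


r_ref = 1.5  # survey-based rate in the reference area (hypothetical units)
ibc_baghdad = ibc_rate(12_000, 6_000_000)
ibc_ref = ibc_rate(1_500, 3_000_000)

factor = ibc_baghdad / ibc_ref
r_baghdad = extrapolate_rate(r_ref, ibc_baghdad, ibc_ref)
print(f"extrapolation factor = {factor:.1f}, r_Baghdad = {r_baghdad:.1f}")
```

The extrapolation factor is a pure ratio, so the units of the IBC rates cancel and the result carries the units of r_ref. Here the factor is 4.0, giving r_Baghdad = 6.0.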
It turns out, therefore, that the death rates for Baghdad and Anbar are estimated from IFHS data collected in the reference area only; all the information about Baghdad and Anbar themselves comes from IBC data. Consequently, the estimate of the death rate for Iraq as a whole rests only on IFHS data collected outside Baghdad and Anbar, plus IBC data.
If IBC's coverage rate (that is, the probability that a given death in Iraq is picked up by the IBC media monitors and included in the database) is the same for Baghdad, Anbar and the reference area, then the extrapolated estimates are, on average, accurate. Otherwise, systematic errors are introduced.
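The step from coverage to bias can be made explicit. The notation below is ours, not the paper's: ρ_a is the true death rate in area a, D_a its true deaths, N_a its population, and c_a the IBC coverage there; rIBCBaghdad is written as r^IBC_Baghdad. The sketch treats the IBC counts as fixed and the reference-area estimate as accurate on average.

```latex
% Assumed notation: \rho_a = true death rate in area a, D_a = true deaths,
% N_a = population, c_a = IBC coverage in area a (fraction of deaths recorded).
\begin{aligned}
r^{\mathrm{IBC}}_a &= \frac{c_a D_a}{N_a} = c_a\,\rho_a, \\[4pt]
r_{\mathrm{Baghdad}} &= r_{\mathrm{Ref}}\cdot
  \frac{r^{\mathrm{IBC}}_{\mathrm{Baghdad}}}{r^{\mathrm{IBC}}_{\mathrm{Ref}}}
  = r_{\mathrm{Ref}}\cdot
  \frac{c_{\mathrm{Baghdad}}\,\rho_{\mathrm{Baghdad}}}{c_{\mathrm{Ref}}\,\rho_{\mathrm{Ref}}}, \\[4pt]
r_{\mathrm{Baghdad}} &\approx \frac{c_{\mathrm{Baghdad}}}{c_{\mathrm{Ref}}}\,\rho_{\mathrm{Baghdad}}
  \quad\text{when } r_{\mathrm{Ref}} \approx \rho_{\mathrm{Ref}}.
\end{aligned}
```

The extrapolated rate is thus off by the factor c_Baghdad/c_Ref, which equals 1 only under equal coverage. For example, if IBC recorded one death in three in the reference area but one in four in Baghdad, the factor would be (1/4)/(1/3) = 0.75, and Baghdad's rate would be underestimated by 25%.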
Unfortunately, it seems unlikely that the coverage of the IBC database is constant across areas with different characteristics. For various reasons, the Western reporters on whose reports IBC relies are unlikely to cover all of Iraq uniformly. For example, if certain areas were considered too dangerous for the surveyors to visit, those areas may well have been avoided by the media too.
Empirically, too, the assumption of constant coverage does not hold: the supplementary material of the IFHS paper reports that the ratio between IFHS-based mortality estimates and IBC-based estimates is 1.6 times higher in the “High mortality governorates” than in the “Low mortality governorates.” Surprisingly, this would indicate a bias in the opposite direction: less coverage of the low mortality governorates. Other factors affecting media coverage, such as distance from U.S. and British bases or the cause of death, may be at play; alternatively, this may be an effect of the differential bias in the IFHS data discussed above (some of the more dangerous clusters in the “High mortality governorates” were left unsurveyed).
Breakdown of deaths by area
Besides introducing a potential source of bias, the extrapolation has another important effect. It guarantees that the ratio between the mortality rate in Baghdad and the rate in the four governorates outside Baghdad with the highest mortality rates (i.e., Anbar and the 3 reference governorates) is identical in the IFHS estimate and in IBC. Since those 5 areas account for the large majority of the deaths in both the IFHS estimate and IBC (over 85% in each), these identical ratios guarantee that the IFHS breakdown of deaths by area (Baghdad, High-mortality, Low-mortality, and Kurdistan) will be very similar to the IBC breakdown. In other words, the striking similarity between the IFHS bar and the IBC bar in panel A of Figure 1 of the paper is an artifact of the extrapolation method rather than a feature of the data. The IFHS authors mistakenly present this artifact as evidence of coherence between the two sources of information and wrongly see it as lending credibility to both studies. In fact, the breakdown by area of the raw IFHS data, before extrapolation based on IBC data, would have been very different from both the IBC breakdown and the Burnham et al. breakdown (Baghdad’s share of the deaths, for example, would be about 40%, somewhere between Baghdad’s share according to IBC and according to Burnham et al.).
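Why the shares must match can be checked numerically. Under the extrapolation, estimated deaths in each area equal r_ref · N_Ref / (IBC deaths in Ref) times IBC's deaths in that area, so they are proportional to IBC's counts whatever the survey found. The sketch below uses made-up populations and counts, and covers only the extrapolated areas; in the paper the remaining governorates (under 15% of deaths) keep their own IFHS rates, so the real similarity is close but not exact.

```python
# Hypothetical sketch: under the extrapolation, estimated deaths in each
# extrapolated area are proportional to IBC's counts there, so the shares
# of deaths match IBC's exactly, whatever the reference-area survey found.
# All numbers are made up; units are arbitrary but consistent.


def shares(deaths):
    """Fraction of the total held by each area."""
    total = sum(deaths.values())
    return {area: d / total for area, d in deaths.items()}


def extrapolated_deaths(r_ref, population, ibc_deaths, ref="Ref"):
    """Deaths implied by r_area = r_ref * rIBC_area / rIBC_ref for every area.

    The reference area itself gets factor 1, i.e. its own survey rate."""
    ibc_rate = {a: ibc_deaths[a] / population[a] for a in population}
    return {a: r_ref * ibc_rate[a] / ibc_rate[ref] * population[a] for a in population}


population = {"Baghdad": 6_000_000, "Anbar": 1_300_000, "Ref": 4_500_000}
ibc_deaths = {"Baghdad": 12_000, "Anbar": 3_000, "Ref": 4_500}

ibc_shares = shares(ibc_deaths)
for r_ref in (0.001, 0.005):  # two very different reference-area survey results
    est_shares = shares(extrapolated_deaths(r_ref, population, ibc_deaths))
    for area in population:
        print(f"r_ref={r_ref}: {area:8s} estimate share={est_shares[area]:.3f} "
              f"IBC share={ibc_shares[area]:.3f}")
```

Changing r_ref rescales every extrapolated area by the same amount, so it changes the total but never the breakdown.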
In conclusion, the exclusion of clusters in dangerous areas from the sample is a major weakness of the IFHS, and there is no safe way to recover from it. Neither ignoring the missing clusters (as seems to have been done in some cases) nor extrapolating from IBC data (as was done for Baghdad and Anbar) is a satisfactory solution. Because Baghdad and Anbar account for a large part of the estimated deaths, any bias introduced by the extrapolation method would have a large impact on the estimate of the total.
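The last point is simple arithmetic: the relative error of a national total is the share-weighted average of the areas' relative errors. A minimal sketch, using the rough 60% figure for Baghdad and Anbar (treated here as their share of true deaths) and a purely hypothetical 25% underestimate for those two governorates:

```python
def total_relative_error(shares, relative_errors):
    """Relative error of a national total.

    shares[a]          : fraction of true deaths that occur in area a
    relative_errors[a] : relative error of the estimate for area a
    Estimated total = sum(D_a * (1 + e_a)), so its relative error is the
    share-weighted average of the area errors."""
    return sum(shares[a] * relative_errors[a] for a in shares)


# Illustrative only: Baghdad and Anbar hold 60% of deaths and are
# underestimated by 25% (hypothetical); the rest is estimated without error.
err = total_relative_error({"Baghdad+Anbar": 0.6, "rest": 0.4},
                           {"Baghdad+Anbar": -0.25, "rest": 0.0})
print(f"relative error of the total: {err:+.0%}")  # -15%
```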
At the very least, these issues should have been addressed by accounting for the uncertainty in the extrapolation factors. Doing so would have increased the uncertainty of the estimate substantially. Even this rather modest acknowledgment of the problems associated with the extrapolation was not made.
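One way to carry the uncertainty in the extrapolation factors through to the estimate is a simple Monte Carlo simulation. This is a minimal sketch under assumptions of our own, not the IFHS authors' procedure: the reference-area rate has normal sampling error, the unknown coverage ratio c_Baghdad/c_Ref follows a lognormal distribution with median 1, and all numbers are hypothetical. Setting the coverage spread to zero reproduces the equal-coverage assumption.

```python
import random
import statistics


def baghdad_interval(r_ref_mean, r_ref_se, ibc_factor, coverage_log_sd,
                     n_draws=20_000, seed=1):
    """Approximate 95% simulation interval for Baghdad's true death rate.

    Assumptions (ours, for illustration):
      * the reference-area rate has normal sampling error (mean, se);
      * the unknown coverage ratio k = c_Baghdad / c_Ref is lognormal with
        median 1 and log-scale SD coverage_log_sd (0 = equal coverage).
    Since the extrapolated rate is about k times the true rate,
    each draw of the true rate is r_ref * ibc_factor / k.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        r_ref = rng.gauss(r_ref_mean, r_ref_se)
        k = rng.lognormvariate(0.0, coverage_log_sd) if coverage_log_sd > 0 else 1.0
        draws.append(r_ref * ibc_factor / k)
    cuts = statistics.quantiles(draws, n=40)  # 2.5%, 5.0%, ..., 97.5% cut points
    return cuts[0], cuts[-1]


# Hypothetical inputs: reference rate 1.5 (se 0.2), IBC factor 4.0.
for sd in (0.0, 0.3):
    lo, hi = baghdad_interval(1.5, 0.2, 4.0, sd)
    print(f"coverage log-SD {sd}: {lo:.1f} to {hi:.1f} (width {hi - lo:.1f})")
```

Even a moderate amount of coverage uncertainty (here a log-scale SD of 0.3) widens the interval well beyond what sampling error in the reference area alone produces.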
In addition, it is the extrapolation method, rather than the IFHS data themselves, that accounts for the resemblance between the IBC database and the IFHS estimate in the breakdown of deaths by area. This resemblance, therefore, should not be seen as mutual confirmation, as the IFHS paper implies.