IFHS – Accounting for under-reporting
January 23, 2008
Justifying the factor used to account for under-reporting of deaths
The IFHS sample is a very low-mortality group. Over the pre-invasion period of about 1.25 years, the group, which contains 61,636 individuals, experienced 204 deaths. Over the post-invasion period (about 3.25 years), it experienced 1,121 deaths. These figures translate to 2.65 and 5.60 deaths per 1000 person-years, respectively.
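The crude rates quoted above follow directly from the counts and period lengths. Here is a minimal sketch of the arithmetic, assuming (as the figures imply) that all 61,636 individuals contribute person-time for the whole of each period. The function name is illustrative only.

```python
def crude_rate_per_1000(deaths, persons, years):
    """Deaths per 1000 person-years, assuming a constant population."""
    person_years = persons * years
    return 1000.0 * deaths / person_years

SAMPLE_SIZE = 61_636  # individuals in the IFHS sample, as quoted in the text

pre_invasion = crude_rate_per_1000(204, SAMPLE_SIZE, 1.25)
post_invasion = crude_rate_per_1000(1_121, SAMPLE_SIZE, 3.25)

print(f"pre-invasion:  {pre_invasion:.2f} deaths per 1000 person-years")
print(f"post-invasion: {post_invasion:.2f} deaths per 1000 person-years")
```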
The IFHS sample was not designed to be equally weighted to begin with (i.e., some people had a higher chance of being selected than others), and biases certainly increased after some clusters were dropped from the sample because they were considered too dangerous to access. Re-weighting by the IFHS authors (in some way; it is not clear whether the adjustment procedure for total mortality is the same as the one used for violent deaths) yields a significantly higher mortality rate for the pre-invasion period, 3.17 deaths per 1000 person-years (95% CI 2.70–3.75), and a somewhat higher rate for the post-invasion period, 6.01 deaths per 1000 person-years (95% CI 5.49–6.60).
The low mortality rate is probably due, to some extent, to a young population; I was unable to find a breakdown of the sample by age (in either the IFHS paper or the report). The mortality rate is, however, much lower than that of the sample of Burnham et al., not only for the post-invasion period but also for the pre-invasion period. Burnham et al. estimated pre-invasion mortality at 5.5 deaths per 1000 person-years (95% CI 4.3–7.1). Thus the IFHS authors' decision to adjust their estimate upward seems justified.
The problem, however, is to find the appropriate adjustment factor and to account for any uncertainty in that factor. The authors mention (p. 486), with reservation, the figure of 62% as the proportion of deaths being reported (i.e., 38% go unreported). They also mention (p. 487) modeling the proportion going unreported as a normal variable with mean 35% and “95% uncertainty range, 20 to 50”, which I take to mean that the standard deviation is (50 − 35) / 1.96 ≈ 7.5%. The ratio between the Burnham et al. and IFHS point estimates for the pre-invasion mortality rate is 5.5 / 3.17 = 1.74, which would correspond to 1 − 1 / 1.74 ≈ 42% of the deaths going unreported.
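The two back-of-envelope figures in the paragraph above can be checked in a few lines. This is only a sketch under the same reading of the IFHS text (a symmetric normal range with its 95% half-width at 1.96 standard deviations); the helper names are hypothetical. Unrounded, the standard deviation comes out slightly above the 7.5% used here.

```python
Z_95 = 1.96  # normal quantile for a two-sided 95% interval

def sd_from_range(mean, upper, z=Z_95):
    """Standard deviation implied by a symmetric 95% range around the mean."""
    return (upper - mean) / z

def unreported_share(reference_rate, observed_rate):
    """Share of deaths missed if reference_rate is taken as the true rate."""
    return 1.0 - observed_rate / reference_rate

sd_unreported = sd_from_range(0.35, 0.50)  # IFHS mean and upper bound
ratio = 5.5 / 3.17                         # Burnham et al. vs. IFHS
share = unreported_share(5.5, 3.17)

print(f"implied SD of the unreported proportion: {sd_unreported:.4f}")
print(f"ratio of point estimates: {ratio:.2f}")
print(f"implied unreported share: {share:.1%}")
```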
It is not clear to me how the figure of 35% (or the uncertainty of 7.5%) was arrived at, or in what way it is preferable to 38%, 42%, or any other number, but it is that number that is used to carry out the adjustment of the mortality rates. The original estimate of the post-invasion violent mortality rate, 1.09 d/KPY (95% CI, 0.81 to 1.50), is multiplied by 1 / (1 − 0.35) = 1.54 and becomes 1.67 d/KPY (95% uncertainty range, 1.24 to 2.30). It seems clear that the adjustment factor is to a large degree arbitrary and that changes in its value could affect the final estimate significantly. Also, since much of the estimate of the total depends on data collected in about 180 clusters located in very specific regions (due to the extrapolation procedure employed), it is the under-reporting in those areas, rather than in Iraq as a whole, that matters most. It seems that it would be hard to justify any estimate of that unknown quantity, even within an uncertainty of ±15%.
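To give a sense of how much the choice of factor matters, the sketch below applies the same scaling, rate / (1 − unreported share), for several candidate shares. It assumes nothing beyond the figures already quoted; the function name is illustrative and this is not the IFHS authors' own procedure.

```python
def adjust_for_underreporting(rate, unreported_share):
    """Scale a reported rate up by 1 / (1 - unreported_share)."""
    return rate / (1.0 - unreported_share)

# Unadjusted post-invasion violent death rate and its 95% CI, in d/KPY.
POINT, LOW, HIGH = 1.09, 0.81, 1.50

# 0.35 is the IFHS value; 0.38 and 0.42 are the alternatives discussed above;
# 0.20 and 0.50 are the ends of the stated uncertainty range.
results = {}
for share in (0.20, 0.35, 0.38, 0.42, 0.50):
    results[share] = tuple(
        adjust_for_underreporting(x, share) for x in (POINT, LOW, HIGH)
    )
    point, low, high = results[share]
    print(f"unreported {share:.0%}: {point:.2f} d/KPY ({low:.2f} to {high:.2f})")
```

Across the stated 20% to 50% range, the adjusted point estimate runs from about 1.36 to 2.18 d/KPY.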
Accounting for the assumed uncertainty in the adjustment factor
Even accepting the estimate made by the IFHS authors of the rate of under-reporting, a secondary point remains: the width of the CI should be adjusted to account for the uncertainty in the under-reporting factor (i.e., the assumed standard deviation of 7.5%).
The original death rate estimate, r0 (before adjusting for under-reporting), is assumed to be a random variable with a standard deviation of about (1.50 − 0.81) / 2 / 1.96 = 0.18. The adjusted estimate, r1, is equal to r0 times f, where f is the reciprocal of the proportion of deaths being reported. The mean of f is 1.54, and its standard deviation is 0.19.
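The standard deviation of f is not derived in the text, so here is one hedged way to check it. The sketch assumes, as above, that the unreported proportion p is normal with mean 0.35 and a standard deviation implied by the 20 to 50 range, and that f = 1 / (1 − p). It compares a first-order (delta-method) approximation with a simple Monte Carlo estimate; the seed and number of draws are arbitrary choices.

```python
import random
import statistics

Z_95 = 1.96

# Standard deviation of r0 implied by its 95% CI (0.81 to 1.50 d/KPY).
sd_r0 = (1.50 - 0.81) / 2 / Z_95

# Assumed distribution of the unreported proportion p.
mean_p = 0.35
sd_p = (0.50 - 0.35) / Z_95

# First-order approximation: |df/dp| = 1 / (1 - p)^2, evaluated at the mean.
f_at_mean_p = 1 / (1 - mean_p)
sd_f_delta = sd_p / (1 - mean_p) ** 2

# Monte Carlo estimate of the mean and SD of f = 1 / (1 - p).
rng = random.Random(0)
draws = [1 / (1 - rng.gauss(mean_p, sd_p)) for _ in range(200_000)]
mean_f_mc = statistics.fmean(draws)
sd_f_mc = statistics.stdev(draws)

print(f"sd of r0:            {sd_r0:.3f}")
print(f"f at mean p:         {f_at_mean_p:.3f}")
print(f"sd of f (delta):     {sd_f_delta:.3f}")
print(f"mean, sd of f (MC):  {mean_f_mc:.3f}, {sd_f_mc:.3f}")
```

The delta-method value is about 0.18. Because f is convex in p, the simulated standard deviation should land somewhat higher, close to the 0.19 used above.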
Assuming that r0 and f are independent, the variance of r1 is
Var r1 = (E f)^2 Var r0 + (E r0)^2 Var f + Var r0 Var f.
The authors use only the first term in the variance of r1, (1.54 × 0.18)^2 = 0.28^2, effectively ignoring the assumed uncertainty in f and taking the standard deviation of r1 to be E f times that of r0. After including the two other terms, (1.09 × 0.19)^2 and (0.18 × 0.19)^2, the resulting variance of r1 is 0.34^2. The estimated standard deviation is therefore .34 / .27 = 1.24 times higher, i.e., 24% higher. The confidence interval for the death rate is therefore 24% wider: 1.14 to 2.45 d/KPY.
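The comparison between the first-term-only standard deviation and the full one can be reproduced as follows. The sketch uses the rounded inputs quoted above (E r0 = 1.09, SD r0 = 0.18, E f = 1.54, SD f = 0.19) and the product-variance formula for independent variables; the function name is illustrative.

```python
import math

def product_variance(mean_a, sd_a, mean_b, sd_b):
    """Variance of a * b for independent random variables a and b."""
    return (
        mean_b**2 * sd_a**2      # (E b)^2 Var a
        + mean_a**2 * sd_b**2    # (E a)^2 Var b
        + sd_a**2 * sd_b**2      # Var a Var b
    )

mean_r0, sd_r0 = 1.09, 0.18  # unadjusted rate, d/KPY
mean_f, sd_f = 1.54, 0.19    # adjustment factor

sd_first_term_only = mean_f * sd_r0
sd_full = math.sqrt(product_variance(mean_r0, sd_r0, mean_f, sd_f))
inflation = sd_full / sd_first_term_only

print(f"SD using first term only: {sd_first_term_only:.3f}")
print(f"SD using all three terms: {sd_full:.3f}")
print(f"inflation factor:         {inflation:.2f}")
```

With these rounded inputs the inflation factor comes out near 1.25, close to the 1.24 quoted above; the small difference presumably reflects rounding of intermediate values.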
[It may be that the appropriate adjustment has been applied to the confidence interval for the estimated number of dead. Assuming that the population size of Iraq is known, that interval should equal the population of Iraq times the length of the period (3.25 years) times the interval for the death rate. It does not; it is wider, and seems to roughly fit the corrected confidence interval I calculated above. Whether this is due to adjusting for the uncertainty in the under-reporting factor or to some other matter, I do not know.]