IFHS – Population/sample size correlation
January 18, 2008
According to the description of the sampling method of IFHS (both in the paper itself and in the supplementary material), 10 households were surveyed in each cluster, and there were (with few exceptions) 3 x 18 = 54 clusters per governorate. In such a set-up there should be no correlation between the number of people surveyed in each governorate and the size of the population in the governorate.
The chart below was generated using the data in table 2 of the supplementary material of the paper. Each of the 18 points corresponds to a governorate. The x-axis value is the population size of the governorate (calculated as the mean of the 5 values, for 5 different time points, given in the table). The y-axis value is the average sample per cluster in the governorate: the total sample size given in the table divided by the actual number of clusters visited (54 for most, 65 for Baghdad, 60 for Nineveh, 37 for Al-Anbar and 53 for Wasit). A strong correlation between the two is evident. The correlation coefficient is 0.94. After removing the two outliers, Baghdad and Nineveh, the correlation coefficient is 0.72 (p-value 3×10^-5).
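For readers who want to repeat the calculation, here is a minimal sketch of the two steps described above: dividing the total sample by the number of clusters visited, then computing the Pearson correlation against population size. The function names are my own, and the rows in the example are made-up placeholders, not the values from table 2; substitute the actual figures from the supplementary material.

```python
from math import sqrt


def average_sample_per_cluster(total_sample, clusters_visited):
    """Average number of people surveyed per cluster in one governorate."""
    return total_sample / clusters_visited


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# PLACEHOLDER rows, NOT the IFHS data:
# (mean population over the 5 time points, total sample, clusters visited)
rows = [
    (1_000_000, 1_620, 54),
    (2_000_000, 2_160, 54),
    (3_000_000, 2_700, 54),
    (4_000_000, 2_220, 37),
]

populations = [pop for pop, _, _ in rows]
per_cluster = [average_sample_per_cluster(s, c) for _, s, c in rows]

r = pearson_r(populations, per_cluster)
print(f"correlation coefficient: {r:.2f}")
```

With the real table, dropping the Baghdad and Nineveh rows before calling `pearson_r` would give the second figure quoted above.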
Assuming that the description of the sampling method is correct, it seems that the only way such a correlation could show up is if the size of the population in a governorate is strongly correlated with the average household size in the governorate. This is possible but seems unlikely a priori. Another surprising finding in such a case would be the sheer range of household size variation, ranging from less than 3.2 people per household in 3 of the smallest governorates to almost 20 people per household in Baghdad.
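The household sizes quoted here follow from taking the stated design at face value: if exactly 10 households were surveyed in each cluster, the average household size is the total sample divided by 10 times the number of clusters. A small sketch of that arithmetic, with a function name of my own and an illustrative input that is not taken from the table:

```python
HOUSEHOLDS_PER_CLUSTER = 10  # as stated in the description of the sampling method


def implied_household_size(total_sample, clusters_visited,
                           households_per_cluster=HOUSEHOLDS_PER_CLUSTER):
    """People per household implied by the sample size, assuming the
    stated number of households was surveyed in every cluster."""
    return total_sample / (clusters_visited * households_per_cluster)


# Illustrative numbers only (NOT from table 2): 1,728 people in 54 clusters
size = implied_household_size(1_728, 54)
print(f"implied household size: {size:.1f}")
```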
This strongly suggests that the description of the sampling method may be incorrect.