Steven Levitt misrepresenting his source
December 21, 2008
Steven Levitt has risen to stardom on the overblown rhetoric surrounding the overblown claims of Freakonomics. It is, unfortunately, in the nature of popular books that they oversimplify and over-claim. Academic literature is supposed to be different: rigorous, cautious and, of course, peer-reviewed for accuracy.
Of course, if all those attributes really applied, a career like Levitt's would have been impossible, since the econometric methodology he employs is far too weak to produce, with any credibility, the kind of results he is aiming at. It is clear, therefore, that in the community in which Levitt works, some standards of critical thought have been suspended. However, even when credulity is being stretched and poorly supported statements are taken as proven, one may still hope for the superficial ground rules to apply. Specifically, for example, one hopes that when previous research is cited and summarized, the findings of that research are fairly represented.
Share of income of top percentiles of U.S. households
November 28, 2008
Note: This post deals with the proportion of total US income taken up by the households controlling the most income. The percentiles of the income distribution are given in a different post: Household income distribution, 2007.
The chart below is based on data of Saez and Piketty (Table A3), who rely on IRS publications.
The year 2006, the latest year for which data is available, saw a concentration of income in the top percentiles that had not been observed since the late 1920’s. Following a three-decade long process of increasing concentration of income in the top percentiles, in 2006 a tenth of the U.S. population controlled half the national income, a hundredth of the population controlled a quarter of the income, a thousandth of the population controlled an eighth of the income and one ten-thousandth of the population controlled one-sixteenth of the income. Thus, the top x percentile group controls about twice the income of the top x/10 percentile group.
It is important, however, to note that even in the years of least concentration – the late 1950’s to the late 1970’s – the top percentiles controlled very disproportionate shares of the national income. The top 10% controlled at least one third of the income throughout the 20th century. During those “golden years” of income dispersal, the top x percentile group controlled about 3 times the income of the top x/10 percentile group. The change in the share between the “golden years” and today is therefore not very significant at the top 10% level, but, by the repeated multiplication by a factor of 3/2, it is much more significant at the top 0.01 percentile level (these households now control (1/2)^4 = 1/16 of the income, as opposed to controlling a mere (1/3)^4 = 1/81 of the national income in, say, 1960).
One interesting implication of the fact that half the national income goes to the top 10% of households: if the national income were re-distributed so that each household received an equal share of the national income, the income available to each household would then be equal to the income of the household which currently is at the 90th percentile of the income distribution. Thus each household in the U.S. would, in such a situation, be making over $100,000 annually.
Update (22-Oct-2009): Added data for top 400 households for the years 1992-2006, made available by the IRS. The top 400 households (out of about 150 million) represent the top 0.0002667% of households. In 2006, this tiny fraction of the households controlled over 1.3% of the entire national income. The pattern of income share of top income groups described above holds tolerably well for this group as well: it predicts a share of (1/2) to the power of −log10(400 / 150 million) ≈ 2.1%.
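The rule of thumb used in this post (each tenfold narrowing of the top group multiplies its income share by a fixed ratio) can be written as a one-line function. The sketch below is only an illustration: the function name `pattern_share` is made up for this example, and the ratios 1/2 (for 2006) and 1/3 (for the “golden years”) are the rounded values quoted in the text, not fitted to the Saez and Piketty data.

```python
import math


def pattern_share(top_fraction, ratio):
    """Income share of the top `top_fraction` of households under the
    rule of thumb: every tenfold narrowing of the group multiplies its
    share by `ratio` (hypothetical helper, not from the data source)."""
    return ratio ** (-math.log10(top_fraction))


# Shares implied by the rounded ratios quoted in the post.
for frac in (0.1, 0.01, 0.001, 0.0001):
    share_2006 = pattern_share(frac, 1 / 2)
    share_golden = pattern_share(frac, 1 / 3)
    print(f"top {frac:.2%}: ratio 1/2 -> {share_2006:.4f}, "
          f"ratio 1/3 -> {share_golden:.4f}")

# The top 400 households out of roughly 150 million.
top_400 = 400 / 150e6
print(f"top 400 households: predicted share {pattern_share(top_400, 1 / 2):.1%}")
```

With a ratio of 1/2 the function gives 1/16 for the top 0.01% and about 2.1% for the top 400 households, to be compared with the observed share of over 1.3%.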
Rigorous LCB for the expectation of a positive r.v.
October 27, 2008
A paper of mine was published in the open-access Electronic Journal of Statistics. It proposes a method of constructing lower confidence bounds for the expectation of a positive random variable that are guaranteed to have the nominal coverage probability for any distribution, as long as the sample points are i.i.d. draws from a distribution over the non-negative reals.
In the paper, the method is applied to analyze (a version of) the data of the second Lancet study of mortality in post-invasion Iraq.
Dishwashing: man vs. machine
September 24, 2008
When searching online for information comparing manual dishwashing to dishwashing machines, a University of Bonn study is the most prominent point of empirical research that shows up (e.g., 1, 2, 3, 4). This study is usually interpreted as showing that dishwashers are more resource efficient than hand washing – using less work time, less energy and less water to wash the same amount of dishes.
Some commenters in the Treehugger post linked above showed healthy skepticism of this all-too-convenient claim. Fortunately, reports from the University of Bonn study are available online (1, 2, 3) and the researchers were kind enough to include some data in those reports, making it possible to examine the results rather than rely on media reports alone. I thus decided to have a methodical look at the study – this post presents my conclusions on this matter.
Abstract
Multiple weaknesses in the experimental setup make the interpretation of the study difficult. The data analysis carried out by the researchers seems tendentious. Claims that the study shows that using a dishwashing machine saves substantial amounts of energy, water and time as compared to hand washing are highly dubious. According to the study’s own findings, the most efficient handwashers used far less energy (actually, none, since these washers used no hot water) and about the same amount of water as the most efficient machines. Using no hot water had no negative impact on the cleanliness of the washed dishes.
Effective sample size
July 16, 2008
The notion of “effective sample size” is useful when comparing the information gathered using a certain sampling method to the information gathered (or that would be gathered) with a different – reference – sampling method applied to the same population. Given a known level of uncertainty of an estimate – the one achieved with the sampling method and sample size actually used – the “effective sample size” is the size of the sample that would be needed to achieve the same level of uncertainty if the reference sampling method were used instead.
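As a minimal sketch of this definition, assume the estimate is a mean, uncertainty is measured by variance, and the reference method is simple random sampling, for which the variance of the mean is the unit variance divided by the sample size. The “actual” design below (observations sharing a common pairwise correlation `rho`) and both function names are assumptions made for illustration only.

```python
def effective_sample_size(unit_variance, estimator_variance):
    """Size of a simple random sample whose mean would have the same
    variance as the estimator actually obtained (reference method:
    simple random sampling, where Var(mean) = unit_variance / n)."""
    return unit_variance / estimator_variance


def variance_of_mean_equicorrelated(unit_variance, n, rho):
    """Variance of the mean of n observations that share a common
    variance and a common pairwise correlation rho (assumed design)."""
    return unit_variance / n * (1 + (n - 1) * rho)


# Illustrative numbers, not taken from any study.
n, rho, unit_variance = 100, 0.05, 4.0
var_actual = variance_of_mean_equicorrelated(unit_variance, n, rho)
n_eff = effective_sample_size(unit_variance, var_actual)
print(f"actual n = {n}, effective n = {n_eff:.1f}")
```

In this made-up setting, 100 mildly correlated observations carry about as much information about the mean as roughly 17 independent ones; with `rho = 0` the effective sample size equals the actual one.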
The MLE conjecture, the IMS bulletin and Science
July 8, 2008
The latest issue of the IMS bulletin contains a letter in which a reader, Anirban DasGupta from Purdue University, lays out his thoughts regarding Ning-Zhong Shi’s MLE conjecture. Evidently, DasGupta’s comments were considered of higher relevance than my own letter to the bulletin’s editors regarding the conjecture, a letter which described the same counter-example to the conjecture that appears in the post linked above.
The IMS Bulletin is usually not a technical publication. It mostly carries announcements about upcoming conferences, obituaries, stories about award winners, job ads and columns.
The latest issue of the IMS Bulletin, however, has an unusual item on page 4 in which Ning-Zhong Shi of the School of Mathematics and Statistics, Northeast Normal University, P.R. China lays out “A conjecture on Maximum Likelihood Estimation”. Shi conjectures that the MLE has the finite sample property of having expected squared-error monotonically decreasing in the number of samples n, i.e., MSE(θ_{n+1}) ≤ MSE(θ_n), where θ_n is the MLE calculated with n i.i.d. samples, and MSE(θ_n) = E[(θ_n − θ)^2].
This is a rather bold conjecture, since there is very little established regarding finite sample properties of MLEs. Shi notes that the conjecture is true in the special case in which the MLE is a mean of a sample of i.i.d. variables – such as when the variables are normal or Bernoulli variables parameterized by the unknown mean. It seems that despite applying the qualifier “under some regularity conditions”, Shi expects the result to hold in a much wider set of cases.
He is, of course, wrong.
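The special case that Shi cites is easy to verify, which makes a useful baseline before looking at counter-examples. For Bernoulli variables the MLE of the mean p is the sample mean, and its MSE is p(1 − p)/n, which decreases in n. The sketch below checks this exactly by summing over the binomial distribution; the function name and the value p = 3/10 are arbitrary choices for illustration, and the check says nothing about the general conjecture.

```python
from fractions import Fraction
from math import comb


def mse_bernoulli_mle(n, p):
    """Exact MSE of the sample mean (the MLE of p) computed from
    n i.i.d. Bernoulli(p) draws, by summing over the number of
    successes k, which is Binomial(n, p)."""
    p = Fraction(p)
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) * (Fraction(k, n) - p) ** 2
        for k in range(n + 1)
    )


p = Fraction(3, 10)  # arbitrary illustrative value
mses = [mse_bernoulli_mle(n, p) for n in range(1, 8)]

for n, mse in enumerate(mses, start=1):
    # Each exact MSE should coincide with the closed form p(1 - p)/n.
    print(n, mse, mse == p * (1 - p) / n)
```

Because the arithmetic uses exact fractions, the comparison with p(1 − p)/n is an equality test rather than a floating-point approximation.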
IMS Bulletin on refereeing
March 24, 2008
The March 2008 IMS (Institute of Mathematical Statistics) Bulletin issue has a special section discussing refereeing. The bulletin editor, Xuming He, introduces the matter on the front page of the issue while four present and past editors of statistics journals – John Marden, Michael Stein, Xiao-Li Meng and Rick Durrett – present their ideas about refereeing on the inside pages.
The discussion takes a predictable path – the writers describe, and plead for, what they perceive as good behavior on the part of referees (mostly focusing on promptness). Indeed, the choice of writers – established figures in the field – made it unlikely a priori that a radical examination of the matter would be undertaken. The role of refereeing as gate-keeping is never questioned, and the question of what refereeing is meant to achieve is not even raised by most writers (Marden is the exception here, see below). With the question of the objective absent, it is impossible to address fundamental questions – whether refereeing serves the public (the public of researchers and the public at large) and whether there could be publication selection systems that are superior to refereeing – and so refereeing, essentially in its present form and function, is presented as an immutable natural phenomenon.
IFHS – “Violent deaths”
January 27, 2008
The main discrepancy between findings in the IFHS paper and those of Burnham et al. is not in the total excess deaths, but in the specific category “violent deaths”. It is therefore of interest to examine whether the classification methods used in those papers to assign deaths to the “violent” category are identical, and whether any differences in the classifications could account for some of the different findings. I notice two points on which the papers’ methodologies of classification differ. One is that Burnham et al. examined death certificates, while IFHS did not. The second is that the two papers use different categories for injuries. Burnham et al. use two “accident” categories, one included in the “non-violent” section and the other in the “violent” section. IFHS has no “violent accident” category; its two categories, “road accidents” and “unintentional injuries”, both count injuries within the “non-violent” classification.

