Continuing the investigation initiated here, and applying the same notation.

Proposition 2:

For every l, l >= ES, P(S ≥ l) is maximized when q1 = ··· = qn-mz = q = ES / (n – mz), and qn – mz + 1 = ··· = qn = 0, for some mz.

That is, in the terminology of Proposition 1, for all non-trivial cases, mo = 0, and thus the probability of deviation is maximized by some Bernoulli variable (rather than a shifted Bernoulli variable).
Read the rest of this entry »

Let Bi, i = 1, …, n be independent Bernoulli variables with parameters qi, i = 1, …, n, respectively. Let S be their sum. For convenience, assume q1 ≥ q2 ≥ ··· ≥ qn. I wish to bound tightly from above the probability that S is greater or equal to some l, having the bound depend solely on ES = q1 + ··· + qn.

Clearly, if l ≤ ES, then the tightest bound is 1. This is attained by setting q1 = ··· = ql = 1.

This example shows that while the variance of S is maximized by setting qi = ES / n, i = 1, ···, n, at least for some values, l, P(S ≥ l) is maximized by having the Bi not identically distributed.

Proposition 1:

For every l, P(S ≥ l) is maximized when q1 = ··· = qmo = 1, qmo+1 = ··· = qn-mz = q, and qn-mz+1 = ··· = qn = 0, for some mo and mz, and for q = (ES – mo) / (n – mo – mz).

Proof:

Assume that 1 > qi > qj > 0. Let S’ = S – Bi – Bj. Then

P(S ≥ l) = P(S’ ≥ l) + p1 (qi + qj – qi qj) + p2 qi qj,

where p1 = P(S’ = l – 1) and p2 = P(S’ = l – 2). Thus, keeping qi + qj fixed, but varying the proportion between them, P(S ≥ l) is a linear function of qi qj. Unless p1 = p2, P(S ≥ l) will be increased by varying qi and qj – with a maximum either when they are equal or when one of them is zero or one.

Therefore, P(S ≥ l) cannot be at a maximum if there exist 1 > qi > qj > 0, unless p1 = p2. But in that case, the same probability can be achieved by setting the parameter of Bi to be q’i = 0 (if qi + qj < 1) or q’i = 1 (otherwise), and the parameter of Bj to be q’j = qi + qj – q’i. Therefore, in that case, there would exist a set of parameters, q’1, …, q’n, that would achieve that same probability but with fewer parameters that are not equal to zero or one. Thus, in the set of parameter settings maximizing P(S ≥ l), there exists a solution – namely the one which maximizes the number of parameters with extreme values (zeros and ones) – in which there is only one non-extreme value. ¤

The next step is to investigate which specific parameter setting correspond to various combinations of l and ES.

Formal analysis

June 18, 2011

Keynes is rather dismissive of what he calls ‘“mathematical” economics’. The following passage is from chapter 21 of The General Theory:

The object of our analysis is, not to provide a machine, or method of blind manipulation, which will furnish an infallible answer, but to provide ourselves with an organised and orderly method of thinking out particular problems; and, after we have reached a provisional conclusion by isolating the complicating factors one by one, we then have to go back on ourselves and allow, as well as we can, for the probable interactions of the factors amongst themselves. This is the nature of economic thinking. Any other way of applying our formal principles of thought (without which, however, we shall be lost in the wood) will lead us into error. It is a great fault of symbolic pseudo-mathematical methods of formalising a system of economic analysis, such as we shall set down in section vi of this chapter, that they expressly assume strict independence between the factors involved and lose all their cogency and authority if this hypothesis is disallowed; whereas, in ordinary discourse, where we are not blindly manipulating but know all the time what we are doing and what the words mean, we can keep “at the back of our heads” the necessary reserves and qualifications and the adjustments which we shall have to make later on, in a way in which we cannot keep complicated partial differentials “at the back” of several pages of algebra which assume that they all vanish. Too large a proportion of recent “mathematical” economics are mere concoctions, as imprecise as the initial assumptions they rest on, which allow the author to lose sight of the complexities and interdependencies of the real world in a maze of pretentious and unhelpful symbols.

There is much truth in the above, I think, and it is truth that applies not only to “economic thinking” but to any kind of thinking that relies on formalization. Statistical analysis is plagued with this kind of problems. Keynes does lay too much stress on the matter of interaction between factors. The problem with formal methods is not particularly with neglecting various effects – it is that they simply are false in various ways (neglecting various effects is only one of the sources of falsehoods). Informal methods have the same problem, of course – and in addition have problems associated with informality.

Read the rest of this entry »

The following C++ code takes a sequence of n elements, a1, a2, …, an, and outputs a sequence of n – k + 1 elements. The i-th output element is a maximal element in the subsequence ai, ai + 1, …, ai + k – 1. The runtime complexity of the code is O(n).

template <typename T>
void insert(std::list< std::pair<T,unsigned int> > &l, T v)
{
    unsigned int sp = 0;
    while (!l.empty() && l.back().first < v)
    {
        sp++;
        l.pop_back();
    }
    l.push_back(std::pair<T,unsigned int>(v,sp));
}

template <typename T>
void advance(std::list< std::pair<T,unsigned int> > &l)
{
    if (l.front().second > 0)
        l.front().second--;
    else
        l.pop_front();
}

template <typename T>
void max_in_window(T const *in, T *out, size_t n, size_t k)
{
    std::list< std::pair<T,unsigned int> > l;
    unsigned int i;
    for (i = 0; i < k - 1 && i < n; i++)
        insert(l,in[i]);

    for (; i < n; i++)
    {
        insert(l,in[i]);
        out[0] = l.front().first;
        out++;
        advance(l);
    }
}

Interestingly, an algorithm for the median in a sliding window will have runtime of at least O(n log k), since such an algorithm can be used to sort O(n / k) sequences of length O(k) each:

Let a1, a2, …, an be a sequence of numbers in a known interval, say (0, 1). Create a sequence of length 3n – 2 by padding the sequence with a prefix of (n – 1) 0‘s and a suffix of (n – 1) 1‘s. Now execute the sliding window median algorithm with the padded sequence as input, and with k = 2n – 1.

The i-th output element will be the median of a sequence that is made of the entire sequence a1, a2, …, an, n – i zeros, and i – 1 ones. That median is a(i), the i-th smallest elements of the sequence a1, a2, …, an. Thus, the output of the algorithm will be the sorted sequence a(1), a(2), …, a(n).

Induction

August 28, 2010

Let the world of sense data available to an observer be an infinite sequence of bits. At time 0 the observer wishes to verify or refute a hypothesis about the sequence. The hypothesis is a prediction of the entire string from time 0 on. The question of induction is whether the observer can, by verifying that the pattern holds until a certain time n, make it more likely that the pattern holds beyond time n, and possibly holds for ever, without making any a-priori assumptions whatsoever about the sequence.

Clearly, if n is any fixed number then the answer is negative. For any n, and any hypothesis, there is a sequence (actually, an infinite number of sequences) for which the hypothesis holds for bits 1 through n, and fails on bit n + 1. Thus, the observer is led to adopt the hypothesis only to have it fail immediately upon adoption.

If, however, the observer can generate random bits, then some induction is possible. In this case the observer can generate a random number n which is distributed over the natural numbers and which has a probability pk of being equal to k. If the sequence fits the hypothesis to bit m, then the observer will only adopt the hypothesis in the case n < m and, if so, will make at least one successful inductive prediction unless n = m – 1. Thus, the probability of making an immediate error is pm. Given the distribution, pk, k = 1, 2, …, the highest chance for immediate failure occurs when pm maximizes pk over k = 1, 2, …. Thus, the minimax strategy for the probability of immediate failure is to minimize the maximum pk. Under the constraint that the expected value for n should be no more than a certain value, b, the minimax strategy for the observer is to sample n uniformly over the range 1, 2, …, 2b. The probability of adoption and immediate failure in this case is no more than 1 / 2b, regardless of what the sequence is.

Similarly, the chance of hypothesis adoption and failure within d predictions is d / 2b. Being able to predict with any confidence that a certain hypothesis will hold forever is clearly impossible no matter how many trials one is allowed to make before adopting the hypothesis.

Similar analysis holds, mutatis mutandis, if the sense-data is assumed to be a observed continuously in time. Observing the hypothesis holding for a certain random period of time enables the observer to predict with some confidence that the hypothesis would continue to hold for some comparable length of time beyond that observation period.

Steven Levitt has risen to stardom by riding on the overblown rhetoric resting on the overblown claims of Freakonomics. It is, unfortunately, in the nature of popular books that they oversimplify and over-claim. Academic literature is supposed to be different: rigorous, cautious and, of course, peer-reviewed for accuracy.

Of course, if all those attributes really applied, a career like that of Levitt would have been impossible, since the econometric methodology he employs is far too weak to be able to produce with any credibility the kind of results Levitt is aiming at. It is clear, therefore, that within the community within which Levitt works, some standards of critical thought have been suspended. However, even when credulity is being stretched and poorly supported statements are taken as proven, one may still hope for the superficial ground rules to apply. Specifically, for example, one hopes that when previous research is cited and summarized the findings of the research are fairly represented.

Read the rest of this entry »

Note: This post deals with the proportion of total US income taken up by the households controlling the most income. The percentiles of the income distribution are given in a different post: Household income distribution, 2007.

The chart below is based on data of Saez and Piketty (Table A3), who rely on IRS publications.


Read the rest of this entry »

A paper of mine was published in the open-access Electronic Journal of Statistics. It proposes a method of constructing lower confidence bounds for positive random variables that are guaranteed to have the nominal coverage probability for any distribution, as long as the sample points are i.i.d., from a distribution over the non-negative reals.

In the paper, the method is applied to analyze (a version of) the data of the second Lancet study of mortality in post-invasion Iraq.