September | 2011 | Pro Bono Statistics

Deviations from the mean of a sum of independent, non-identically-distributed Bernoulli variables

September 29, 2011

Let B_i, i = 1, …, n be independent Bernoulli variables with parameters q_i, i = 1, …, n, respectively. Let S be their sum. For convenience, assume q₁ ≥ q₂ ≥ ··· ≥ q_n. I wish to find a tight upper bound on the probability that S is greater than or equal to some integer l, with the bound depending solely on ES = q₁ + ··· + q_n.

Clearly, if l ≤ ES, then the tightest bound is 1. This is attained by setting q₁ = ··· = q_l = 1 and distributing the remaining ES – l among the other parameters in any way.

This example shows that while the variance of S is maximized by setting q_i = ES / n, i = 1, ···, n, for at least some values of l, P(S ≥ l) is maximized by having the B_i not identically distributed.
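To make the contrast concrete, here is a minimal sketch that computes the variance and P(S ≥ l) for two parameter settings with the same mean. The values n = 4, ES = 2, l = 2 and the helper names `tail_prob` and `variance` are chosen here for illustration only.

```python
def tail_prob(qs, l):
    """P(S >= l) for S a sum of independent Bernoulli variables with parameters qs."""
    # dist[k] holds P(partial sum == k); start from the empty sum.
    dist = [1.0]
    for q in qs:
        new = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            new[k] += p * (1.0 - q)   # this variable equals 0
            new[k + 1] += p * q       # this variable equals 1
        dist = new
    return sum(dist[max(l, 0):])


def variance(qs):
    """Var(S) = sum of q_i (1 - q_i) for independent Bernoulli variables."""
    return sum(q * (1.0 - q) for q in qs)


n, mean, l = 4, 2.0, 2                 # illustrative values, not from the post
identical = [mean / n] * n             # q_i = ES / n
extreme = [1.0, 1.0, 0.0, 0.0]         # q_1 = ... = q_l = 1, the rest 0

print(variance(identical), tail_prob(identical, l))   # 1.0 0.6875
print(variance(extreme), tail_prob(extreme, l))       # 0.0 1.0
```

The identically distributed setting has the larger variance but the smaller tail probability.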

Proposition 1:

For every l, P(S ≥ l) is maximized when q₁ = ··· = q_{m_o} = 1, q_{m_o+1} = ··· = q_{n-m_z} = q, and q_{n-m_z+1} = ··· = q_n = 0, for some m_o and m_z, and for q = (ES – m_o) / (n – m_o – m_z).
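Since the proposition reduces the search to the pair (m_o, m_z), the maximizer can be found by enumeration. The following is a rough sketch under that assumption; `best_structured`, `random_params` and the values n = 5, ES = 2.3, l = 3 are hypothetical names and numbers introduced here. The random search is only a sanity check, not a proof.

```python
import random


def tail_prob(qs, l):
    """P(S >= l) for S a sum of independent Bernoulli variables with parameters qs."""
    dist = [1.0]
    for q in qs:
        new = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            new[k] += p * (1.0 - q)
            new[k + 1] += p * q
        dist = new
    return sum(dist[max(l, 0):])


def best_structured(n, mean, l):
    """Enumerate settings with m_o ones, m_z zeros and the rest equal to q."""
    best = None
    for m_o in range(n + 1):
        for m_z in range(n - m_o + 1):
            free = n - m_o - m_z
            if free == 0:
                if abs(mean - m_o) > 1e-12:
                    continue          # the ones alone cannot match the mean
                q = 0.0               # placeholder: no free parameters remain
            else:
                q = (mean - m_o) / free
                if not 0.0 <= q <= 1.0:
                    continue          # q must be a valid probability
            qs = [1.0] * m_o + [q] * free + [0.0] * m_z
            p = tail_prob(qs, l)
            if best is None or p > best[0]:
                best = (p, m_o, m_z, q)
    return best


def random_params(n, mean, rng):
    """Rejection-sample a vector in [0, 1]^n whose entries sum to `mean`."""
    while True:
        u = [rng.random() for _ in range(n)]
        total = sum(u)
        qs = [x * mean / total for x in u]
        if max(qs) <= 1.0:
            return qs


n, mean, l = 5, 2.3, 3                # illustrative values, not from the post
p_best, m_o, m_z, q = best_structured(n, mean, l)
rng = random.Random(0)
p_random = max(tail_prob(random_params(n, mean, rng), l) for _ in range(2000))
print("structured optimum:", p_best, "m_o =", m_o, "m_z =", m_z, "q =", q)
print("best of 2000 random settings:", p_random)
```

If the proposition holds, the structured optimum should never be beaten by a random setting with the same mean.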

Proof:

Assume that 1 > q_i > q_j > 0. Let S’ = S – B_i – B_j. Then

P(S ≥ l) = P(S’ ≥ l) + p₁ (q_i + q_j – q_i q_j) + p₂ q_i q_j,

where p₁ = P(S’ = l – 1) and p₂ = P(S’ = l – 2). Thus, keeping q_i + q_j fixed, but varying the proportion between them, P(S ≥ l) is a linear function of q_i q_j, with slope p₂ – p₁. Unless p₁ = p₂, P(S ≥ l) can therefore be increased by varying q_i and q_j: the maximum is attained when they are equal (if p₂ > p₁) or when one of them is zero or one (if p₂ < p₁).
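The linear dependence can be spelled out as follows. The shorthand s = q_i + q_j and t = q_i q_j is introduced here for convenience and is not part of the original argument.

```latex
\begin{align*}
P(S \ge l) &= P(S' \ge l) + p_1 (s - t) + p_2 t \\
           &= P(S' \ge l) + p_1 s + (p_2 - p_1)\, t,
\qquad \max(0,\, s - 1) \le t \le \frac{s^2}{4}.
\end{align*}
```

The upper end of the range for t is reached at q_i = q_j = s/2, and the lower end when one of the two parameters is 0 (if s ≤ 1) or 1 (if s > 1).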

Therefore, P(S ≥ l) cannot be at a maximum if there exist 1 > q_i > q_j > 0, unless p₁ = p₂. But in that case, P(S ≥ l) does not depend on q_i q_j, so the same probability can be achieved by setting the parameter of B_i to be q’_i = 0 (if q_i + q_j < 1) or q’_i = 1 (otherwise), and the parameter of B_j to be q’_j = q_i + q_j – q’_i. Therefore, in that case, there would exist a set of parameters, q’₁, …, q’_n, that achieves the same probability but with fewer parameters that are not equal to zero or one. Thus, among the parameter settings maximizing P(S ≥ l), there exists a solution, namely the one which maximizes the number of parameters with extreme values (zeros and ones), in which all the non-extreme parameters share a single value. ¤
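The decomposition used in the proof can be checked numerically. This is a small sketch with arbitrary illustrative parameters; the helper names `pmf`, `tail`, `point` and `decomposed_tail` are introduced here. It compares P(S ≥ l) computed directly with the value given by the three-term formula, for several splits of a fixed q_i + q_j.

```python
def pmf(qs):
    """pmf(qs)[k] = P(sum == k) for independent Bernoulli variables with parameters qs."""
    dist = [1.0]
    for q in qs:
        new = [0.0] * (len(dist) + 1)
        for k, p in enumerate(dist):
            new[k] += p * (1.0 - q)
            new[k + 1] += p * q
        dist = new
    return dist


def tail(dist, l):
    """P(sum >= l) from a probability mass function given as a list."""
    return sum(dist[max(l, 0):])


def point(dist, k):
    """P(sum == k), with zero probability outside the support."""
    return dist[k] if 0 <= k < len(dist) else 0.0


def decomposed_tail(rest, q_i, q_j, l):
    """P(S' >= l) + p1 (q_i + q_j - q_i q_j) + p2 q_i q_j, where S' sums `rest`."""
    d = pmf(rest)
    p1 = point(d, l - 1)
    p2 = point(d, l - 2)
    return tail(d, l) + p1 * (q_i + q_j - q_i * q_j) + p2 * q_i * q_j


rest = [0.9, 0.2, 0.1]    # arbitrary illustrative parameters of S'
l = 2
total = 1.1               # q_i + q_j is held fixed
for q_i in (0.55, 0.7, 0.9, 1.0):
    q_j = total - q_i
    direct = tail(pmf(rest + [q_i, q_j]), l)
    print(q_i, q_j, direct, decomposed_tail(rest, q_i, q_j, l))
```

The two computed columns should agree up to floating-point error, and the value should change monotonically as the product q_i q_j shrinks.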

The next step is to investigate which specific parameter settings correspond to various combinations of l and ES.
