## Deviations from the mean of a sum of independent, non-identically-distributed Bernoulli variables

### September 29, 2011

Let *B_{i}, i = 1, …, n* be independent Bernoulli variables with parameters *q_{i}*, respectively. Let *S* be their sum. For convenience, assume *q_{1} ≥ q_{2} ≥ ··· ≥ q_{n}*. I wish to bound tightly from above the probability that *S* is greater than or equal to some *l*, having the bound depend solely on **E**S = q_{1} + ··· + q_{n}.

Clearly, if *l ≤ **E**S*, then the tightest bound is *1*. This is attained by setting *q_{1} = ··· = q_{l} = 1*.

This example shows that while the variance of *S* is maximized by setting *q_{i} = **E**S / n, i = 1, …, n*, at least for some values of *l*, *P(S ≥ l)* is maximized by having the *B_{i}* not identically distributed.
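As a quick numerical illustration (a sketch in Python; the function name `tail_prob` and the specific numbers are my own choices, not from the post), *P(S ≥ l)* can be computed exactly by a dynamic programme over the Poisson Binomial pmf, and the identical-parameter setting *q_{i} = **E**S / n* can be compared against the setting *q_{1} = ··· = q_{l} = 1*:

```python
def tail_prob(qs, l):
    """P(S >= l) for S a sum of independent Bernoulli(q_i) variables.
    Builds the exact Poisson Binomial pmf by an O(n^2) convolution."""
    pmf = [1.0]  # pmf of the empty sum: P(S = 0) = 1
    for q in qs:
        new = [0.0] * (len(pmf) + 1)
        for k, p in enumerate(pmf):
            new[k] += p * (1 - q)  # B_i = 0
            new[k + 1] += p * q    # B_i = 1
        pmf = new
    return sum(pmf[l:])

n, l = 10, 4  # ES = 4, so l <= ES
print(tail_prob([0.4] * n, l))                    # identical parameters: about 0.618
print(tail_prob([1.0] * l + [0.0] * (n - l), l))  # l certain successes: exactly 1.0
```

With the same expected sum, the non-identical setting attains the trivial upper bound of 1 while the identical setting stays well below it.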

##### Proposition 1:

For every *l*, *P(S ≥ l)* is maximized when *q_{1} = ··· = q_{mo} = 1*, *q_{mo+1} = ··· = q_{n–mz} = q*, and *q_{n–mz+1} = ··· = q_{n} = 0*, for some *m_{o}* and *m_{z}*, and for *q = (**E**S – m_{o}) / (n – m_{o} – m_{z})*.

##### Proof:

Assume that *1 > q_{i} > q_{j} > 0*. Let *S’ = S – B_{i} – B_{j}*. Then

*P(S ≥ l) = P(S’ ≥ l) + p_{1} (q_{i} + q_{j} – q_{i} q_{j}) + p_{2} q_{i} q_{j}*,

where *p_{1} = P(S’ = l – 1)* and *p_{2} = P(S’ = l – 2)*. Thus, keeping *q_{i} + q_{j}* fixed, but varying the proportion between them, *P(S ≥ l)* is a linear function of *q_{i} q_{j}*. Unless *p_{1} = p_{2}*, *P(S ≥ l)* will be increased by varying *q_{i}* and *q_{j}* – with a maximum either when they are equal or when one of them is zero or one.

Therefore, *P(S ≥ l)* cannot be at a maximum if there exist *1 > q_{i} > q_{j} > 0*, unless *p_{1} = p_{2}*. But in that case, the same probability can be achieved by setting the parameter of *B_{i}* to be *q’_{i} = 0* (if *q_{i} + q_{j} < 1*) or *q’_{i} = 1* (otherwise), and the parameter of *B_{j}* to be *q’_{j} = q_{i} + q_{j} – q’_{i}*. Therefore, in that case, there would exist a set of parameters, *q’_{1}, …, q’_{n}*, that would achieve that same probability but with fewer parameters that are not equal to zero or one. Thus, in the set of parameter settings maximizing *P(S ≥ l)*, there exists a solution – namely the one which maximizes the number of parameters with extreme values (zeros and ones) – in which there is only one non-extreme value. ¤
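The proposition can also be checked numerically. The sketch below is my own code, not from the post (the names `tail_prob` and `best_extreme` are mine): it computes *P(S ≥ l)* exactly for random parameter vectors and verifies that none exceeds the best configuration of the form given in Proposition 1.

```python
import random

def tail_prob(qs, l):
    """Exact P(S >= l) for S a sum of independent Bernoulli(q_i) (O(n^2) DP)."""
    pmf = [1.0]
    for q in qs:
        new = [0.0] * (len(pmf) + 1)
        for k, p in enumerate(pmf):
            new[k] += p * (1 - q)
            new[k + 1] += p * q
        pmf = new
    return sum(pmf[l:])

def best_extreme(n, ES, l):
    """Max of P(S >= l) over the Proposition 1 configurations:
    m_o ones, m_z zeros, and q = (ES - m_o) / (n - m_o - m_z) in between."""
    best = 0.0
    for m_o in range(n + 1):
        for m_z in range(n - m_o + 1):
            rest = n - m_o - m_z
            if rest == 0:
                if abs(m_o - ES) > 1e-9:
                    continue  # all-extreme config only possible for integer ES
                qs = [1.0] * m_o + [0.0] * m_z
            else:
                q = (ES - m_o) / rest
                if not 0.0 <= q <= 1.0:
                    continue  # infeasible (m_o, m_z) for this expected sum
                qs = [1.0] * m_o + [q] * rest + [0.0] * m_z
            best = max(best, tail_prob(qs, l))
    return best

rng = random.Random(0)
n, l = 6, 3
for _ in range(1000):
    qs = [rng.random() for _ in range(n)]
    assert tail_prob(qs, l) <= best_extreme(n, sum(qs), l) + 1e-9
print("no random parameter vector beat the extreme configurations")
```

Each random vector is compared against the extreme configurations with the same expected sum, so the check exercises exactly the statement of the proposition (for one small *n* and *l*).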

The next step is to investigate which specific parameter settings correspond to various combinations of *l* and **E**S.

October 4, 2011 at 4:24 am

Interesting post, Yoram, looking forward to the continuation. Was this inspired by a real-life problem?

October 7, 2011 at 11:42 am

Hi Arkadas,

Thanks for your interest.

This is connected in some way to my 2008 EJS paper, Rigorous LCB for the expectation of a positive r.v.

In that paper I assumed that a sample from a set of IID r.v.’s is available. I have been trying (on-and-off) to extend the treatment to handling a sample from a set of independent non-ID r.v.’s. The real life application would be similar to the one in the paper – survey sampling – but would allow handling stratified sampling.

October 9, 2011 at 3:29 pm

Thanks, Yoram. I’ve read your paper (skipping the proofs), and learned quite a bit. It looks like a fundamental contribution—I am surprised that there are still research opportunities on such basic problems. This also got me curious about the original studies on Iraq war casualties; I’ve just checked the Wikipedia page, which was huge.

October 13, 2011 at 8:51 am

I am also rather perplexed with the fact that basic questions are still open while the journals are filled with esoteric papers.

I was half expecting the editor of EJS to return my paper saying that I was covering some well-known ground, but clearly he was unfamiliar with such prior work. Also, I guess Mark van der Laan – a well respected Berkeley professor – would not have put forward his rather absurd argument in the note I refer to in the paper had he been aware that a rigorous finite sample LCB can be constructed.

I would still not be that surprised if it turns out that a similar result was published before, but even if this is the case, the mere fact that two active academic statisticians were not aware of such existing work is puzzling.

The area of independent but not identically distributed variables also seems to be under-researched. It is a natural extension of the classical IID setup. Again, maybe I am just not aware of existing literature, but even if this is the case, it is surprising that it is not part of the standard textbook material.

October 29, 2011 at 12:49 pm

[…] the investigation initiated here, and applying the same […]

February 18, 2014 at 8:42 pm

I just realized, to my surprise, that the argument made here applies much more widely than to the indicator function. It applies, in fact, to any function of a sum of independent Bernoulli r.v.’s.

That is, the expectation of the function evaluated at the sum of independent Bernoulli r.v.’s (whose sum of expectations is given) is maximized when the sum is a shifted Binomial or a shifted Poisson.
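One way to probe this claim numerically (a sketch under my own naming; `E_g`, `best_shifted_binomial` and the particular test function *g* are illustrative choices, not from the post): compute E g(S) exactly from the pmf of the sum, and compare random parameter vectors against the configurations of Proposition 1, whose middle block makes S a shifted Binomial.

```python
import random

def pmf_of_sum(qs):
    """Exact pmf of S = sum of independent Bernoulli(q_i) (O(n^2) DP)."""
    pmf = [1.0]
    for q in qs:
        new = [0.0] * (len(pmf) + 1)
        for k, p in enumerate(pmf):
            new[k] += p * (1 - q)
            new[k + 1] += p * q
        pmf = new
    return pmf

def E_g(qs, g):
    """E[g(S)] computed from the exact pmf of the sum."""
    return sum(p * g(k) for k, p in enumerate(pmf_of_sum(qs)))

def best_shifted_binomial(n, ES, g):
    """Max of E[g(S)] when S is m_o + Binomial(n - m_o - m_z, q),
    with q = (ES - m_o) / (n - m_o - m_z) fixing the mean at ES."""
    best = -float("inf")
    for m_o in range(n + 1):
        for m_z in range(n - m_o + 1):
            rest = n - m_o - m_z
            if rest == 0:
                if abs(m_o - ES) > 1e-9:
                    continue  # all-extreme config needs integer ES
                qs = [1.0] * m_o + [0.0] * m_z
            else:
                q = (ES - m_o) / rest
                if not 0.0 <= q <= 1.0:
                    continue  # infeasible (m_o, m_z) for this mean
                qs = [1.0] * m_o + [q] * rest + [0.0] * m_z
            best = max(best, E_g(qs, g))
    return best

rng = random.Random(1)
n = 5
g = lambda k: (k - 2) ** 2  # an arbitrary function of the sum
for _ in range(500):
    qs = [rng.random() for _ in range(n)]
    assert E_g(qs, g) <= best_shifted_binomial(n, sum(qs), g) + 1e-9
```

The same pairwise argument as in the proof applies here: for fixed *q_{i} + q_{j}*, E g(S) is linear in *q_{i} q_{j}*, so a maximizer can always be moved to a configuration with all interior parameters equal.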

December 15, 2014 at 6:27 pm

I have just become aware of some relevant prior work. Y.H. Wang’s On the number of successes in independent trials (1993) has various results about Poisson Binomial distributions. I have not read this paper yet, but it does note that “recent” study of Poisson Binomial distributions was initiated by Wassily Hoeffding in his 1956 paper On the Distribution of the Number of Successes in Independent Trials. It appears that my results here largely reproduce Hoeffding’s results.

Such a fundamental result is not new then. The question of why this is not part of the standard probability curriculum remains.

December 15, 2014 at 6:30 pm

Another paper cited by Wang, a 1965 paper by Samuels, On the number of successes in independent trials, is also relevant.

December 15, 2014 at 6:58 pm

The results in Hoeffding’s, Samuels’s and Wang’s papers notwithstanding, it still appears to me that there are many questions about PB distributions that have not been discussed.

Most prominent, I think, is the question of characterizing the family of PB distributions. The family of PB distributions with the range {0, 1, …, n} covers a non-zero-measure subset of the [0, 1]^n parameter space of the general distribution over that range. What does this subset look like?