Binomial distribution
[Plot: probability mass function of the binomial distribution]
[Plot: cumulative distribution function of the binomial distribution]
Notation  B(n, p)
Parameters  n ∈ N_{0} — number of trials; p ∈ [0,1] — success probability in each trial
Support  k ∈ { 0, …, n } — number of successes
pmf  <math>\textstyle {n \choose k}\, p^k (1-p)^{n-k}</math>
CDF  <math>\textstyle I_{1-p}(n - k, 1 + k)</math>
Mean  <math>np</math>
Median  <math>\lfloor np \rfloor</math> or <math>\lceil np \rceil</math>
Mode  <math>\lfloor (n + 1)p \rfloor</math> or <math>\lfloor (n + 1)p \rfloor - 1</math>
Variance  <math>np(1 - p)</math>
Skewness  <math>\frac{1-2p}{\sqrt{np(1-p)}}</math>
Excess kurtosis  <math>\frac{1-6p(1-p)}{np(1-p)}</math>
Entropy  <math>\frac12 \log_2 \big( 2\pi e\, np(1-p) \big) + O \left( \frac{1}{n} \right)</math> in shannons; for nats, use the natural log.
MGF  <math>(1-p + pe^t)^n</math>
CF  <math>(1-p + pe^{it})^n</math>
PGF  <math>G(z) = \left[(1-p) + pz\right]^n</math>
Fisher information  <math>g_n(p) = \frac{n}{p(1-p)}</math> (for fixed <math>n</math>)
In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p. A success/failure experiment is also called a Bernoulli experiment or Bernoulli trial; when n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.
The binomial distribution is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. If the sampling is carried out without replacement, the draws are not independent and so the resulting distribution is a hypergeometric distribution, not a binomial one. However, for N much larger than n, the binomial distribution is a good approximation, and widely used.
Specification
Probability mass function
In general, if the random variable X follows the binomial distribution with parameters n and p, we write X ~ B(n, p). The probability of getting exactly k successes in n trials is given by the probability mass function:
 <math> f(k;n,p) = \Pr(X = k) = {n\choose k}p^k(1-p)^{n-k}</math>
for k = 0, 1, 2, ..., n, where
 <math>{n\choose k}=\frac{n!}{k!(n-k)!}</math>
is the binomial coefficient, hence the name of the distribution. The formula can be understood as follows: we want exactly k successes (p^{k}) and n − k failures (1 − p)^{n − k}. However, the k successes can occur anywhere among the n trials, and there are <math>{n\choose k}</math> different ways of distributing k successes in a sequence of n trials.
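The probability mass function above is straightforward to compute directly. Here is a minimal Python sketch (the parameters n = 10, p = 0.3 are an arbitrary illustration):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The probabilities over the whole support {0, ..., n} sum to 1.
total = sum(binom_pmf(k, 10, 0.3) for k in range(11))
```

Summing the pmf over the full support is a quick sanity check that the normalization is correct.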
In creating reference tables for binomial distribution probabilities, the table is usually filled in only up to n/2 values, because for k > n/2 the probability can be calculated by its complement as
 <math>f(k,n,p)=f(n-k,n,1-p). </math>
Viewed as a function of k, the expression ƒ(k, n, p) attains its maximum at some value of k. This value can be found by calculating
 <math> \frac{f(k+1,n,p)}{f(k,n,p)}=\frac{(n-k)p}{(k+1)(1-p)} </math>
and comparing it to 1. There is always an integer M that satisfies
 <math>(n+1)p-1 \leq M < (n+1)p.</math>
ƒ(k, n, p) is monotone increasing for k < M and monotone decreasing for k > M, with the exception of the case where (n + 1)p is an integer. In this case, there are two values for which ƒ is maximal: (n + 1)p and (n + 1)p − 1. M is the most probable (most likely) outcome of the Bernoulli trials and is called the mode. Note that the probability of it occurring can be fairly small.
Equivalently, the probability mass function satisfies the recurrence relation
 <math> p(n-k)\Pr(X=k) = (k+1)(1-p)\Pr(X=k+1), \qquad \Pr(X=0)=(1-p)^n.</math>
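The ratio comparison described above gives a simple way to locate the mode: scan k upward while the ratio ƒ(k+1)/ƒ(k) is at least 1. A minimal sketch (when (n + 1)p is an integer and there are two modes, this returns the larger one):

```python
def binom_mode(n, p):
    """Find the mode by scanning k upward while f(k+1)/f(k) >= 1,
    i.e. while (n - k) p >= (k + 1)(1 - p)."""
    k = 0
    while k < n and (n - k) * p >= (k + 1) * (1 - p):
        k += 1
    return k

mode = binom_mode(6, 0.3)  # (n + 1)p = 2.1, so the mode is floor(2.1) = 2
```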
Cumulative distribution function
The cumulative distribution function can be expressed as:
 <math>F(k;n,p) = \Pr(X \le k) = \sum_{i=0}^{\lfloor k \rfloor} {n\choose i}p^i(1-p)^{n-i}</math>
where <math>\scriptstyle \lfloor k\rfloor\,</math> is the "floor" under k, i.e. the greatest integer less than or equal to k.
It can also be represented in terms of the regularized incomplete beta function, as follows:^{[1]}
 <math>\begin{align}
F(k;n,p) & = \Pr(X \le k) \\ &= I_{1-p}(n-k, k+1) \\ & = (n-k) {n \choose k} \int_0^{1-p} t^{n-k-1} (1-t)^k \, dt. \end{align}</math>
Some closed-form bounds for the cumulative distribution function are given below.
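The summation form of the CDF, including the floor of a non-integer argument k, can be sketched directly (the parameters n = 20, p = 0.4 are an arbitrary illustration):

```python
from math import comb, floor

def binom_cdf(k, n, p):
    """Pr(X <= k): sum the pmf from 0 through floor(k)."""
    kk = floor(k)
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(kk + 1))

prob = binom_cdf(8.7, 20, 0.4)  # same as Pr(X <= 8), since floor(8.7) = 8
```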
Example
Suppose a biased coin comes up heads with probability 0.3 when tossed. What is the probability of achieving 0, 1,..., 6 heads after six tosses?
 <math>\Pr(0\text{ heads}) = f(0) = \Pr(X = 0) = {6\choose 0}0.3^0 (1-0.3)^{6-0} \approx 0.1176 </math>
 <math>\Pr(1\text{ head}) = f(1) = \Pr(X = 1) = {6\choose 1}0.3^1 (1-0.3)^{6-1} \approx 0.3025 </math>
 <math>\Pr(2\text{ heads}) = f(2) = \Pr(X = 2) = {6\choose 2}0.3^2 (1-0.3)^{6-2} \approx 0.3241 </math>
 <math>\Pr(3\text{ heads}) = f(3) = \Pr(X = 3) = {6\choose 3}0.3^3 (1-0.3)^{6-3} \approx 0.1852</math>
 <math>\Pr(4\text{ heads}) = f(4) = \Pr(X = 4) = {6\choose 4}0.3^4 (1-0.3)^{6-4} \approx 0.0595</math>
 <math>\Pr(5\text{ heads}) = f(5) = \Pr(X = 5) = {6\choose 5}0.3^5 (1-0.3)^{6-5} \approx 0.0102 </math>
 <math>\Pr(6\text{ heads}) = f(6) = \Pr(X = 6) = {6\choose 6}0.3^6 (1-0.3)^{6-6} \approx 0.0007</math>^{[2]}
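The coin example above can be reproduced in a few lines of Python; the list comprehension computes all seven probabilities at once:

```python
from math import comb

n, p = 6, 0.3
probs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
most_likely = max(range(n + 1), key=lambda k: probs[k])  # 2 heads is most probable
```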
Mean and variance
If X ~ B(n, p), that is, X is a binomially distributed random variable, n being the total number of experiments and p the probability of each experiment yielding a successful result, then the expected value of X is:^{[3]}
 <math> \operatorname{E}[X] = np , </math>
(For example, if n = 100 and p = 1/4, then the average number of successful results will be 25.)
The variance is:
 <math> \operatorname{Var}[X] = np(1 - p).</math>
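Both formulas can be checked by computing the moments directly from the pmf, as in this minimal sketch (the parameters n = 100, p = 0.25 match the example above):

```python
from math import comb

def binom_moments(n, p):
    """Compute E[X] and Var[X] directly from the pmf, for comparison
    with the closed forms np and np(1 - p)."""
    pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
    mean = sum(k * pmf[k] for k in range(n + 1))
    var = sum((k - mean) ** 2 * pmf[k] for k in range(n + 1))
    return mean, var

mean, var = binom_moments(100, 0.25)  # closed forms give 25 and 18.75
```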
Mode and median
Usually the mode of a binomial B(n, p) distribution is equal to <math>\lfloor (n+1)p\rfloor</math>, where <math>\lfloor\cdot\rfloor</math> is the floor function. However when (n + 1)p is an integer and p is neither 0 nor 1, then the distribution has two modes: (n + 1)p and (n + 1)p − 1. When p is equal to 0 or 1, the mode will be 0 and n correspondingly. These cases can be summarized as follows:
 <math>\text{mode} =
\begin{cases} \lfloor (n+1)\,p\rfloor & \text{if }(n+1)p\text{ is 0 or a non-integer}, \\ (n+1)\,p\ \text{ and }\ (n+1)\,p - 1 &\text{if }(n+1)p\in\{1,\dots,n\}, \\ n & \text{if }(n+1)p = n + 1. \end{cases}</math>
In general, there is no single formula for the median of a binomial distribution, and it may even be non-unique. However, several special results have been established:
 If np is an integer, then the mean, median, and mode coincide and equal np.^{[4]}^{[5]}
 Any median m must lie within the interval ⌊np⌋ ≤ m ≤ ⌈np⌉.^{[6]}
 A median m cannot lie too far away from the mean: |m − np| ≤ min{ ln 2, max{p, 1 − p} }.^{[7]}
 The median is unique and equal to m = round(np) in cases when either p ≤ 1 − ln 2 or p ≥ ln 2 or |m − np| ≤ min{p, 1 − p} (except for the case when p = ½ and n is odd).^{[6]}^{[7]}
 When p = 1/2 and n is odd, any number m in the interval ½(n − 1) ≤ m ≤ ½(n + 1) is a median of the binomial distribution. If p = 1/2 and n is even, then m = n/2 is the unique median.
Covariance between two binomials
If two binomially distributed random variables X and Y are observed together, estimating their covariance can be useful. Using the definition of covariance, in the case n = 1 (thus being Bernoulli trials) we have
 <math>\operatorname{Cov}(X, Y) = \operatorname{E}(XY)  \mu_X \mu_Y.</math>
The first term is nonzero only when both X and Y are one, and μ_{X} and μ_{Y} are equal to the two probabilities. Defining p_{B} as the probability of both happening at the same time, this gives
 <math>\operatorname{Cov}(X, Y) = p_B  p_X p_Y,</math>
and for n independent pairwise trials
 <math>\operatorname{Cov}(X, Y)_n = n ( p_B  p_X p_Y ).</math>
If X and Y are the same variable, this reduces to the variance formula given above.
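The formula Cov(X, Y) = n(p_B − p_X p_Y) can be checked by simulation. The sketch below assumes hypothetical joint probabilities (p_X = 0.5, p_Y = 0.4, p_B = 0.3, chosen only for illustration) and couples the trials by partitioning the unit interval:

```python
import random

random.seed(42)
n_trials, reps = 50, 20000
p_x, p_y, p_b = 0.5, 0.4, 0.3  # hypothetical marginals; Pr(both succeed) = p_b

def sample_pair():
    """One run of n paired Bernoulli trials with the given joint probabilities."""
    x = y = 0
    for _ in range(n_trials):
        u = random.random()
        if u < p_b:                   # both succeed
            x += 1
            y += 1
        elif u < p_x:                 # only X succeeds
            x += 1
        elif u < p_x + (p_y - p_b):   # only Y succeeds
            y += 1
    return x, y

pairs = [sample_pair() for _ in range(reps)]
mx = sum(x for x, _ in pairs) / reps
my = sum(y for _, y in pairs) / reps
cov = sum((x - mx) * (y - my) for x, y in pairs) / reps
expected = n_trials * (p_b - p_x * p_y)  # n(p_B - p_X p_Y) = 50 * 0.1 = 5
```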
Related distributions
Sums of binomials
If X ~ B(n, p) and Y ~ B(m, p) are independent binomial variables with the same probability p, then X + Y is again a binomial variable; its distribution is^{[citation needed]}
 <math>X+Y \sim B(n+m, p).\,</math> However, if X and Y do not have the same probability p, then the variance of the sum will be smaller than the variance of a binomial variable distributed as <math>B(n+m, \bar{p}).\,</math>
Conditional binomials
If X ~ B(n, p) and, conditional on X, Y ~ B(X, q), then Y is a simple binomial variable with distribution^{[citation needed]}
 <math>Y \sim B(n, pq).</math>
For example, imagine throwing n balls at a basket U_{X} and then throwing the balls that hit it at another basket U_{Y}. If p is the probability of hitting U_{X}, then X ~ B(n, p) is the number of balls that hit U_{X}. If q is the probability of hitting U_{Y}, then the number of balls that hit U_{Y} is Y ~ B(X, q), and therefore Y ~ B(n, pq).
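The two-basket thought experiment is easy to simulate; the mean of Y should come out near npq (the parameters n = 40, p = 0.6, q = 0.5 are an arbitrary illustration):

```python
import random

random.seed(0)
n, p, q = 40, 0.6, 0.5
reps = 20000

def thinned_sample():
    """Throw n balls at U_X (hit prob p), then rethrow the hits at U_Y (hit prob q)."""
    x = sum(1 for _ in range(n) if random.random() < p)
    y = sum(1 for _ in range(x) if random.random() < q)
    return y

mean_y = sum(thinned_sample() for _ in range(reps)) / reps
expected = n * p * q  # Y ~ B(n, pq), so E[Y] = npq = 12
```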
Bernoulli distribution
The Bernoulli distribution is a special case of the binomial distribution, where n = 1. Symbolically, X ~ B(1, p) has the same meaning as X ~ Bern(p). Conversely, any binomial distribution, B(n, p), is the distribution of the sum of n Bernoulli trials, Bern(p), each with the same probability p.^{[citation needed]}
Poisson binomial distribution
The binomial distribution is a special case of the Poisson binomial distribution, which is the distribution of a sum of n independent non-identical Bernoulli trials Bern(p_{i}).^{[citation needed]} If X has the Poisson binomial distribution with p_{1} = … = p_{n} = p, then X ~ B(n, p).
Normal approximation
If n is large enough, then the skew of the distribution is not too great. In this case a reasonable approximation to B(n, p) is given by the normal distribution
 <math> \mathcal{N}(np,\, np(1p)),</math>
and this basic approximation can be improved in a simple way by using a suitable continuity correction. The basic approximation generally improves as n increases (at least 20) and is better when p is not close to 0 or 1.^{[8]} Various rules of thumb may be used to decide whether n is large enough, and p is far enough from the extremes of zero or one:
 One rule is that both np and n(1 − p) must be greater than 5. However, the specific number varies from source to source and depends on how good an approximation one wants; some sources give 10, which gives virtually the same results as the following rule for large n, until n is very large (ex: x = 11, n = 7752).
 A second rule^{[8]} is that for n > 5 the normal approximation is adequate if
 <math>\left| \left (\frac{1}{\sqrt{n}} \right ) \left (\sqrt{\frac{1-p}{p}}-\sqrt{\frac{p}{1-p}} \right ) \right| <0.3</math>
 Another commonly used rule holds that the normal approximation is appropriate only if everything within 3 standard deviations of its mean is within the range of possible values,^{[citation needed]} that is if
 <math>\mu \pm 3 \sigma = np \pm 3 \sqrt{np(1p)} \in [0,n].</math>
The following is an example of applying a continuity correction. Suppose one wishes to calculate Pr(X ≤ 8) for a binomial random variable X. If Y has a distribution given by the normal approximation, then Pr(X ≤ 8) is approximated by Pr(Y ≤ 8.5). The addition of 0.5 is the continuity correction; the uncorrected normal approximation gives considerably less accurate results.
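The continuity-correction example above can be checked numerically using the error function from the standard library (the parameters n = 20, p = 0.4 are an arbitrary illustration):

```python
from math import comb, erf, sqrt

n, p = 20, 0.4

def binom_cdf(k):
    """Exact Pr(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    """CDF of the normal distribution N(mu, sigma^2)."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = n * p, sqrt(n * p * (1 - p))
exact = binom_cdf(8)                      # Pr(X <= 8)
corrected = normal_cdf(8.5, mu, sigma)    # with continuity correction
uncorrected = normal_cdf(8.0, mu, sigma)  # without
```

The corrected value lands much closer to the exact binomial probability than the uncorrected one.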
This approximation, known as de Moivre–Laplace theorem, is a huge timesaver when undertaking calculations by hand (exact calculations with large n are very onerous); historically, it was the first use of the normal distribution, introduced in Abraham de Moivre's book The Doctrine of Chances in 1738. Nowadays, it can be seen as a consequence of the central limit theorem since B(n, p) is a sum of n independent, identically distributed Bernoulli variables with parameter p. This fact is the basis of a hypothesis test, a "proportion ztest", for the value of p using x/n, the sample proportion and estimator of p, in a common test statistic.^{[9]}
For example, suppose one randomly samples n people out of a large population and asks them whether they agree with a certain statement. The proportion of people who agree will of course depend on the sample. If groups of n people were sampled repeatedly and truly randomly, the proportions would follow an approximate normal distribution with mean equal to the true proportion p of agreement in the population and with standard deviation σ = (p(1 − p)/n)^{1/2}.
Poisson approximation
The binomial distribution converges towards the Poisson distribution as the number of trials goes to infinity while the product np remains fixed. Therefore, the Poisson distribution with parameter λ = np can be used as an approximation to the binomial distribution B(n, p) if n is sufficiently large and p is sufficiently small. According to two rules of thumb, this approximation is good if n ≥ 20 and p ≤ 0.05, or if n ≥ 100 and np ≤ 10.^{[10]}
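The quality of the Poisson approximation can be measured with the total variation distance between the two pmfs; in this sketch n = 100 and p = 0.02 (so λ = 2) satisfy the rules of thumb above:

```python
from math import comb, exp, factorial

n, p = 100, 0.02
lam = n * p  # Poisson parameter λ = np = 2

binom_pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
poisson_pmf = [exp(-lam) * lam**k / factorial(k) for k in range(n + 1)]

# Total variation distance: half the sum of absolute pmf differences.
tv_dist = 0.5 * sum(abs(b, ) if False else abs(b - q) for b, q in zip(binom_pmf, poisson_pmf))
tv_dist = 0.5 * sum(abs(b - q) for b, q in zip(binom_pmf, poisson_pmf))
```

The distance is small (under 2%), confirming the approximation is good in this regime.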
Limiting distributions
 Poisson limit theorem: As n approaches ∞ and p approaches 0 while np remains fixed at λ > 0 or at least np approaches λ > 0, then the Binomial(n, p) distribution approaches the Poisson distribution with expected value λ.^{[10]}
 de Moivre–Laplace theorem: As n approaches ∞ while p remains fixed, the distribution of
 <math>\frac{X-np}{\sqrt{np(1-p)}}</math>
 approaches the normal distribution with expected value 0 and variance 1.^{[citation needed]} This result is sometimes loosely stated by saying that the distribution of X is asymptotically normal with expected value np and variance np(1 − p). This result is a specific case of the central limit theorem.
Beta distribution
Beta distributions provide a family of conjugate prior probability distributions for binomial distributions in Bayesian inference. The domain of the beta distribution can be viewed as a probability, and in fact the beta distribution is often used to describe the distribution of a probability value p:^{[11]}
 <math>P(p;\alpha,\beta) = \frac{p^{\alpha-1}(1-p)^{\beta-1}}{\mathrm{B}(\alpha,\beta)}</math>.
Confidence intervals
Even for quite large values of n, the actual distribution of the mean is significantly non-normal.^{[12]} Because of this problem, several methods to estimate confidence intervals have been proposed.
Let n_{1} be the number of successes out of n, the total number of trials, and let
 <math> \hat{p} = \frac{n_1}{n}</math>
be the proportion of successes. Let z_{α/2} be the 100(1 − α/2)th percentile of the standard normal distribution.
 Wald method
 <math> \hat{p} \pm z_{\frac{\alpha}{2}} \sqrt{ \frac{ \hat{p} ( 1 - \hat{p} )}{ n } } .</math>
 A continuity correction of 0.5/n may be added.
 Agresti–Coull method^{[13]}
 <math> \tilde{p} \pm z_{\frac{\alpha}{2}} \sqrt{ \frac{ \tilde{p} ( 1 - \tilde{p} )}{ n + z_{\frac{\alpha}{2}}^2 } } .</math>
 Here the estimate of p is modified to
 <math> \tilde{p}= \frac{ n_1 + \frac{1}{2} z_{\frac{\alpha}{2}}^2}{ n + z_{\frac{\alpha}{2}}^2 } </math>
 Arcsine method^{[14]}
 <math>\sin^2 \left (\arcsin \left ( \sqrt{ \hat{p} } \right ) \pm \frac{ z }{ 2 \sqrt{ n } } \right ) </math>
 Wilson (score) method^{[15]}
 <math> \frac{\hat{p} + \frac{1}{2n} z_{1-\frac{\alpha}{2}}^2 \pm \frac{1}{2n} z_{1-\frac{\alpha}{2}} \sqrt{4n\hat{p}(1 - \hat{p})+ z_{1-\frac{\alpha}{2}}^2}} {1+ \frac{1}{n} z_{1-\frac{\alpha}{2}}^2}.</math>
The exact (Clopper–Pearson) method is the most conservative.^{[12]} The Wald method, although commonly recommended in textbooks, is the most biased.
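Three of the intervals above can be sketched directly from their formulas; this assumes the 95% level (z ≈ 1.96) and is an illustration rather than production code:

```python
from math import sqrt

def wald_interval(n1, n, z=1.96):
    """Wald interval: symmetric about the sample proportion."""
    ph = n1 / n
    half = z * sqrt(ph * (1 - ph) / n)
    return ph - half, ph + half

def agresti_coull_interval(n1, n, z=1.96):
    """Agresti-Coull interval: Wald form around an adjusted estimate."""
    pt = (n1 + 0.5 * z**2) / (n + z**2)
    half = z * sqrt(pt * (1 - pt) / (n + z**2))
    return pt - half, pt + half

def wilson_interval(n1, n, z=1.96):
    """Wilson score interval: never extends outside [0, 1]."""
    ph = n1 / n
    center = (ph + z**2 / (2 * n)) / (1 + z**2 / n)
    half = (z / (2 * n)) * sqrt(4 * n * ph * (1 - ph) + z**2) / (1 + z**2 / n)
    return center - half, center + half

lo, hi = wilson_interval(7, 20)  # 7 successes out of 20 trials
```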
Generating binomial random variates
Methods for random number generation where the marginal distribution is a binomial distribution are wellestablished.^{[16]}^{[17]}
One way to generate random samples from a binomial distribution is to use an inversion algorithm. To do so, one must calculate the probability P(X = k) for every value k from 0 through n. (These probabilities should sum to a value close to one, in order to encompass the entire sample space.) Then, using a pseudorandom number generator (such as a linear congruential generator) to produce samples uniform on [0, 1], one can transform the uniform samples into discrete numbers using the probabilities calculated in the first step.
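The inversion algorithm can be sketched as follows, using Python's standard `random` module in place of an explicit linear congruential generator:

```python
import random
from math import comb

def binom_inversion(n, p, rng=random):
    """Draw one binomial variate by inverting the CDF with a uniform sample."""
    u = rng.random()
    cum = 0.0
    for k in range(n + 1):
        cum += comb(n, k) * p**k * (1 - p)**(n - k)
        if u <= cum:
            return k
    return n  # guard against floating-point undershoot of the cumulative sum

random.seed(1)
draws = [binom_inversion(12, 0.3) for _ in range(10000)]
mean = sum(draws) / len(draws)  # should be near np = 3.6
```

For repeated sampling with fixed n and p, precomputing the cumulative table once (rather than per draw, as here) is the usual optimization.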
Tail bounds
For k ≤ np, upper bounds for the lower tail of the distribution function can be derived. In particular, Hoeffding's inequality yields the bound
 <math> F(k;n,p) \leq \exp\left(-2 \frac{(np-k)^2}{n}\right), \!</math>
and Chernoff's inequality can be used to derive the bound
 <math> F(k;n,p) \leq \exp\left(-\frac{1}{2\,p} \frac{(np-k)^2}{n}\right). \!</math>
Moreover, these bounds are reasonably tight when p = 1/2, since the following expression holds for all k ≥ 3n/8^{[18]}
 <math> F(k;n,\tfrac{1}{2}) \geq \frac{1}{15} \exp\left(- \frac{16 (\frac{n}{2} - k)^2}{n}\right). \!</math>
However, the bounds do not work well for extreme values of p. In particular, as p <math>\rightarrow</math> 1, the value F(k;n,p) goes to zero (for fixed k, n with k < n) while the upper bound above goes to a positive constant. In this case, a better bound is given by^{[19]}
 <math> F(k;n,p) \leq \exp\left(-nD\left(\frac{k}{n}\parallel p\right)\right) \quad\quad\mbox{if }0<\frac{k}{n}<p\!</math>
where D(a ∥ p) is the relative entropy between an a-coin and a p-coin (i.e. between the Bernoulli(a) and Bernoulli(p) distributions):
 <math> D(a\parallel p)=a\log\frac{a}{p}+(1-a)\log\frac{1-a}{1-p}. \!</math>
Asymptotically, this bound is reasonably tight; see ^{[19]} for details. An equivalent formulation of the bound is
 <math> \Pr(X \ge k) =F(n-k;n,1-p)\leq \exp\left(-nD\left(\frac{k}{n}\parallel p\right)\right) \quad\quad\mbox{if }p<\frac{k}{n}<1.\!</math>
Both these bounds are derived directly from the Chernoff bound. It can also be shown that,
 <math> \Pr(X \ge k) =F(n-k;n,1-p)\geq \frac{1}{(n+1)^2} \exp\left(-nD\left(\frac{k}{n}\parallel p\right)\right) \quad\quad\mbox{if }p<\frac{k}{n}<1.\!</math>
This is proved using the method of types (see for example chapter 12 of Elements of Information Theory by Cover and Thomas ^{[20]}).
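The Hoeffding bound on the lower tail can be verified numerically across the whole range k ≤ np; the parameters n = 100, p = 1/2 are an arbitrary illustration:

```python
from math import comb, exp

def binom_cdf(k, n, p):
    """Exact lower-tail probability F(k; n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def hoeffding_bound(k, n, p):
    """Hoeffding's upper bound on F(k; n, p), valid for k <= np."""
    return exp(-2 * (n * p - k) ** 2 / n)

n, p = 100, 0.5
# The bound should dominate the exact CDF at every integer k up to np.
ok = all(binom_cdf(k, n, p) <= hoeffding_bound(k, n, p) for k in range(int(n * p) + 1))
```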
See also
 Logistic regression
 Multinomial distribution
 Negative binomial distribution
 Binomial measure, an example of a multifractal measure.^{[21]}
References
 ^ Wadsworth, G. P. (1960). Introduction to probability and random variables. USA: McGrawHill New York. p. 52.
 ^ Hamilton Institute. "The Binomial Distribution" October 20, 2010.
 ^ See Proof Wiki
 ^ Neumann, P. (1966). "Über den Median der Binomial- und Poissonverteilung". Wissenschaftliche Zeitschrift der Technischen Universität Dresden (in German) 19: 29–33.
 ^ Lord, Nick (July 2010). "Binomial averages when the mean is an integer", The Mathematical Gazette 94, 331–332.
 ^ ^{a} ^{b} Kaas, R.; Buhrman, J.M. (1980). "Mean, Median and Mode in Binomial Distributions". Statistica Neerlandica 34 (1): 13–18. doi:10.1111/j.1467-9574.1980.tb00681.x.
 ^ ^{a} ^{b} Hamza, K. (1995). "The smallest uniform upper bound on the distance between the mean and the median of the binomial and Poisson distributions". Statistics & Probability Letters 23: 21–25. doi:10.1016/0167-7152(94)00090-U.
 ^ ^{a} ^{b} Box, Hunter and Hunter (1978). Statistics for experimenters. Wiley. p. 130.
 ^ NIST/SEMATECH, "7.2.4. Does the proportion of defectives meet requirements?" e-Handbook of Statistical Methods.
 ^ ^{a} ^{b} NIST/SEMATECH, "6.3.3.1. Counts Control Charts", e-Handbook of Statistical Methods.
 ^ MacKay, David (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press; First Edition. ISBN 9780521642989.
 ^ ^{a} ^{b} Brown, Lawrence D.; Cai, T. Tony; DasGupta, Anirban (2001), "Interval Estimation for a Binomial Proportion", Statistical Science 16 (2): 101–133, retrieved 2015-01-05
 ^ Agresti, Alan; Coull, Brent A. (May 1998), "Approximate is better than 'exact' for interval estimation of binomial proportions" (PDF), The American Statistician 52 (2): 119–126, doi:10.2307/2685469, retrieved 2015-01-05
 ^ Pires MA Confidence intervals for a binomial proportion: comparison of methods and software evaluation.
 ^ Wilson, Edwin B. (June 1927), "Probable inference, the law of succession, and statistical inference" (PDF), J. American Statistical Association 22 (158): 209–212, doi:10.2307/2276774, retrieved 2015-01-05
 ^ Devroye, Luc (1986) Non-Uniform Random Variate Generation, New York: Springer-Verlag. (See especially Chapter X, Discrete Univariate Distributions)
 ^ Kachitvichyanukul, V.; Schmeiser, B. W. (1988). "Binomial random variate generation". Communications of the ACM 31 (2): 216–222. doi:10.1145/42372.42381.
 ^ Matoušek, J, Vondrak, J: The Probabilistic Method (lecture notes) [1].
 ^ ^{a} ^{b} R. Arratia and L. Gordon: Tutorial on large deviations for the binomial distribution, Bulletin of Mathematical Biology 51(1) (1989), 125–131 [2].
 ^ T. Cover and J. Thomas, "Elements of Information Theory, 2nd Edition", Wiley 2006
 ^ Mandelbrot, B. B., Fisher, A. J., & Calvet, L. E. (1997). A multifractal model of asset returns. 3.2 The Binomial Measure is the Simplest Example of a Multifractal
External links
 Interactive graphic: Univariate Distribution Relationships
 Binomial distribution formula calculator
 Binomial distribution calculator
 Difference of two binomial variables: X − Y or |X − Y|
