Stats & Cats

Data, statistics, and cats doing maths

The Binomial Distribution: Counting Successes in Fixed Trials

Morgan Voss

A cat that hunts is running an experiment, whether she knows it or not. Each pounce is a trial. It either succeeds or it does not. The outcome of one attempt does not influence the next. And if you watch long enough, you start asking a different question: not whether she catches something, but how many she catches in a given session.

That is the question the binomial distribution answers.

The Setup

The binomial distribution applies when four conditions hold. There are exactly $n$ trials. Each trial results in one of two outcomes, conventionally called success and failure. The trials are independent. And the probability of success, $p$, is the same on every trial.

These four conditions define a sequence of Bernoulli trials, named for the Swiss mathematician Jacob Bernoulli. A single Bernoulli trial is just a coin flip with a possibly unfair coin. The binomial distribution describes what happens when you run $n$ of them and count the successes.

Call that count $X$. The random variable $X$ can take any integer value from $0$ to $n$.

The PMF

The probability mass function gives the probability of exactly $k$ successes:

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

The formula has two parts, and both matter.

The term $p^k (1-p)^{n-k}$ is the probability of one specific sequence containing exactly $k$ successes and $n - k$ failures. If $k = 3$ and $n = 5$, for instance, one such sequence is success-success-success-failure-failure, with probability $p^3 (1-p)^2$.

But there are many sequences with exactly three successes. The binomial coefficient $\binom{n}{k}$, read as "$n$ choose $k$," counts how many:

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

Multiply the two pieces together and you have the probability of getting $k$ successes in any order. The formula is not arbitrary. It follows directly from counting.
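The two pieces translate directly into code. A minimal sketch in Python, using the standard library's `math.comb` for the binomial coefficient (`binomial_pmf` is a name chosen here for illustration):

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k): count the orderings with comb(n, k), weight each
    by the probability of one specific sequence, p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)
```

For a fair coin flipped five times, `binomial_pmf(3, 5, 0.5)` gives $\binom{5}{3} / 2^5 = 10/32 = 0.3125$.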

A Worked Example

Suppose a cat succeeds on roughly 30% of pounce attempts, and makes eight attempts in a session. Each attempt is independent. We want $P(X = 3)$: the probability of exactly three successes.

$$P(X = 3) = \binom{8}{3}(0.3)^3(0.7)^5$$

$$= 56 \times 0.027 \times 0.16807 \approx 0.254$$

About a 25% chance of exactly three successes. The distribution also gives us the full picture: we can compute $P(X = 0)$ through $P(X = 8)$, and all nine values sum to one.
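The hand calculation above is easy to check numerically. A quick sketch, reusing the `binomial_pmf` helper defined as in the formula (a name chosen here, not a standard function):

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly three successes in eight attempts at p = 0.3
p_three = binomial_pmf(3, 8, 0.3)  # ≈ 0.254, matching the hand calculation

# The nine probabilities P(X = 0) through P(X = 8) form a full distribution
total = sum(binomial_pmf(k, 8, 0.3) for k in range(9))  # sums to 1 (up to rounding)
```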

[Figure: the binomial distribution models the probabilities of two outcomes]

Expected Value and Variance

The expected number of successes is:

$$E[X] = np$$

For our hunting cat, that is $8 \times 0.3 = 2.4$ successes per session. Not a whole number, which is fine -- expected value is not a prediction of any single outcome. It is the long-run average.

The variance is:

$$\text{Var}(X) = np(1-p)$$

And the standard deviation is $\sqrt{np(1-p)}$. Both depend on $p$ in a way worth noting: the variance is largest when $p = 0.5$ and shrinks toward zero as $p$ approaches either extreme. When success is nearly certain or nearly impossible, there is little variability -- you already know roughly what will happen.

Assumptions and When They Break

The binomial model is exact when the four conditions are exactly met. In practice, they rarely are. What matters is how badly they are violated.

Independence is the most commonly broken condition. Whether one hunting attempt affects the next depends on prey behavior, the cat's fatigue, and a dozen other things. If trials influence each other, the binomial is at best an approximation.

The fixed-$p$ assumption can also fail. A cat may improve over the course of a session, or tire. If $p$ drifts, the binomial is still sometimes useful as a baseline, but it will misrepresent the tails of the distribution.

The Normal Approximation

For large $n$, the binomial distribution is well-approximated by a normal distribution with mean $np$ and standard deviation $\sqrt{np(1-p)}$. The approximation is reliable when $np \geq 5$ and $n(1-p) \geq 5$. Outside that range, the exact binomial is preferable.
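One way to see the approximation at work is to compare the exact binomial CDF with the normal CDF, applying the usual continuity correction. A sketch under the rule-of-thumb conditions (`binom_cdf` and `normal_cdf` are helper names defined here; the normal CDF uses the standard library's `math.erf`):

```python
from math import comb, erf, sqrt

def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) by summing the PMF."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Normal CDF via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

n, p = 100, 0.3            # np = 30 and n(1-p) = 70: well inside the rule of thumb
mu, sigma = n * p, sqrt(n * p * (1 - p))

exact = binom_cdf(35, n, p)
approx = normal_cdf(35.5, mu, sigma)   # continuity correction: X <= 35 becomes x < 35.5
```

With these numbers the two values agree to within about a percentage point; shrink $n$ or push $p$ toward 0 or 1 and the gap widens.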

The binomial probability calculator computes exact values for any $n$, $p$, and $k$ using log-space arithmetic to handle large $n$ without numerical underflow.
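The log-space idea can be sketched as follows. This is not the calculator's actual implementation, just one standard approach: compute $\log \binom{n}{k}$ with `math.lgamma` and the log-probabilities separately, and exponentiate only at the end, so intermediate values never underflow.

```python
from math import exp, lgamma, log

def binomial_pmf_log(k: int, n: int, p: float) -> float:
    """PMF computed in log space; assumes 0 < p < 1.
    Direct evaluation of p**k underflows for large n (e.g. 0.5**10000 == 0.0),
    but the log of each factor stays comfortably in range."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))
```

For small $n$ this agrees with the direct formula, and it still returns a sensible positive value at, say, $n = 10{,}000$, where the direct product would be zero.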

What Makes It Useful

The binomial distribution is not especially subtle, but its simplicity is the point. When the four conditions hold -- or approximately hold -- it is the right model, and it is completely tractable. You can compute the PMF, the CDF, the expected value, and the variance in closed form.

The harder question is always whether the conditions hold. The binomial distribution gives the right answer to the right question. Identifying the right question first is the more consequential step.