The Central Limit Theorem: Why Averages Behave

Morgan Voss

A cat shelter in a mid-sized city houses around 80 cats at any given time. Their weights span an impressive range: newborn kittens under half a kilogram, geriatric tabbies settling into comfortable middle age, and one notably large orange male who tips the scale at nearly seven kilograms and seems aware of it. Plot the weight distribution and you get something lumpy, skewed, and not remotely bell-shaped.

Now randomly select 30 cats, compute the average weight of that group, and record it. Return all 30 cats. Repeat the process several hundred times. Plot those averages.

The result is a bell curve. The individuals are chaotic. The averages are not.
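The experiment is easy to simulate. The sketch below assumes a made-up population: 79 weights drawn from a right-skewed beta distribution, plus one 6.9 kg cat. These numbers are invented for illustration and are not the shelter's data. It uses only Python's standard library.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population of 80 cats (kg): 79 right-skewed weights between
# 0.4 and 6.4, plus one large orange male. All values are invented.
population = [0.4 + 6.0 * random.betavariate(2, 5) for _ in range(79)]
population.append(6.9)


def sample_mean(pop, k):
    """Average weight of k cats drawn without replacement from pop."""
    return statistics.fmean(random.sample(pop, k))


# Draw a group of 30, record its mean, return the cats, repeat 500 times.
means = [sample_mean(population, 30) for _ in range(500)]

print(f"population mean:  {statistics.fmean(population):.2f} kg")
print(f"mean of averages: {statistics.fmean(means):.2f} kg")
print(f"SD of population: {statistics.pstdev(population):.2f} kg")
print(f"SD of averages:   {statistics.stdev(means):.2f} kg")
```

Under these assumptions the averages centre on the population mean and spread far less than the individual weights do. A histogram of `means` is where the bell shape shows up.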

The Theorem

The Central Limit Theorem (CLT) states, in its standard form, that if $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables with mean $\mu$ and variance $\sigma^2$, then the sample mean

$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i$$

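converges in distribution to a normal distribution, once centered at $\mu$ and scaled by $\sqrt{n}$, as $n \to \infty$:

$$\sqrt{n}\,\left(\bar{X}_n - \mu\right) \xrightarrow{d} N\!\left(0,\, \sigma^2\right)$$

The centering and scaling are needed because a limit cannot itself depend on $n$. The everyday reading is that, for large $n$, the sample mean is approximately distributed as $N(\mu, \sigma^2/n)$.

The phrasing "converges in distribution" means something specific. It does not say the sample mean is exactly normally distributed in finite samples. It says the CDF of the standardized mean $\sqrt{n}(\bar{X}_n - \mu)/\sigma$ converges at every point to the standard normal CDF as $n$ grows. For practical purposes this distinction rarely matters, but it is worth knowing what the theorem actually claims.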

The mean of the sampling distribution is $\mu$: the sample mean is an unbiased estimator of the population mean. The variance is $\sigma^2 / n$, which shrinks as $n$ grows. Larger samples produce less variable estimates. That is both reassuring and obvious in retrospect.
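Both the mean and the variance of the sample mean follow in two lines. The first uses linearity of expectation. The second uses independence, which makes the variance of a sum equal to the sum of the variances:

$$
\begin{aligned}
E[\bar{X}_n] &= \frac{1}{n}\sum_{i=1}^n E[X_i] = \frac{1}{n}\cdot n\mu = \mu, \\
\operatorname{Var}(\bar{X}_n) &= \frac{1}{n^2}\sum_{i=1}^n \operatorname{Var}(X_i) = \frac{1}{n^2}\cdot n\sigma^2 = \frac{\sigma^2}{n}.
\end{aligned}
$$

Neither step uses normality. These two facts hold for any $n$, and the CLT adds only the statement about shape.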

The Standard Error

The standard deviation of $\bar{X}_n$ has its own name: the standard error.

$$\text{SE} = \frac{\sigma}{\sqrt{n}}$$

The square root in the denominator is why precision is expensive. To halve the standard error, you need four times the sample size. To reduce it by a factor of ten, you need 100 times the data. The law of diminishing returns applies directly.

In the shelter example, suppose cat weights have a population standard deviation of about 1.2 kilograms. With groups of 30, the standard error of the mean weight is $1.2 / \sqrt{30} \approx 0.22$ kilograms. The averages cluster within a fairly narrow band, even though the underlying population has substantial spread.
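The arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function names `standard_error` and `n_for_target_se` are chosen here for illustration, and the 0.1 kg target is an arbitrary example value.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def n_for_target_se(sigma, target_se):
    """Smallest sample size whose standard error is at most target_se."""
    return math.ceil((sigma / target_se) ** 2)


sigma = 1.2  # population SD in kg, from the shelter example

se_30 = standard_error(sigma, 30)
se_120 = standard_error(sigma, 120)  # four times the sample size

print(f"SE with n=30:  {se_30:.3f} kg")
print(f"SE with n=120: {se_120:.3f} kg")
print(f"ratio:         {se_30 / se_120:.2f}")
print(f"n needed for an SE of 0.1 kg: {n_for_target_se(sigma, 0.1)}")
```

Quadrupling the group size from 30 to 120 halves the standard error. Getting from roughly 0.22 kg down to 0.1 kg would take groups larger than the shelter's entire population.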

What the Theorem Does Not Require

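The CLT places no restriction on the shape of the underlying distribution. It does not require the population to be normal. It does not require the population to be symmetric. It requires a finite mean and variance, independence, and identical distributions. The shelter cats satisfy these conditions well enough, although drawing 30 of 80 cats without replacement makes the draws mildly dependent and the averages somewhat less variable than $\sigma/\sqrt{n}$ suggests.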

Most of the distributions encountered in practice are not normal: income is right-skewed, reaction times have heavy tails, count data is discrete. Yet averages of these quantities, taken over large enough samples, are approximately normal. The normal distribution appears not because nature generates it directly, but because averaging is a smoothing operation that washes out idiosyncratic features of the source distribution.

The Practical Rule of Thumb

The CLT is an asymptotic result, which means it is precisely true in the limit as $n \to \infty$. For finite samples, it is an approximation. The quality of that approximation depends on the shape of the underlying distribution.

The widely cited rule of thumb is $n \geq 30$. For distributions that are roughly symmetric and unimodal, 30 observations is usually sufficient for the normal approximation to be serviceable. For heavily skewed or heavy-tailed distributions, you may need considerably more. The rule is a starting point, not a guarantee.

For the shelter cats, with a moderately skewed distribution and that single outlier on the heavy end, groups of 30 produce averages that track the normal approximation well. Groups of five would not.
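One way to see the effect of group size is to measure how much skew survives in the averages. The sketch below uses an exponential distribution as a stand-in for a skewed population, which is an assumption for illustration and not the cat data. Its skewness is 2, and the skewness of a mean of $n$ independent draws shrinks like $1/\sqrt{n}$.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def skewness(values):
    """Sample skewness: the mean cubed z-score (0 for a symmetric sample)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


def means_of_groups(group_size, repeats=20000):
    """Means of `repeats` independent groups of exponential(1) draws."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(group_size)])
        for _ in range(repeats)
    ]


skew_5 = skewness(means_of_groups(5))
skew_30 = skewness(means_of_groups(30))

print(f"skewness of means, groups of 5:  {skew_5:.2f}")
print(f"skewness of means, groups of 30: {skew_30:.2f}")
```

Theory gives about $2/\sqrt{5} \approx 0.89$ for groups of five and $2/\sqrt{30} \approx 0.37$ for groups of 30, so the simulated values should land near those. Even at 30 the averages of a strongly skewed source are not perfectly symmetric, which is why the rule of thumb is only a starting point.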

What Inference Relies On

Most of classical statistical inference rests on this result. Confidence intervals for means assume that $\bar{X}$ is approximately normal, which the CLT justifies. Hypothesis tests for means use the same assumption. The t-distribution emerges from estimating $\sigma$ rather than knowing it, but the underlying logic is still the CLT.
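A confidence interval for a mean can be sketched as follows. It is a minimal version under stated assumptions: it uses the normal approximation directly, with the sample standard deviation standing in for $\sigma$. The function name `mean_confidence_interval` and the 30 weights are invented for illustration.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # estimated standard error
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * se, mean + z * se


# Hypothetical weights (kg) for one group of 30 cats, invented for illustration.
weights = [
    0.5, 1.1, 1.8, 2.4, 2.9, 3.1, 3.3, 3.4, 3.6, 3.7,
    3.8, 3.9, 4.0, 4.0, 4.1, 4.2, 4.3, 4.3, 4.4, 4.5,
    4.6, 4.7, 4.8, 4.9, 5.0, 5.2, 5.4, 5.6, 5.9, 6.9,
]

low, high = mean_confidence_interval(weights)
print(f"sample mean: {statistics.fmean(weights):.2f} kg")
print(f"95% CI:      ({low:.2f}, {high:.2f}) kg")
```

With $n = 30$, swapping the normal critical value for the t-distribution's would widen the interval slightly. That widening is the adjustment for having estimated $\sigma$ from the sample.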

Without the theorem, inference would require knowing the population distribution in order to say anything about sampling distributions. That would make statistics considerably less useful in practice, where population distributions are almost never known.

The shelter's orange tabby can be as heavy as he likes. Averaged into a group of 30, he contributes one thirtieth of a single mean, and those means follow a well-behaved, approximately normal distribution. The theorem takes messy individual reality and produces orderly aggregate behavior. That trade is the foundation of most of what follows in introductory statistics.
