What Confidence Intervals Actually Tell You
Morgan Voss
Suppose you run an experiment. You measure how long your cat takes to respond to the sound of a can opener across 30 trials and compute a 95% confidence interval: 3 to 7 seconds. A colleague looks at the interval and says there is a 95% chance the true mean is somewhere between 3 and 7.
This is wrong. The distinction is not pedantic. It reflects a genuinely different statement about what kind of object a confidence interval is.
What the Interval Is Not
The true mean response time is a fixed value. Either it is in the interval from 3 to 7 seconds, or it is not. No probability is attached to a fixed value being in a fixed interval, and once computed, the interval does not fluctuate around the truth either. The probability interpretation implies a distribution over possible parameter values, which is a Bayesian concept. A frequentist confidence interval is not a Bayesian object.
Saying there is a 95% probability that the true mean lies in your specific realized interval is like saying there is a 50% probability that a coin you already flipped, but have not yet looked at, landed heads. The flip happened. The result is determined. Probability, in the frequentist framework, is a statement about processes and long-run frequencies, not about fixed but unknown facts.
What the Interval Is
A confidence interval is a statement about the procedure that generated it.
The procedure works like this: take a sample, compute the interval using a formula, repeat. Across many repetitions, the fraction of resulting intervals that contain the true parameter approaches 95%. Your specific interval is one realization of that procedure. It either contains the true value or it does not. The 95% refers to the long-run behavior of the method, not to any probability attached to your particular interval.
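The repeat-the-procedure description can be checked by simulation. The sketch below is illustrative only: it assumes a normal population with a made-up true mean of 5 seconds and a known standard deviation of 2 seconds (neither value comes from the experiment above), draws 30 observations per repetition, builds a known-variance 95% interval each time, and counts how often the fixed true mean is captured.

```python
import random
from statistics import NormalDist, fmean

# Hypothetical settings for illustration; the article's experiment does not report these.
TRUE_MEAN = 5.0  # seconds (assumed)
SIGMA = 2.0      # known population standard deviation (assumed)
N = 30           # observations per experiment, as in the cat example

Z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96


def z_interval(sample, sigma):
    """Known-variance 95% interval for the mean: xbar +/- z * sigma / sqrt(n)."""
    xbar = fmean(sample)
    half_width = Z * sigma / len(sample) ** 0.5
    return xbar - half_width, xbar + half_width


def coverage(reps=10_000, seed=0):
    """Repeat the whole procedure; return the fraction of intervals containing TRUE_MEAN."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
        lo, hi = z_interval(sample, SIGMA)
        hits += lo <= TRUE_MEAN <= hi  # each single interval either contains it or not
    return hits / reps


print(coverage())  # close to 0.95; the exact value depends on the seed
```

Any single simulated interval either contains 5.0 or it does not. Only the long-run fraction is near 95%.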
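For a normally distributed population with known variance $\sigma^2$, the 95% confidence interval for the mean $\mu$ is:
$$\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$
where $z_{\alpha/2} \approx 1.96$ for 95% confidence ($\alpha = 0.05$). In practice, when $\sigma$ is unknown, the sample standard deviation $s$ is substituted and $z_{\alpha/2}$ is replaced by $t_{\alpha/2,\,n-1}$ from the $t$-distribution with $n-1$ degrees of freedom.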
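Both versions of the formula can be written as small functions. This is a minimal sketch using only Python's standard library, and the function names are hypothetical. Because the standard library has no $t$ quantile function, the caller passes in the $t$ critical value (for example from a $t$ table, or from `scipy.stats.t.ppf(0.975, df=n - 1)` if SciPy is available).

```python
from statistics import NormalDist, fmean, stdev


def ci_known_sigma(sample, sigma, level=0.95):
    """xbar +/- z_{alpha/2} * sigma / sqrt(n), for a known population sd."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 when level = 0.95
    half_width = z * sigma / n ** 0.5
    xbar = fmean(sample)
    return xbar - half_width, xbar + half_width


def ci_unknown_sigma(sample, t_crit):
    """xbar +/- t_{alpha/2, n-1} * s / sqrt(n), with s the sample sd.

    t_crit must be the t critical value for n - 1 degrees of freedom.
    """
    n = len(sample)
    half_width = t_crit * stdev(sample) / n ** 0.5
    xbar = fmean(sample)
    return xbar - half_width, xbar + half_width


# Hypothetical data, not the article's measurements.
data = [4.0, 5.0, 6.0]
print(ci_known_sigma(data, sigma=1.0))
print(ci_unknown_sigma(data, t_crit=4.303))  # t_{0.025, 2} from a t table
```

With only three observations the $t$ multiplier (4.303) is far larger than 1.96, which is why the unknown-variance interval comes out much wider here.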
The random object in this expression is the sample mean. Before you collect data, it is a random variable $\bar{X}$ with its own distribution, and the interval is random because it is built from a random sample. Once the sample is collected and the interval is computed, the randomness is gone. The observed $\bar{x}$ is a number, the interval is fixed, and the parameter $\mu$ is fixed.
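The claim that the sample mean has its own distribution can be seen by drawing many samples. This sketch assumes a hypothetical normal population with mean 5 and standard deviation 2 and samples of size 30; the simulated means cluster around the true mean with a spread close to $\sigma/\sqrt{n}$.

```python
import random
from statistics import fmean, stdev

TRUE_MEAN, SIGMA, N = 5.0, 2.0, 30  # hypothetical population and sample size


def simulate_sample_means(reps=5_000, seed=1):
    """Draw `reps` independent samples of size N and return each sample's mean."""
    rng = random.Random(seed)
    return [fmean([rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]) for _ in range(reps)]


means = simulate_sample_means()
print(round(fmean(means), 2))  # close to the true mean, 5.0
print(round(stdev(means), 3))  # close to SIGMA / sqrt(N), about 0.365
```

Each element of `means` is one realized $\bar{x}$; before its sample was drawn, it was just as uncertain as all the others.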
The Width and What Affects It
Apart from the spread of the measurements themselves, two things change the width of the interval: sample size and the chosen confidence level.
Larger samples produce narrower intervals. The term $\sigma/\sqrt{n}$ shrinks as $n$ increases, which makes sense intuitively: more data reduces uncertainty in the estimate. Doubling the sample size divides the interval's half-width by $\sqrt{2}$, a reduction of roughly 30%. Halving the width takes four times as much data, so getting meaningfully more precise requires a lot more data.
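The $1/\sqrt{n}$ scaling is easy to check numerically. A minimal sketch, assuming the known-variance normal interval; $\sigma = 2$ is an arbitrary illustrative value and cancels out of the ratios.

```python
from math import sqrt
from statistics import NormalDist


def half_width(sigma, n, level=0.95):
    """Half-width z_{alpha/2} * sigma / sqrt(n) of the known-variance interval."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sigma / sqrt(n)


base = half_width(2.0, 30)  # sigma = 2.0 is an arbitrary illustrative value
print(round(half_width(2.0, 60) / base, 3))   # 0.707: doubling n cuts the half-width ~29%
print(round(half_width(2.0, 120) / base, 3))  # 0.5: halving the width needs 4x the data
```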
Increasing the confidence level widens the interval. A 99% confidence interval is wider than a 95% interval from the same data. To ensure that a higher proportion of intervals contain the true parameter, the intervals need to cast a wider net. There is no free lunch: higher confidence and narrower intervals cannot both be had without more data.
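For the known-variance normal interval, the widening comes entirely from the critical value $z_{\alpha/2}$. This sketch prints the critical values for three common confidence levels. The 99% multiplier is about 31% larger than the 95% one, so the interval is about 31% wider from the same data.

```python
from statistics import NormalDist


def critical_value(level):
    """Two-sided standard-normal critical value z_{alpha/2} for a confidence level."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {critical_value(level):.3f}")
# 90%: z = 1.645
# 95%: z = 1.960
# 99%: z = 2.576

print(round(critical_value(0.99) / critical_value(0.95), 3))  # about 1.314
```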
The Correct Interpretation
The cat experiment produced an interval of 3 to 7 seconds. The correct way to read this: the procedure used to compute this interval produces intervals that contain the true mean 95% of the time. This particular interval may or may not be one of them.
That is a weaker statement than what most people want to say, but it is the honest one. It tells you something real about the estimation procedure and, by implication, about how reliable this kind of inference tends to be. It just doesn't assign a probability to a fixed interval containing a fixed value.
Comparison with Bayesian Credible Intervals
The Bayesian analogue is the credible interval, and it does say what most people think confidence intervals say.
A 95% credible interval means: given the data and a prior distribution over the parameter, there is a 95% posterior probability that the true parameter falls in this interval. That is a genuine probability statement about the parameter's location, conditional on prior beliefs being encoded correctly.
Credible intervals require specifying a prior. When the prior is diffuse and the data are informative, credible intervals and confidence intervals often look similar numerically. But they are answering different questions, and the distinction matters when the prior is informative or when the interpretation has real consequences.
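A conjugate normal model makes the "often look similar" claim concrete. The sketch below assumes a normal likelihood with known $\sigma$ and a normal prior on the mean, the textbook conjugate case. The summary numbers (sample mean 5, $n = 30$, $\sigma = 2$) and both priors are hypothetical. With a very diffuse prior, the equal-tailed 95% credible interval nearly coincides with the 95% confidence interval. With an informative prior, it is pulled toward the prior mean and narrows.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value


def confidence_interval(xbar, n, sigma):
    """Frequentist known-variance 95% interval: xbar +/- z * sigma / sqrt(n)."""
    half = Z * sigma / sqrt(n)
    return xbar - half, xbar + half


def credible_interval(xbar, n, sigma, prior_mean, prior_sd):
    """Equal-tailed 95% posterior interval for a normal mean (normal prior, known sigma)."""
    post_precision = 1 / prior_sd**2 + n / sigma**2  # precisions add
    post_mean = (prior_mean / prior_sd**2 + n * xbar / sigma**2) / post_precision
    post_sd = sqrt(1 / post_precision)
    return post_mean - Z * post_sd, post_mean + Z * post_sd


# Hypothetical summary statistics and priors, for illustration only.
print(confidence_interval(5.0, 30, 2.0))
print(credible_interval(5.0, 30, 2.0, prior_mean=0.0, prior_sd=1000.0))  # diffuse prior
print(credible_interval(5.0, 30, 2.0, prior_mean=2.0, prior_sd=0.5))     # informative prior
```

The numbers can agree while the statements differ: the first is about the procedure, the others are about the posterior for $\mu$.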
A Common Misuse
Confidence intervals are sometimes used to conduct hypothesis tests: if the interval excludes zero, the effect is declared significant. When the interval and the test are built from the same statistic, this is equivalent to a two-sided hypothesis test at the corresponding level (a 95% interval matches a test at the 5% level), so it gives the same verdict. But it also inherits the test's main weakness: reducing the result to a yes-or-no answer. A wide interval that barely excludes zero is quite different from a narrow one that excludes it comfortably. The interval contains information about precision that a binary significant/not-significant verdict discards.
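The equivalence, and what the binary verdict hides, can be illustrated with the normal-approximation (Wald) interval and its matching two-sided $z$-test of "effect = 0". This is a sketch under that assumption, and the estimates and standard errors are invented. Both examples come out significant at the 5% level, yet one interval barely clears zero and the other clears it comfortably.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def wald_ci_and_p(estimate, se, level=0.95):
    """Wald interval estimate +/- z * se, plus the two-sided z-test p-value for effect = 0."""
    z_crit = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    interval = (estimate - z_crit * se, estimate + z_crit * se)
    p_value = 2 * (1 - STD_NORMAL.cdf(abs(estimate) / se))
    return interval, p_value


# Invented examples: same estimate, very different precision.
for est, se in [(2.0, 1.0), (2.0, 0.25)]:
    (lo, hi), p = wald_ci_and_p(est, se)
    excludes_zero = lo > 0 or hi < 0
    print(f"CI = ({lo:.2f}, {hi:.2f}), p = {p:.3g}, excludes zero: {excludes_zero}")
```

"Excludes zero" and "p < 0.05" always agree here because both are computed from the same ratio of estimate to standard error; only the interval shows how much room there is to spare.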
Reporting the interval and letting readers assess both the location and the width is more informative than the binary conclusion alone. This is increasingly the expectation in applied research, and not without reason.
The interval from the can opener experiment tells you something about how consistent the cat's response time is across trials, how much information 30 measurements carry, and where the plausible range of true average response times falls. It does not tell you that you are 95% sure the truth is in there. It tells you that the machine producing these intervals is right 95% of the time. Whether this particular output is one of those times is not something the interval itself can answer.
