Stats & Cats

Data, statistics, and cats doing maths

Tagged: undergraduate

Articles tagged with undergraduate in Stats and Cats.

Anecdote and Data
What personal experience can and cannot tell you. Anecdote is not worthless: it is a sample of one from an unknown distribution. The weight it deserves depends on what else is known.
Florence Nightingale's Rose Diagrams
How Nightingale used polar area charts to make mortality data legible to people who would not read a table, and why the design choices were deliberate and effective.
How Charts Mislead Without Lying
Truncated axes, dual y-axes, cherry-picked time windows, and area encoding errors. What to look for before trusting a chart.
Law of Large Numbers vs. the Gambler's Fallacy
The law of large numbers is a theorem. The gambler's fallacy is a mistake. They sound related and are easy to confuse. They say opposite things.
The Multiple Comparisons Problem
Run enough tests at a 0.05 threshold and something will look significant by chance. What the family-wise error rate means, why it matters, and what to do about it.
Simpson's Paradox: When Subgroups Disagree With the Aggregate
How a trend that holds within every subgroup can reverse when those groups are combined, and why the Berkeley admissions data remains the clearest illustration.
Statistical Power: Why Small Studies Often Find Nothing
Power is the probability of detecting an effect that actually exists. A study that finds nothing may simply have been too small to find anything. Here's what determines power and why it matters before collecting data.
Survivorship Bias: The Sample You're Not Seeing
When you study only the outcomes that made it through a filter, you are not studying outcomes. You are studying a selection process. The WWII plane problem and why it matters wherever data gets filtered.
The Birthday Problem: Why 23 People Is Enough
In a room of 23 people, the probability that at least two share a birthday exceeds 50%. The math is clean; the intuition resists it. Here's what's actually being counted.
The Hot Hand
Streaks in basketball shooting data and whether they reflect genuine elevated performance or expected clustering in random sequences. A case study in what random actually looks like.
The Monty Hall Problem: Why You Should Always Switch
The conditional probability problem that has produced more confident wrong answers than almost any other. Switching wins with probability 2/3, and the host's knowledge is why.
Variance and Standard Deviation: Why Spread Matters
Two distributions with identical means can behave entirely differently. Variance and standard deviation quantify that difference, and understanding the mechanics behind them reveals what they actually capture.
What a p-value Actually Measures
A p-value is not the probability the null hypothesis is true, not a measure of effect size, and not a verdict on whether a finding is real. Here is what it is.
The Binomial Distribution: Counting Successes in Fixed Trials
How the binomial distribution models the number of successes in a fixed number of independent trials, and why the formula looks the way it does.
The Poisson Distribution: Modeling Rare Events at a Known Rate
The Poisson distribution models counts of independent events occurring at a constant average rate over a fixed interval. One parameter does everything, and that turns out to be enough.
The Central Limit Theorem: Why Averages Behave
Individual observations can follow nearly any distribution. Average enough of them together, and the result converges toward normal. Here's why that happens and why it matters.
Regression to the Mean: Why Exceptional Performance Doesn't Last
Extreme outcomes tend to be followed by more ordinary ones. This is not a psychological phenomenon. It is a mathematical one, with real implications for how we evaluate causes and interventions.
Type I and Type II Errors: The Trade-Off You Can't Avoid
False positives and false negatives cannot both be minimized at once. For a given study, a threshold that reduces one will increase the other. Where it gets set is a choice, and it matters.
What Confidence Intervals Actually Tell You
A 95% confidence interval does not mean a 95% probability that the true value is inside it. Here is what the statement actually means, and why the distinction is worth getting right.
Discrete vs Continuous Distributions: PMF, PDF, and CDF
The difference between discrete and continuous probability distributions, explained through the PMF, PDF, and CDF, with cat examples that are doing actual work.
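
A quick check of the figure quoted in the birthday problem entry above. This is a minimal sketch, not taken from the article itself: the function name `shared_birthday_probability` is made up for illustration, and it assumes 365 equally likely birthdays with no leap years. It counts the complement (everyone has a different birthday) and subtracts from one.

```python
def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday.

    Assumes every day is equally likely and birthdays are independent
    (an illustrative simplification).
    """
    if people > days:
        # Pigeonhole: more people than days guarantees a match.
        return 1.0
    p_all_distinct = 1.0
    for i in range(people):
        # Person i must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for n in (22, 23):
        print(n, round(shared_birthday_probability(n), 4))
```

Under these assumptions, 22 people fall just short of 50% and 23 people just clear it, which is where the number in the title comes from.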