Law of Large Numbers

The mathematics of uncertainty

Flip a fair coin ten times and you might get 7 heads. Flip it ten thousand times and the fraction of heads will hug 0.5 astonishingly closely. That's the law of large numbers: as you average more independent draws of the same random quantity, the sample mean converges to the true expectation.

The randomness doesn't vanish: individual outcomes stay unpredictable, but the average of many of them settles down. The weak law says this convergence is "in probability": for any tolerance, the chance that the average of n samples is off by more than that tolerance shrinks toward 0 as n grows.
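For a fair coin the weak law can be checked exactly rather than by simulation. The sketch below is illustrative only; `tail_probability` and `chebyshev_bound` are hypothetical helper names, not a library API. It counts every flip sequence whose fraction of heads misses 0.5 by more than a tolerance eps, and compares that exact probability with Chebyshev's inequality, P(|mean − μ| ≥ eps) ≤ σ²/(n·eps²), where σ² = 1/4 for a fair coin.

```python
from fractions import Fraction

def tail_probability(n: int, eps) -> float:
    """Exact P(|k/n - 1/2| > eps) for k heads in n fair-coin flips.

    eps may be a str such as "0.05" (parsed exactly as 1/20), a Fraction, or a float.
    """
    eps = Fraction(eps)            # exact rational comparison, no float rounding at the boundary
    half = Fraction(1, 2)
    missing = 0                    # sequences whose fraction of heads misses 1/2 by more than eps
    ways = 1                       # C(n, k): sequences with exactly k heads, updated incrementally
    for k in range(n + 1):
        if abs(Fraction(k, n) - half) > eps:
            missing += ways
        ways = ways * (n - k) // (k + 1)   # C(n, k+1) = C(n, k) * (n - k) / (k + 1)
    return missing / 2**n          # all 2**n sequences are equally likely

def chebyshev_bound(n: int, eps) -> float:
    """Chebyshev's bound sigma^2 / (n * eps^2) with sigma^2 = 1/4, capped at 1."""
    eps = Fraction(eps)
    return float(min(Fraction(1), Fraction(1, 4) / (n * eps**2)))

for n in (10, 100, 1_000, 10_000):
    exact = tail_probability(n, "0.05")
    bound = chebyshev_bound(n, "0.05")
    print(f"n={n:>6}  exact={exact:.3e}  chebyshev<={bound:.3e}")
```

Both columns fall toward 0 as n grows. The Chebyshev bound is loose (it equals 1, and so says nothing, for small n), but it holds for any distribution with finite variance, which is why it is the usual route to proving the weak law.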

Press Run in the figure to flip coins one at a time and watch the running average wander wildly at first, then home in on the dashed true mean. More samples, tighter convergence.
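The same experiment can be reproduced offline in a few lines. This is a minimal sketch that assumes Python's standard `random` module as the coin and a fixed seed so reruns match; `running_average` is a hypothetical helper name.

```python
import random

def running_average(num_flips: int, seed: int = 0) -> list[float]:
    """Fraction of heads after each of num_flips simulated fair-coin flips."""
    rng = random.Random(seed)      # private generator so the run is reproducible
    heads = 0
    averages = []
    for i in range(1, num_flips + 1):
        if rng.random() < 0.5:     # heads with probability 1/2
            heads += 1
        averages.append(heads / i)
    return averages

avgs = running_average(10_000, seed=42)
for n in (10, 100, 1_000, 10_000):
    print(f"after {n:>6} flips: fraction of heads = {avgs[n - 1]:.4f}")
```

The exact numbers depend on the seed, but the early values typically swing widely while the later ones cluster near 0.5, the same pattern the figure shows.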

Where this lives in ML

The law of large numbers is what makes mini-batch training sound. The true gradient is an expectation over the whole data distribution; a mini-batch gradient is a sample average of it. By the LLN, that average approximates the true gradient and gets more accurate with larger batches. Every Monte Carlo estimate in ML (expected reward, an ELBO term, an empirical risk) leans on this law to justify…
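A toy illustration of the mini-batch claim, under loudly stated assumptions: a hypothetical one-parameter linear model y ≈ w·x with squared loss, synthetic data y = 3x + noise, and the finite dataset itself standing in for the data distribution. The full-dataset gradient plays the role of the true gradient, and each mini-batch gradient is an average of per-example gradients drawn with replacement. All function names here are made up for the sketch.

```python
import random

def make_dataset(n: int, seed: int = 0) -> list[tuple[float, float]]:
    """Hypothetical toy data: x uniform on [-1, 1], y = 3x + Gaussian noise (sd 0.5)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 3.0 * x + rng.gauss(0.0, 0.5)))
    return data

def gradient(w: float, batch) -> float:
    """Average over the batch of d/dw (w*x - y)**2 = 2*(w*x - y)*x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def mean_batch_error(data, w: float, batch_size: int, trials: int, rng: random.Random) -> float:
    """Average |mini-batch gradient - full-data gradient| over repeated random batches."""
    full = gradient(w, data)                      # stands in for the "true" gradient
    total = 0.0
    for _ in range(trials):
        batch = rng.choices(data, k=batch_size)   # i.i.d. draws with replacement
        total += abs(gradient(w, batch) - full)
    return total / trials

data = make_dataset(10_000)
rng = random.Random(1)
w = 0.0                                           # arbitrary point at which to compare gradients
errors = {b: mean_batch_error(data, w, b, trials=200, rng=rng) for b in (1, 10, 100, 1_000)}
for b, err in errors.items():
    print(f"batch size {b:>5}: mean |error| = {err:.4f}")
```

The error should shrink roughly like 1/√B as the batch size B grows, because the variance of an average of B independent per-example gradients is the per-example variance divided by B.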

