Inference, estimation, and decision-making from data
The t-test is the workhorse for comparing means. It answers questions like "is this mean different from a target?" or "are these two groups' means different?", using a test statistic that measures the gap between means in units of standard error.
For the one-sample case (is the mean μ equal to a target μ₀?), the statistic is:
The numerator is "how far is the sample mean from the target?"; the denominator is the standard error. A big |t| means the gap is large relative to the noise, which is evidence against H₀.
Where this lives in MLThe paired t-test is the right tool for "is model A significantly better than model B?" when both are evaluated on the same examples. Pairing on each test instance cancels the example-to-example difficulty variation, isolating the model difference. Beware: standard CV folds overlap, which violates independence, and corrected paired tests exist for exactly this (you'll meet them in lesson 22).