Inference, estimation, and decision-making from data
Why does a model that fits the training data perfectly often fail on new data? The bias–variance decomposition gives a precise, quantitative answer. For squared-error loss, it splits a model's expected prediction error into three pieces, two of which pull in opposite directions.
Bias² is error from wrong assumptions: a model too simple to capture the truth (underfitting). Variance is error from sensitivity to the particular training sample: a model so flexible it memorizes noise (overfitting). Noise is irreducible: randomness in the data no model can ever remove.
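For squared-error loss, the three pieces can be written out explicitly. This uses notation introduced here for illustration. Data are generated as $y = f(x) + \varepsilon$, where the noise $\varepsilon$ has mean zero and variance $\sigma^2$. The model $\hat f$ is fit to a random training set $\mathcal{D}$, and expectations are taken over both $\mathcal{D}$ and the noise at a fixed input $x$:

$$
\mathbb{E}_{\mathcal{D},\varepsilon}\!\left[\big(y - \hat f(x)\big)^2\right]
= \underbrace{\big(\mathbb{E}_{\mathcal{D}}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_{\mathcal{D}}\!\left[\big(\hat f(x) - \mathbb{E}_{\mathcal{D}}[\hat f(x)]\big)^2\right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}
$$

The cross terms vanish for two reasons. The new noise $\varepsilon$ has mean zero and is independent of the training set, and $\hat f(x) - \mathbb{E}_{\mathcal{D}}[\hat f(x)]$ averages to zero over $\mathcal{D}$.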
Use the complexity slider in the figure. As the model grows more complex, bias² (green) falls while variance (coral) rises. The total test error (black) is their sum plus the noise floor, which traces a U-shaped curve whose minimum marks the optimal complexity.
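The U-shape can also be reproduced numerically. What follows is a minimal Monte Carlo sketch in standard-library Python. The figure's actual model and data are not specified, so every setting here is an assumption for illustration:

- The true function is sin(2πx) on [0, 1).
- The noise is Gaussian with standard deviation 0.3.
- The model is a regressogram: it averages the training targets within each of `n_bins` equal-width bins, so more bins means a more flexible model.

The helper names `true_f`, `fit_bins`, and `bias_variance` are hypothetical. The sketch refits the model on many fresh training samples and reads bias² and variance directly off the spread of its predictions.

```python
import math
import random
import statistics


def true_f(x):
    # Assumed ground truth for this sketch: one period of a sine wave.
    return math.sin(2 * math.pi * x)


def fit_bins(xs, ys, n_bins):
    """Fit a regressogram (hypothetical stand-in model): split [0, 1) into
    n_bins equal-width bins and predict the mean training target of the bin
    a point falls in. More bins means a more flexible model."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    # An empty bin predicts 0.0; with n_bins <= n_train below, none is empty.
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * n_bins), n_bins - 1)]


def bias_variance(n_bins, n_train=30, noise_sd=0.3, n_reps=500, seed=0):
    """Monte Carlo estimate of (bias^2, variance, noise), averaged over a grid
    of test inputs. Each repetition draws a fresh noisy training sample."""
    rng = random.Random(seed)
    xs = [(i + 0.5) / n_train for i in range(n_train)]  # fixed training inputs
    x_test = [(j + 0.5) / 100 for j in range(100)]       # test inputs in [0, 1)
    preds = [[] for _ in x_test]  # preds[j]: f_hat(x_test[j]) for each repetition
    for _ in range(n_reps):
        ys = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
        model = fit_bins(xs, ys, n_bins)
        for j, x0 in enumerate(x_test):
            preds[j].append(model(x0))
    bias2_terms, var_terms = [], []
    for p, x0 in zip(preds, x_test):
        mean_pred = statistics.fmean(p)  # estimate of E_D[f_hat(x0)]
        bias2_terms.append((mean_pred - true_f(x0)) ** 2)
        var_terms.append(statistics.fmean([(v - mean_pred) ** 2 for v in p]))
    return statistics.fmean(bias2_terms), statistics.fmean(var_terms), noise_sd ** 2


if __name__ == "__main__":
    for n_bins in (1, 2, 5, 10, 15, 30):
        b2, var, noise = bias_variance(n_bins)
        total = b2 + var + noise  # expected test error = bias^2 + variance + noise
        print(f"bins={n_bins:2d}  bias^2={b2:.3f}  variance={var:.3f}  total={total:.3f}")
```

With one bin the model is a constant, so bias² dominates (about 0.5 here, the average of sin² over a full period). With 30 bins each bin holds a single training point, so the model reproduces that point's noise and the variance approaches σ² = 0.09. Intermediate bin counts land near the bottom of the U.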