Bias-Variance Decomposition

Inference, estimation, and decision-making from data

Why does a model that fits the training data perfectly often fail on new data? For squared-error loss, the bias–variance decomposition gives an exact, quantitative answer. It splits a model's expected prediction error into three pieces, and two of them pull in opposite directions.

Bias² is error from wrong assumptions: a model too simple to capture the truth (underfitting). Variance is error from sensitivity to the particular training sample: a model so flexible it memorizes noise (overfitting). Noise is irreducible: randomness in the data no model can ever remove.
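Written out, the three pieces add up as follows. This is the standard form under assumptions the text above does not state explicitly: squared-error loss, data generated as y = f(x) + ε with zero-mean noise of variance σ², and test-point noise independent of the training sample D. The symbols f, f̂_D, ε, and σ² are notation chosen here for illustration.

```latex
\mathbb{E}_{D,\varepsilon}\!\left[\big(y - \hat f_D(x)\big)^2\right]
= \underbrace{\big(\mathbb{E}_D[\hat f_D(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}_D\!\left[\big(\hat f_D(x) - \mathbb{E}_D[\hat f_D(x)]\big)^2\right]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{noise}}
```

Only the first two terms depend on the model; σ² stays the same whatever is fitted, which is why it acts as a floor.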

Slide the complexity control in the figure. As the model grows more complex, bias² (green) falls while variance (coral) rises. The total test error (black) is their sum plus the noise floor, which traces a U-shape whose bottom marks the optimal complexity.
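The same trade-off can be estimated numerically. The sketch below is a small Monte Carlo experiment, not the code behind the figure: it assumes a hypothetical ground truth sin(2πx), Gaussian noise, polynomial degree as the complexity knob, and fixed training inputs so that only the noise changes between trials. The names `true_fn` and `decompose` are made up for this example.

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_fn(x):
    # Hypothetical ground truth, chosen only for illustration.
    return np.sin(2 * np.pi * x)

def decompose(degree, n_train=30, n_trials=300, noise_sd=0.3, seed=0):
    """Monte Carlo estimate of (bias^2, variance, noise) for a polynomial fit."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0, 1, n_train)  # fixed inputs; only the noise is redrawn
    x_test = np.linspace(0, 1, 50)
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        # Draw a fresh noisy training set and refit the model.
        y_train = true_fn(x_train) + rng.normal(0, noise_sd, n_train)
        preds[t] = Polynomial.fit(x_train, y_train, degree)(x_test)
    mean_pred = preds.mean(axis=0)
    # Bias^2: squared gap between the average prediction and the truth.
    bias_sq = float(np.mean((mean_pred - true_fn(x_test)) ** 2))
    # Variance: spread of predictions across training sets.
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance, noise_sd ** 2

results = {d: decompose(d) for d in (1, 3, 9)}
for d, (b, v, n) in results.items():
    print(f"degree={d}  bias^2={b:.4f}  variance={v:.4f}  noise={n:.4f}  total={b + v + n:.4f}")
```

Under these assumptions, the straight line (degree 1) should show the largest bias² and the degree-9 fit the largest variance, while the noise term is identical in every row.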

Where this lives in ML

This decomposition is the theory of underfitting vs overfitting, and it's how you read a learning curve. High training and test error = high bias = underfitting (use a bigger model). Low training but high test error = high variance = overfitting (regularize, get more data, or simplify). Model-complexity selection is literally finding the bottom of this U.
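As a rough illustration of finding the bottom of the U, the sketch below fits polynomials of increasing degree to one noisy sample and compares training error with error on a large held-out sample, used here as a stand-in for test error. The data-generating function, noise level, sample sizes, and degree range are all assumptions made for the example.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def sample(n):
    # Hypothetical data source: sin(2*pi*x) plus Gaussian noise.
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)
    return x, y

x_train, y_train = sample(40)
x_test, y_test = sample(2000)  # large held-out set approximating test error

def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))

degrees = list(range(0, 13))
train_err, test_err = [], []
for d in degrees:
    model = Polynomial.fit(x_train, y_train, d)
    train_err.append(mse(model, x_train, y_train))
    test_err.append(mse(model, x_test, y_test))

# Selected complexity: the degree with the lowest held-out error.
best = degrees[int(np.argmin(test_err))]

for d, tr, te in zip(degrees, train_err, test_err):
    flag = "  <- lowest held-out error" if d == best else ""
    print(f"degree={d:2d}  train={tr:.3f}  test={te:.3f}{flag}")
```

Reading the printed table from top to bottom mirrors the diagnosis above: rows where both errors are high suggest underfitting, and rows where training error keeps dropping while held-out error climbs suggest overfitting.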