Model Diagnostics

Inference, estimation, and decision-making from data

Fitting a regression is the easy part. The harder question is whether you can trust it. Model diagnostics are the checks that catch a model that fits the numbers but violates the assumptions underneath. The most useful object to look at is the residual: e = y − ŷ, the leftover the model couldn't explain.

If the model is right, the residuals should look like pure noise: no pattern, constant spread, roughly symmetric around zero. The main tool is a residual plot: residuals on the y-axis against the fitted values (or an input variable) on the x-axis. You're hunting for structure that shouldn't be there.
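As a rough illustration of what that hunt looks like in numbers, the sketch below fits a straight line to two synthetic data sets with NumPy: one where the true relationship really is linear, and one where it is curved. The data, the helper names fit_line and binned_residual_means, and the choice of three bins are all assumptions made for this example. Averaging residuals within bins of the fitted values is only a crude numeric stand-in for looking at a residual plot.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

def binned_residual_means(fitted, resid, n_bins=3):
    """Mean residual within equal-count bins, ordered by fitted value."""
    order = np.argsort(fitted)
    chunks = np.array_split(resid[order], n_bins)
    return np.array([chunk.mean() for chunk in chunks])

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 300)
y_linear = 1.0 + 2.0 * x + rng.normal(0, 1, x.size)
y_curved = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(0, 1, x.size)

# Fit the same straight-line model to both data sets.
a_lin, b_lin = fit_line(x, y_linear)
fitted_linear = a_lin + b_lin * x
resid_linear = y_linear - fitted_linear      # e = y - y_hat

a_cur, b_cur = fit_line(x, y_curved)
fitted_curved = a_cur + b_cur * x
resid_curved = y_curved - fitted_curved      # e = y - y_hat

means_linear = binned_residual_means(fitted_linear, resid_linear)
means_curved = binned_residual_means(fitted_curved, resid_curved)

print("linear data, mean residual per bin:", np.round(means_linear, 2))
print("curved data, mean residual per bin:", np.round(means_curved, 2))
```

For the curved data the binned means come out positive, negative, positive, which is the U-shape a residual plot would show. For the linear data they all sit close to zero.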

A good doctor does not stop at naming the illness; they check what symptoms are left over after treatment. If a patient still has a stubborn cough, the diagnosis missed something. Residuals are a model's leftover symptoms: the part of the data the fitted line could not explain. If they show a clear pattern instead of harmless random noise, the model has missed something too.

Where this lives in ML

Residual analysis is the statistical ancestor of learning-curve and error analysis in ML. "Training loss ≠ validation loss" is a diagnostic: a big gap signals overfitting (high variance), just as patterned residuals signal a misspecified model. Slicing your errors by subgroup to find where the model systematically fails is exactly residual-plot thinking, scaled up.
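Both diagnostics can be sketched in a few lines of NumPy. The example below is a minimal illustration under stated assumptions: the data are synthetic, the helper names mse, train_val_gap, and mean_residual_by_group are hypothetical, and polynomial degree stands in for model flexibility. The first part compares training and validation error for a simple and a flexible fit; the second fits one pooled line to two subgroups that differ by a constant offset, then averages the residuals per subgroup.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))

def train_val_gap(x_tr, y_tr, x_va, y_va, degree):
    """Fit a polynomial of the given degree; return (train MSE, validation MSE)."""
    coefs = np.polyfit(x_tr, y_tr, degree)
    return mse(y_tr, np.polyval(coefs, x_tr)), mse(y_va, np.polyval(coefs, x_va))

def mean_residual_by_group(resid, groups):
    """Average residual per subgroup label."""
    return {str(g): float(resid[groups == g].mean()) for g in np.unique(groups)}

rng = np.random.default_rng(0)

# Part 1: train/validation gap. The true relationship is linear (assumed).
x_tr = np.linspace(-1, 1, 12)
y_tr = 2.0 * x_tr + rng.normal(0, 0.5, x_tr.size)
x_va = np.linspace(-1, 1, 200)
y_va = 2.0 * x_va + rng.normal(0, 0.5, x_va.size)

train1, val1 = train_val_gap(x_tr, y_tr, x_va, y_va, degree=1)
train9, val9 = train_val_gap(x_tr, y_tr, x_va, y_va, degree=9)
print(f"degree 1: train={train1:.3f}  val={val1:.3f}")
print(f"degree 9: train={train9:.3f}  val={val9:.3f}")

# Part 2: slicing errors by subgroup. Group "b" sits 3 units above group "a",
# but the pooled straight-line model has no way to know that.
n = 200
x = np.linspace(0, 1, n)
groups = np.where(np.arange(n) % 2 == 0, "a", "b")
offset = np.where(groups == "b", 3.0, 0.0)
y = 1.0 + 2.0 * x + offset + rng.normal(0, 0.3, n)

slope, intercept = np.polyfit(x, y, 1)   # coefficients, highest power first
resid = y - (intercept + slope * x)
by_group = mean_residual_by_group(resid, groups)
print("mean residual by group:", by_group)
```

The degree-9 fit always has training error no higher than the straight line, because it contains the line as a special case; here its validation error should be clearly worse than its training error, which is the overfitting gap. In the second part the overall residual mean is zero, yet group "a" is systematically over-predicted and group "b" under-predicted, a failure the aggregate error hides.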
