Multiple Linear Regression

Inference, estimation, and decision-making from data

Real predictions use many inputs, not one. Multiple linear regression generalizes the line to a plane (or, with more than two features, a hyperplane): each feature gets its own coefficient. Stacking all the data into a matrix X, the model is gorgeously compact:

y = Xβ + ε

Here X is the n×d design matrix (one row per observation, one column per feature), β is the vector of d coefficients, y is the vector of n outputs, and ε is the vector of errors the linear fit leaves unexplained. The OLS solution has a famous closed form, valid whenever XᵀX is invertible (that is, when the feature columns are linearly independent):

β̂ = (XᵀX)⁻¹Xᵀy
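To make the closed form β̂ = (XᵀX)⁻¹Xᵀy concrete, here is a minimal NumPy sketch on synthetic data. The sizes, the "true" coefficients, and the noise level are all made up for illustration, and the model has no intercept (adding one would mean appending a column of ones to X). The sketch solves the linear system (XᵀX)β = Xᵀy instead of forming the inverse explicitly, which is generally the more numerically careful route to the same answer.

```python
import numpy as np

# Synthetic data: every number here is an illustrative assumption.
rng = np.random.default_rng(0)
n, d = 200, 3                            # n observations, d features
X = rng.normal(size=(n, d))              # design matrix, one row per observation
beta_true = np.array([2.0, -1.0, 0.5])   # coefficients used to generate y
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Normal equations: solve (X^T X) beta = X^T y for beta.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's general least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print("normal equations:", beta_hat)
print("np.linalg.lstsq: ", beta_lstsq)
```

Both routes should agree to within floating-point error on well-conditioned data like this, and both should land close to the coefficients that generated y.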

The geometry is worth picturing. The vector of predictions Xβ̂ must live in the column space of X, the set of all linear combinations of your feature columns. OLS picks the β̂ whose prediction is the point in that space closest to y. Geometrically, ŷ = Xβ̂ is the orthogonal projection of y onto the column space, and the residual y − ŷ is perpendicular to it. Written out, that perpendicularity is Xᵀ(y − Xβ̂) = 0, the normal equations, and solving them for β̂ gives exactly (XᵀX)⁻¹Xᵀy.
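The projection picture can be checked numerically. The sketch below uses random data chosen only for illustration and verifies three things: the residual is orthogonal to every column of X, the matrix P = X(XᵀX)⁻¹Xᵀ behaves like a projection (applying it twice changes nothing), and nudging β̂ in any direction moves the prediction farther from y. The variable names are assumptions of this example, not notation from the lesson.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))     # two feature columns spanning a plane in R^50
y = rng.normal(size=50)          # an arbitrary target vector

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ beta_hat             # the fitted values, a point in the column space
residual = y - y_hat

# 1. The residual is perpendicular to each feature column.
ortho = X.T @ residual
print("X^T (y - y_hat):", ortho)

# 2. P = X (X^T X)^{-1} X^T projects onto the column space: P y = y_hat, P P = P.
P = X @ np.linalg.solve(X.T @ X, X.T)
print("P @ P == P:", np.allclose(P @ P, P))

# 3. Any other coefficient vector gives a prediction farther from y.
beta_other = beta_hat + np.array([0.1, -0.2])
dist_hat = np.linalg.norm(y - X @ beta_hat)
dist_other = np.linalg.norm(y - X @ beta_other)
print("distance at beta_hat:", dist_hat, "at perturbed beta:", dist_other)
```

Because the residual and ŷ are perpendicular, the squared lengths also satisfy ‖y‖² = ‖ŷ‖² + ‖y − ŷ‖², which is the Pythagorean theorem in n dimensions.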

Where this lives in ML

You are looking at the least-squares problem from linear algebra, the same projection-onto-column-space idea. The normal-equations formula is the closed-form ancestor of what gradient descent approximates for bigger models. When XᵀX is ill-conditioned (near-collinear features), its inverse has enormous entries, so small amounts of noise in y produce wildly unstable coefficients. That is precisely the problem ridge regression fixes by adding λI to XᵀX, the topic two lessons ahead.
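The ill-conditioning problem is easy to provoke on purpose. In this sketch the second feature is a near-copy of the first, so XᵀX is almost singular. Adding λI raises every eigenvalue of XᵀX by λ, which pulls the condition number down. The value λ = 1.0, the 1e-5 perturbation, and the data-generating model are arbitrary choices for illustration only; choosing λ properly is left to the ridge lesson.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 1e-5 * rng.normal(size=n)      # near-copy of x1: almost collinear
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=n)   # generating coefficients are (1, 1)

gram = X.T @ X
cond_ols = np.linalg.cond(gram)          # very large for near-collinear columns

lam = 1.0                                # illustrative regularization strength
gram_ridge = gram + lam * np.eye(X.shape[1])
cond_ridge = np.linalg.cond(gram_ridge)

beta_ols = np.linalg.solve(gram, X.T @ y)
beta_ridge = np.linalg.solve(gram_ridge, X.T @ y)

print("condition number, X^T X:           ", cond_ols)
print("condition number, X^T X + lambda I:", cond_ridge)
print("OLS coefficients:  ", beta_ols)    # typically far from (1, 1)
print("ridge coefficients:", beta_ridge)  # close to (1, 1), slightly shrunk
```

The OLS coefficients here tend to be large and of opposite sign, because the data barely constrain the difference between the two nearly identical features. Ridge shrinks that poorly determined direction toward zero while leaving the well-determined sum almost untouched.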
