Least Squares

Geometry and algebra of linear maps, vectors, and matrices

When Ax = b has no exact solution (the usual case with more data than parameters), you do the next best thing: find the x that makes Ax as close to b as possible. "Close" means smallest squared error. This is least squares, the method underneath ordinary regression.

The geometry is the whole story. The reachable outputs Ax form the column space of A, a plane sitting inside a higher-dimensional space. The target b usually floats off that plane. The closest reachable point is the orthogonal projection of b onto the plane: drop a perpendicular from b straight down, and where it lands is Ax.

In the figure, move b off the line and watch the projection (the best fit) slide along to stay directly beneath it, with the error always perpendicular.

Where this lives in MLLinear regression is least squares. The closed-form solution β = (XᵀX)⁻¹Xᵀy is the normal equations solved for the coefficients. The same projection idea defines the pseudoinverse A⁺, the all-purpose tool for "solve Ax = b as well as possible." Every squared-error loss in ML traces back to this picture of projecting onto what the model can reach.

▶ Least Squares

← PCA via SVD Matrix Norms →