Geometry and algebra of linear maps, vectors, and matrices
When Ax = b has no exact solution (the usual case with more data than parameters), you do the next best thing: find the x that makes Ax as close to b as possible. "Close" means smallest squared error. This is least squares, the method underneath ordinary regression.
The geometry is the whole story. The reachable outputs Ax form the column space of A, a plane sitting inside a higher-dimensional space. The target b usually floats off that plane. The closest reachable point is the orthogonal projection of b onto the plane: drop a perpendicular from b straight down, and where it lands is Ax.
In the figure, move b off the line and watch the projection (the best fit) slide along to stay directly beneath it, with the error always perpendicular.