Matrix Multiplication

Geometry and algebra of linear maps, vectors, and matrices

Matrix multiplication looks like a fiddly rule, but its meaning is clean: AB is the composition of two transformations. Do B first, then A. The product is the single matrix that achieves both motions in one go.

To compute an entry of AB, take a row of A and dot it with a column of B. Entry (i, j) is row i of A dotted with column j of B. That's the whole algorithm: dot products, arranged in a grid.
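The row-times-column rule can be sketched in a few lines of plain Python. This is a minimal illustration, not an efficient implementation: matrices are assumed to be lists of rows, and the helper name `matmul` is chosen here for the example. The shape check reflects the fact that a row of A can only be dotted with a column of B when the two have the same length.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows (illustrative helper)."""
    n_inner = len(B)  # number of rows of B
    # Each row of A must be as long as each column of B.
    if any(len(row) != n_inner for row in A):
        raise ValueError("columns of A must equal rows of B")
    n_cols = len(B[0])
    # Entry (i, j) is row i of A dotted with column j of B.
    return [
        [sum(A[i][k] * B[k][j] for k in range(n_inner)) for j in range(n_cols)]
        for i in range(len(A))
    ]


A = [[1, 2],
     [3, 4]]
B = [[5, 6],
     [7, 8]]

C = matmul(A, B)
print(C)  # [[19, 22], [43, 50]]
```

For instance, the top-left entry is row 1 of A dotted with column 1 of B: 1·5 + 2·7 = 19.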

Picture two machines on a factory line. The first machine B reshapes a part, then the second machine A reshapes it again. The product AB is the single combined machine that does both steps in one pass. The order on the line is fixed, since the part must go through B before A.
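A small numeric check makes the ordering concrete. In this sketch the two machines are assumed, purely for illustration, to be a horizontal stretch (B) and a quarter-turn rotation (A); the helpers `matmul` and `apply` are names invented for the example.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows (illustrative helper)."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def apply(M, v):
    """Apply matrix M to vector v (a flat list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


A = [[0, -1],
     [1, 0]]   # rotate 90 degrees counterclockwise
B = [[2, 0],
     [0, 1]]   # stretch the x-direction by 2

v = [1, 1]

step_by_step = apply(A, apply(B, v))   # B first, then A
combined = apply(matmul(A, B), v)      # the single machine AB
swapped = apply(matmul(B, A), v)       # the other order, BA

print(step_by_step)  # [-1, 2]
print(combined)      # [-1, 2]
print(swapped)       # [-2, 1]
```

Running B then A agrees with applying AB once, while BA sends the same vector somewhere else, so swapping the machines on the line generally changes the result.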

Where this lives in ML

Composing layers is matrix multiplication. A two-layer linear stack W₂(W₁x) equals (W₂W₁)x; the layers fuse into one map. In attention, the scores come from a product QKᵀ and the output from multiplying the resulting weights by V. Every forward pass is a chain of these products, each obeying the shape rule that the inner dimensions must match, and this is the workload GPUs are built to crunch.
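Both claims can be checked on toy numbers. The sketch below is illustrative only: the weights and inputs are made up, the helper names are invented for the example, and the attention part assumes the scores are turned into weights by a row-wise softmax, which the text above does not spell out. Any scaling of the scores is left out.

```python
import math


def matmul(A, B):
    """Multiply two matrices stored as lists of rows (illustrative helper)."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def transpose(M):
    """Swap rows and columns."""
    return [list(col) for col in zip(*M)]


def softmax(row):
    """Turn a row of scores into positive weights that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


# Part 1: two linear layers fuse into one matrix.
W1 = [[1, 0],
      [2, 1],
      [0, 3]]        # 3x2: maps 2 inputs to 3 hidden values
W2 = [[1, -1, 2],
      [0, 1, 1]]     # 2x3: maps 3 hidden values to 2 outputs
x = [[1],
     [2]]            # input as a 2x1 column

two_steps = matmul(W2, matmul(W1, x))   # W2(W1 x)
fused = matmul(matmul(W2, W1), x)       # (W2 W1) x
print(two_steps, fused)  # [[9], [10]] [[9], [10]]

# Part 2: attention as two matrix products (softmax step is an assumption).
Q = [[1.0, 0.0],
     [0.0, 1.0]]             # 2 queries, dimension 2
K = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]             # 3 keys, dimension 2
V = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0]]             # 3 values, dimension 2

scores = matmul(Q, transpose(K))            # 2x3: one score per query-key pair
weights = [softmax(row) for row in scores]  # each row sums to 1
output = matmul(weights, V)                 # 2x2: weighted mix of value rows
```

The shapes line up at every step: a 2×2 times a 2×3 gives the 2×3 scores, and a 2×3 times a 3×2 gives the 2×2 output.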