Norms

Geometry and algebra of linear maps, vectors, and matrices

A norm answers "how big is this vector?" It measures length. The catch is that there is more than one sensible way to measure length, and the choice quietly shapes how machine-learning models behave.

The default is the L2 (Euclidean) norm: the straight-line distance from the origin to the tip, by Pythagoras. The L1 norm instead sums the absolute coordinates, the "taxicab" distance, as if you could only travel along grid streets. The L∞ norm takes just the single largest coordinate.

Imagine walking across town from one corner to another. The straight-line, as-the-crow-flies distance is the L2 norm — what a drone would fly. But if streets force you to travel only along the grid, the city-block distance you actually walk is the L1 norm. Same trip, two honest measures of "how far," and the grid route is never shorter than the crow's.

Where this lives in MLNorms are regularization. L2 weight decay penalizes ‖w‖₂² and pulls every weight gently toward zero, keeping the model smooth. L1 regularization penalizes ‖w‖₁ and drives many weights to exactly zero, giving a sparse, feature-selecting model (the diamond corners above are why). The gradient norm ‖∇L‖₂ is monitored during training, and "gradient clipping" rescales it when it grows too large.

▶ Norms

← Dot Product Linear Combinations & Span →