Geometry and algebra of linear maps, vectors, and matrices
A norm answers "how big is this vector?" It measures length. The catch is that there is more than one sensible way to measure length, and the choice quietly shapes how machine-learning models behave.
The default is the L2 (Euclidean) norm: the straight-line distance from the origin to the tip, by Pythagoras. The L1 norm instead sums the absolute coordinates, the "taxicab" distance, as if you could only travel along grid streets. The L∞ norm takes just the single largest coordinate.
Imagine walking across town from one corner to another. The straight-line, as-the-crow-flies distance is the L2 norm — what a drone would fly. But if streets force you to travel only along the grid, the city-block distance you actually walk is the L1 norm. Same trip, two honest measures of "how far," and the grid route is never shorter than the crow's.