Multivariate calculus from first principles
Often you don't want the lowest point everywhere; you want the lowest point subject to a constraint. Minimize loss while keeping the weight norm bounded; maximize margin while points stay correctly classified. Lagrange multipliers are the standard tool for optimizing a function f along a constraint curve g(x) = 0.
The geometry to hold onto: at the constrained optimum, the level curves of f are tangent to the constraint curve g(x) = 0. If they crossed instead of touching, you could slide along the constraint to a better value. Each gradient is perpendicular to its own curve, so tangency means ∇f and ∇g point along the same line; wherever ∇g ≠ 0, they're parallel:
    ∇f(x) = λ ∇g(x),    g(x) = 0
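The scalar λ (the Lagrange multiplier) is the proportionality factor; it may be negative, in which case the gradients point in opposite directions along that line. Packaging both conditions into one object gives the Lagrangian L(x, λ) = f(x) − λ g(x). Setting ∇L = 0, with the gradient taken over both x and λ, recovers them: the x-derivatives give ∇f = λ∇g, and the λ-derivative gives g(x) = 0.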
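A small numerical check can make the tangency condition concrete. The sketch below is only an illustration under assumed choices that do not come from the text: the objective f(x, y) = x² + y², the constraint g(x, y) = x + y − 1 = 0, and a plain ternary search along the constraint line to locate the minimum. It then tests whether the two gradients are parallel at that point and reads off λ.

```python
# Illustrative example (assumed, not from the text):
#   minimize f(x, y) = x^2 + y^2  subject to  g(x, y) = x + y - 1 = 0

def f(x, y):
    return x**2 + y**2

def grad_f(x, y):
    return (2.0 * x, 2.0 * y)

def g(x, y):
    return x + y - 1.0

def grad_g(x, y):
    return (1.0, 1.0)

def minimize_on_constraint(lo=-5.0, hi=5.0, iters=100):
    """Walk the constraint line (t, 1 - t) and find the t with the lowest f.

    f restricted to this line is convex in t, so ternary search applies.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1, 1.0 - m1) < f(m2, 1.0 - m2):
            hi = m2
        else:
            lo = m1
    t = 0.5 * (lo + hi)
    return t, 1.0 - t

x, y = minimize_on_constraint()
fx, fy = grad_f(x, y)
gx, gy = grad_g(x, y)

# In 2D, two vectors are parallel exactly when this cross term is zero.
cross = fx * gy - fy * gx

# Least-squares estimate of lambda in grad_f = lambda * grad_g.
lam = (fx * gx + fy * gy) / (gx * gx + gy * gy)

print("constrained minimizer:", (x, y))
print("cross term (about 0 if parallel):", cross)
print("lambda estimate:", lam)
```

Solving the conditions by hand for this example gives x = y = 1/2 and λ = 1, so the search should land near that point with a cross term close to zero. Moving the evaluation point elsewhere on the line, say to (0.9, 0.1), makes the cross term clearly nonzero: the level curve of f crosses the constraint there, and sliding along the line still lowers f.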