Critical Points in Rⁿ

Multivariate calculus from first principles

Optimization in many dimensions begins exactly where it did in 1-D: find where the slope is zero. But now 'slope' is the whole gradient vector, so a critical point is where every partial derivative vanishes at once, ∇f = 0.

A zero gradient is necessary for a local extremum but not sufficient: it can mark a minimum, a maximum, or a saddle. To tell them apart you bring in the Hessian and read its eigenvalue signs, the second-order test from Lesson 13: all positive means a local minimum, all negative a local maximum, and mixed signs a saddle. Zero gradient locates the candidate; the Hessian classifies it.
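As a minimal sketch of that two-step recipe, the snippet below checks the gradient and then reads the Hessian's eigenvalue signs. The function f(x, y) = x³ − 3x + y² and the helper names (`grad_f`, `hessian_f`, `classify`) are assumptions chosen for illustration, not something defined in this lesson. It is restricted to two variables so the eigenvalues can be written in closed form.

```python
import math

def grad_f(x, y):
    """Analytic gradient of the illustrative f(x, y) = x^3 - 3x + y^2."""
    return (3 * x**2 - 3, 2 * y)

def hessian_f(x, y):
    """Analytic Hessian of f, returned as (f_xx, f_xy, f_yy)."""
    return (6 * x, 0.0, 2.0)

def sym2x2_eigenvalues(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first."""
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return (mean - radius, mean + radius)

def classify(x, y, tol=1e-9):
    """Step 1: is the gradient zero? Step 2: read the Hessian eigenvalue signs."""
    gx, gy = grad_f(x, y)
    if math.hypot(gx, gy) > tol:
        return "not a critical point"
    lo, hi = sym2x2_eigenvalues(*hessian_f(x, y))
    if lo > tol:
        return "local minimum"   # all eigenvalues positive
    if hi < -tol:
        return "local maximum"   # all eigenvalues negative
    if lo < -tol and hi > tol:
        return "saddle"          # mixed signs
    return "inconclusive (zero eigenvalue)"

for point in [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.0)]:
    print(point, classify(*point))
```

For this f the gradient vanishes only at (1, 0) and (−1, 0): the first is a local minimum and the second a saddle, while (0, 0) fails the gradient check. In more than two dimensions the closed-form eigenvalue helper would be replaced by a numerical routine such as NumPy's `numpy.linalg.eigvalsh`.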

Walk a hilly golf course and look for the level spots, the places where a ball would sit still. The tee on a hilltop, the low green in a hollow, and the flat saddle along a ridge are all spots where the ground is momentarily flat in every direction. That flatness is ∇f = 0; whether you are on a peak, in a hollow, or on a saddle is a separate question the Hessian answers.

Where this lives in ML

Every gradient-based training run is a search for ∇L = 0: the optimizer keeps stepping until the gradient is negligibly small. Because of the saddle-point story (Lesson 13), what it usually finds isn't 'the' global minimum but one of an enormous number of near-equivalent low-loss regions. That gradient descent reliably lands in a good enough one is much of the empirical mystery, and the success,…
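The stopping rule described here, stepping until the gradient is negligibly small, can be sketched in a few lines. Everything in this example is an assumption for illustration: the toy "loss" L(x, y) = (x² − 1)² + y², the learning rate, the tolerance, and the function names. It stands in for a real training loop only in spirit.

```python
import math

def grad_loss(x, y):
    """Gradient of the toy loss L(x, y) = (x^2 - 1)^2 + y^2 (illustrative only)."""
    return (4 * x * (x**2 - 1), 2 * y)

def gradient_descent(start, lr=0.1, tol=1e-6, max_steps=10_000):
    """Step downhill until the gradient norm falls below tol.

    Returns (x, y, steps_taken). The lr, tol and max_steps values are
    arbitrary choices for this sketch, not recommended settings.
    """
    x, y = start
    for step in range(max_steps):
        gx, gy = grad_loss(x, y)
        if math.hypot(gx, gy) < tol:   # gradient negligibly small: stop
            return x, y, step
        x -= lr * gx
        y -= lr * gy
    return x, y, max_steps

# Two different starting points settle into two different minima.
for start in [(0.5, 1.0), (-0.5, 1.0)]:
    x, y, steps = gradient_descent(start)
    print(f"start={start} -> ({x:.4f}, {y:.4f}) after {steps} steps")
```

This toy loss has two equally good minima, at (1, 0) and (−1, 0), separated by a saddle at the origin. Which one the loop reaches depends only on the starting point, a small-scale version of the many near-equivalent low-loss regions mentioned above.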