The Hessian

Multivariate calculus from first principles

The gradient packaged all the first derivatives. The Hessian packages all the second derivatives of a scalar function f: Rⁿ → R into a matrix. Where the gradient gives slope, the Hessian gives curvature: how the slope itself changes as you move around.

By Clairaut's theorem (Lesson 6), Hᵢⱼ = Hⱼᵢ, so the Hessian is always symmetric for the smooth functions we care about. That's a gift: symmetric matrices have real eigenvalues and orthogonal eigenvectors, and those eigenvalues are exactly the curvatures along the principal directions.

If the gradient is the speedometer of a surface, the Hessian is its curvature dashboard: it reports how the slope itself is bending in every direction at once. A surface curving up all around you reads like the bottom of a valley; curving down all around reads like the top of a dome; up one way and down another is a saddle. The Hessian packs all of that into one symmetric grid of second derivatives.

Where this lives in MLWhen gradient descent crawls down a long narrow valley, bouncing slowly off the steep walls, the Hessian explains why. Its eigenvalues are the curvatures in each direction, and a wide spread between them (a high condition number) is exactly that valley: steep one way, nearly flat the other. Second-order methods like Newton, and in spirit Adam's per-parameter scaling, read the Hessian to correct…

▶ The Hessian

← Jacobian Geometry Hessian Geometry →