Multivariate calculus from first principles
The gradient packaged all the first derivatives. The Hessian packages all the second derivatives of a scalar function f: Rⁿ → R into a matrix. Where the gradient gives slope, the Hessian gives curvature: how the slope itself changes as you move around.
By Clairaut's theorem (Lesson 6), Hᵢⱼ = Hⱼᵢ, so the Hessian is always symmetric for the smooth functions we care about. That's a gift: symmetric matrices have real eigenvalues and orthogonal eigenvectors, and those eigenvalues are exactly the curvatures along the principal directions.
If the gradient is the speedometer of a surface, the Hessian is its curvature dashboard: it reports how the slope itself is bending in every direction at once. A surface curving up all around you reads like the bottom of a valley; curving down all around reads like the top of a dome; up one way and down another is a saddle. The Hessian packs all of that into one symmetric grid of second derivatives.