Multivariate Taylor

Multivariate calculus from first principles

The linear approximation (Lesson 9) used only the gradient and gave a flat tangent plane. Add the next term, the one built from the Hessian, and you get a quadratic approximation: a paraboloid that hugs the surface, capturing its curvature, not just its tilt.

f(x + δ) ≈ f(x) + ∇fᵀδ + ½δᵀHδ

Here ∇f and H are the gradient and Hessian evaluated at x, and δ is the step away from x. Read the three pieces: f(x) is the height, ∇fᵀδ is the linear (slope) correction, and ½δᵀHδ is the quadratic (curvature) correction. That last term is a quadratic form in the step, exactly the object whose sign the Hessian's eigenvalues control.
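To make the three pieces concrete, here is a minimal sketch that compares the linear and quadratic approximations against the true value. The test function f(x, y) = eˣ·cos(y), the expansion point (0, 0), and the step (0.1, 0.2) are all assumptions chosen for illustration, with the gradient and Hessian written out by hand. At the origin this Hessian has eigenvalues +1 and −1, so the curvature term can take either sign depending on the direction of the step.

```python
import math

# Hypothetical test function (an assumption for illustration): f(x, y) = e^x * cos(y)
def f(x, y):
    return math.exp(x) * math.cos(y)

def grad(x, y):
    """Hand-derived gradient (df/dx, df/dy)."""
    ex = math.exp(x)
    return (ex * math.cos(y), -ex * math.sin(y))

def hessian(x, y):
    """Hand-derived 2x2 Hessian of second partial derivatives."""
    ex = math.exp(x)
    return ((ex * math.cos(y), -ex * math.sin(y)),
            (-ex * math.sin(y), -ex * math.cos(y)))

def linear_approx(x, y, dx, dy):
    """Height plus slope correction: f + grad^T delta."""
    gx, gy = grad(x, y)
    return f(x, y) + gx * dx + gy * dy

def quadratic_approx(x, y, dx, dy):
    """Linear approximation plus curvature correction: 0.5 * delta^T H delta."""
    (hxx, hxy), (hyx, hyy) = hessian(x, y)
    curvature = 0.5 * (hxx * dx * dx + (hxy + hyx) * dx * dy + hyy * dy * dy)
    return linear_approx(x, y, dx, dy) + curvature

x0, y0 = 0.0, 0.0   # expansion point
dx, dy = 0.1, 0.2   # step delta

true_value = f(x0 + dx, y0 + dy)
linear_error = abs(linear_approx(x0, y0, dx, dy) - true_value)
quadratic_error = abs(quadratic_approx(x0, y0, dx, dy) - true_value)

print("true value:     ", true_value)
print("linear error:   ", linear_error)
print("quadratic error:", quadratic_error)
```

For this step the quadratic error should come out noticeably smaller than the linear one, since the remaining error is third order in the step size.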

A flat tangent plane resting on a curved surface is like setting a stiff glass slide on your eye: it touches at one spot but gaps everywhere else. A contact lens does better because it's curved to match the eye's surface, matching not just where the eye is but how it bends. The Hessian term ½δᵀHδ is that built-in curvature: it lets the approximation hug the surface instead of merely resting on it.

Where this lives in ML

Instead of inching downhill one small gradient step at a time, you could fit a paraboloid to the loss and jump straight to its bottom. That is Newton's method: it minimizes the local quadratic exactly, stepping δ = −H⁻¹∇f, and converges far faster than plain gradient descent when curvature varies a lot. Adam and friends chase the same curvature correction cheaply, per-parameter, without ever…
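The Newton step δ = −H⁻¹∇f can be sketched on a toy problem. The loss below is a hypothetical two-parameter quadratic, f(w) = ½wᵀAw − bᵀw, chosen so that the gradient (Aw − b) and the Hessian (A) are known exactly; the matrix A, the vector b, the learning rate, and the helper names are all assumptions for illustration. Because this loss is exactly quadratic with a positive definite Hessian, a single Newton step lands on the minimizer, while gradient descent with a fixed step is still some distance away after ten steps. On a real, non-quadratic loss the step would be repeated, and it is only a descent direction where H is positive definite.

```python
import math

# Hypothetical quadratic loss f(w) = 0.5 * w^T A w - b^T w (an assumption for illustration).
A = ((10.0, 2.0),
     (2.0, 1.0))    # symmetric positive definite, with very different curvature per direction
b = (1.0, 1.0)

def gradient(w):
    """Gradient of the loss: A w - b."""
    return (A[0][0] * w[0] + A[0][1] * w[1] - b[0],
            A[1][0] * w[0] + A[1][1] * w[1] - b[1])

def hessian(w):
    """Hessian of the loss: the constant matrix A."""
    return A

def solve_2x2(m, rhs):
    """Solve m x = rhs for a 2x2 matrix m using Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((rhs[0] * m[1][1] - m[0][1] * rhs[1]) / det,
            (m[0][0] * rhs[1] - m[1][0] * rhs[0]) / det)

def newton_step(w):
    """Apply one Newton update: w + delta, with delta = -H^{-1} grad."""
    h_inv_g = solve_2x2(hessian(w), gradient(w))
    return (w[0] - h_inv_g[0], w[1] - h_inv_g[1])

def gradient_descent(w, lr, steps):
    """Apply `steps` plain gradient updates with a fixed learning rate."""
    for _ in range(steps):
        g = gradient(w)
        w = (w[0] - lr * g[0], w[1] - lr * g[1])
    return w

def grad_norm(w):
    """Euclidean norm of the gradient; zero at the minimizer."""
    return math.hypot(*gradient(w))

w0 = (0.0, 0.0)
newton_w = newton_step(w0)                         # one Newton step
gd_w = gradient_descent(w0, lr=0.1, steps=10)      # ten gradient steps

print("Newton, 1 step:   w =", newton_w, " |grad| =", grad_norm(newton_w))
print("Grad desc, 10 steps: w =", gd_w, " |grad| =", grad_norm(gd_w))
```

Solving the linear system H·x = ∇f rather than forming H⁻¹ explicitly is the usual design choice; here Cramer's rule stands in for a general solver because the example has only two parameters.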