Linear Approximation

Multivariate calculus from first principles

Up close, every smooth surface looks flat, the way the Earth feels flat under your feet. The linear approximation replaces the curvy function near a point with the flat tangent plane that just touches it there. The gradient supplies the tilt of that plane.

Read it in words: the new value ≈ the old value, plus the gradient dotted with the step you took. That dot product is the directional derivative times the step length, the best linear guess for how much f moved.

Press a small flat sticker onto a beach ball and, right where it sits, the curved ball looks perfectly flat. The linear approximation is that sticker: a flat tangent plane that kisses the surface at one point and stands in for the curve nearby. Wander too far across the ball and the sticker peels away from the surface — the prediction drifts off.

Where this lives in MLOne gradient-descent step is a linear approximation in action. Updating w ← w − η∇L assumes the loss change is well predicted by the linear term ∇L·δ. When the step is too big, the curvature you ignored (the ‖δ‖² term) bites back and the loss can overshoot or diverge. The learning rate η keeps you in the region where treating the surface as flat is close enough to true.
▶ Linear Approximation
← Directional DerivativeThe Jacobian →