Directional Derivative

Multivariate calculus from first principles

Partial derivatives only tell you the slope along the coordinate axes, but you can walk off in any direction. The directional derivative D_u f answers: if I step along the unit vector u, how fast does f change? The answer turns out to be a single dot product with the gradient.
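To make that concrete, here is a minimal Python sketch that estimates D_u f directly, by nudging the input a tiny distance along u, and compares the result with the dot product ∇f·u. The surface f(x, y) = x² + 3xy, the point (1, 2), and the helper names are assumptions chosen for illustration; none of them come from the lesson itself.

```python
import math

def f(x, y):
    # Hypothetical example surface, chosen only for illustration.
    return x**2 + 3*x*y

def grad_f(x, y):
    # Analytic gradient of the example f: (df/dx, df/dy).
    return (2*x + 3*y, 3*x)

def directional_derivative_fd(func, point, u, h=1e-6):
    # Central difference along u: (f(p + h*u) - f(p - h*u)) / (2h).
    px, py = point
    ux, uy = u
    return (func(px + h*ux, py + h*uy) - func(px - h*ux, py - h*uy)) / (2*h)

point = (1.0, 2.0)
u = (1/math.sqrt(2), 1/math.sqrt(2))  # unit vector along the diagonal

gx, gy = grad_f(*point)
via_gradient = gx*u[0] + gy*u[1]  # the dot product grad f . u
via_difference = directional_derivative_fd(f, point, u)

print(via_gradient, via_difference)
```

At this point the gradient is (8, 3), so both numbers should land near 11/√2 ≈ 7.78. The finite-difference version needs two function evaluations per direction, whereas the gradient only has to be computed once and can then be reused for any u.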

Imagine hiking across that same hill, but instead of facing straight uphill you pick a compass bearing, say north-east, and walk that way. The directional derivative D_u f is the slope you actually feel under your boots along that heading. Head toward the steepest direction and you feel the full climb; turn sideways along the hillside and the ground feels flat.

For differentiable f, D_u f = ∇f·u = ‖∇f‖‖u‖cos θ = ‖∇f‖cos θ, where θ is the angle between ∇f and u, and ‖u‖ = 1 because u is a unit vector. The rate of change is therefore largest exactly when cos θ = 1, that is, when u points along ∇f, and its value there is ‖∇f‖. It vanishes when u is perpendicular to the gradient (cos θ = 0) and is most negative, −‖∇f‖, when u points directly against it (cos θ = −1).
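The same behavior can be checked numerically. The sketch below assumes an example gradient of (3, 4), picked purely for illustration, sweeps u around the unit circle one degree at a time, and records the slope ∇f·u for each heading.

```python
import math

grad = (3.0, 4.0)  # assumed example gradient; its norm is 5
grad_angle = math.atan2(grad[1], grad[0])  # heading of the gradient

def slope_along(angle):
    # Slope felt along the unit vector u = (cos(angle), sin(angle)).
    u = (math.cos(angle), math.sin(angle))
    return grad[0]*u[0] + grad[1]*u[1]

# Sample 360 headings, one per degree, and keep the steepest one.
angles = [math.radians(d) for d in range(360)]
best_angle = max(angles, key=slope_along)

aligned = slope_along(grad_angle)                    # u along grad f
perpendicular = slope_along(grad_angle + math.pi/2)  # u at a right angle to grad f
opposite = slope_along(grad_angle + math.pi)         # u against grad f

print(math.degrees(best_angle), math.degrees(grad_angle))
print(aligned, perpendicular, opposite)
```

The steepest sampled heading should sit within a degree of the gradient's own heading, and the three probes should come out close to 5, 0, and −5, which are ‖∇f‖cos θ for θ = 0°, 90°, and 180°.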

Where this lives in ML

This is the theorem that justifies gradient descent. Among all unit directions you could step in, −∇L is the one that decreases the loss fastest to first order, because it is the direction with cos θ = −1. So if you ever wonder why training steps against the gradient rather than in some other direction, this is the answer: −∇L is the best local choice, which is why w ← w − η∇L, with learning rate η, is the basic update that most optimizers build on.
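As a rough illustration, the following sketch takes a small fixed-length step in each of 360 directions on a toy loss and checks which heading lowers the loss most, then applies one update of the form w ← w − η∇L. The quadratic loss, the starting weights, the step length, and the value of η are all assumptions made for this example.

```python
import math

def loss(w):
    # Toy quadratic loss, assumed for illustration only.
    return w[0]**2 + 10*w[1]**2

def grad_loss(w):
    # Analytic gradient of the toy loss.
    return (2*w[0], 20*w[1])

w = (1.0, 0.5)
step = 1e-3  # small step length, so the first-order picture dominates

def loss_after_step(angle):
    # Loss after moving `step` along the unit vector at this heading.
    return loss((w[0] + step*math.cos(angle), w[1] + step*math.sin(angle)))

# Try one heading per degree and keep the one with the lowest loss.
angles = [math.radians(d) for d in range(360)]
best = min(angles, key=loss_after_step)

g = grad_loss(w)
descent_angle = math.atan2(-g[1], -g[0]) % (2*math.pi)  # heading of -grad L

print(math.degrees(best), math.degrees(descent_angle))

# One gradient-descent update, w <- w - eta * grad L, with an assumed eta.
eta = 0.01
w_next = (w[0] - eta*g[0], w[1] - eta*g[1])
print(loss(w), loss(w_next))
```

The best sampled heading should fall within a degree of the −∇L heading, and the loss after the update should be lower than before it. The guarantee is local: with a step that is too large, curvature takes over and the same direction can overshoot.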