Critical Points

Single-variable calculus from first principles

To find the peaks and valleys of a function (its maxima and minima) you hunt for the flat spots. At the top of a hill or the bottom of a valley, the tangent line is horizontal, so the slope is zero. Those are the critical points.

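Setting f′(x) = 0 and solving gives the candidate locations. This is a necessary condition for a smooth peak or valley, but not a sufficient one, since a flat spot could also be a momentary pause (a saddle-like inflection) where the function levels off and then carries on in the same direction. For example, f(x) = x³ has f′(0) = 0, yet x = 0 is neither a peak nor a valley. You confirm what kind of point it is with a test.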
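As a rough sketch of that workflow, the snippet below takes an example function chosen purely for illustration, f(x) = x³ − 3x, uses the candidates you get by solving f′(x) = 0 by hand, and classifies each one. The test used here is a first-derivative sign check (an assumption on our part, it is one of several tests that work), and the names `classify`, `df`, and the step size `h` are hypothetical.

```python
def df(x):
    # Derivative of the illustrative function f(x) = x**3 - 3*x, worked out by hand.
    return 3 * x**2 - 3

def classify(dfunc, c, h=1e-3):
    """Look at the sign of the slope just left and just right of c."""
    left, right = dfunc(c - h), dfunc(c + h)
    if left > 0 and right < 0:
        return "local max"   # climbing, then descending: a peak
    if left < 0 and right > 0:
        return "local min"   # descending, then climbing: a valley
    return "neither"         # slope keeps its sign: a momentary pause

# Solving 3x^2 - 3 = 0 by hand gives the candidates x = -1 and x = 1.
candidates = [-1.0, 1.0]
results = {c: classify(df, c) for c in candidates}

# g(x) = x**3 has g'(0) = 0, but the slope is positive on both sides of 0.
pause = classify(lambda x: 3 * x**2, 0.0)

print(results, pause)
```

The sign check assumes there is no other critical point within `h` of the candidate, so treat the step size as something to adjust for your own function.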

Picture a hike across rolling hills. As you climb toward a hilltop the ground tilts up under your boots; as you head down into a valley it tilts the other way. Right at the top of a hill, or at the lowest point of a valley, the ground is momentarily flat: the slope is zero. Those flat spots are exactly the critical points you hunt for.

Where this lives in ML

Training a model is minimising a loss, and the minimum sits where the gradient is zero: exactly the critical-point condition, generalised to many variables (∇L = 0). Gradient descent is a numerical hunt for that flat spot. In high dimensions, most critical points are saddle points rather than true minima, which is why optimisation in deep learning is subtle: the flat-spot condition alone doesn't…
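To make the "numerical hunt" concrete, here is a minimal sketch on a toy one-parameter loss, L(w) = (w − 3)², followed by a two-variable saddle. The loss, the learning rate, the step count, and the saddle function x² − y² are all assumptions picked for illustration, not anything specific to a real model.

```python
def grad(w):
    # Gradient of the toy loss L(w) = (w - 3)**2.
    return 2.0 * (w - 3.0)

w = 0.0      # arbitrary starting point
lr = 0.1     # learning rate (assumed value)
for _ in range(100):
    w -= lr * grad(w)   # step downhill, against the slope

# After the loop, w sits very close to 3 and the gradient is close to zero.
print(w, grad(w))

def saddle(x, y):
    # Illustrative saddle: a valley along x, a ridge along y.
    return x**2 - y**2

def saddle_grad(x, y):
    return (2.0 * x, -2.0 * y)

# The gradient vanishes at the origin, yet moving along y lowers the value
# while moving along x raises it, so the origin is not a minimum.
g0 = saddle_grad(0.0, 0.0)
print(g0, saddle(0.0, 0.1), saddle(0.1, 0.0))
```

Each update shrinks the distance to w = 3 by a constant factor here, which is why a fixed number of steps is enough for this toy case; real losses give no such guarantee.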