Convexity

Single-variable calculus from first principles

Convexity is the shape that makes optimisation easy. A convex function cups upward everywhere, like a bowl, and that one property makes it easy to minimise: every local minimum is a global minimum, so a downhill path cannot get trapped in a false bottom. If the function is strictly convex and a minimum exists, that lowest point is unique.

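There are three equivalent ways to see convexity. First, for a twice-differentiable function, the second derivative is non-negative everywhere: f″(x) ≥ 0. Second, the curve cups up and never bends downward. Third, the defining picture: a chord between any two points on the curve lies on or above the curve, that is, f(ta + (1 − t)b) ≤ t·f(a) + (1 − t)·f(b) for every t in [0, 1].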
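To make these views concrete, here is a minimal Python sketch that checks two of them numerically. The helper names (`chord_above_curve`, `second_derivative`) and the test functions x² and x³ are illustrative choices, not part of the lesson. A check on sampled points can only suggest convexity; it does not prove it.

```python
import numpy as np

def chord_above_curve(f, a, b, n=101, tol=1e-12):
    """Return True if the chord from (a, f(a)) to (b, f(b)) is never below f on [a, b]."""
    t = np.linspace(0.0, 1.0, n)
    x = (1 - t) * a + t * b            # sample points between a and b
    chord = (1 - t) * f(a) + t * f(b)  # height of the chord above each sample point
    return bool(np.all(f(x) <= chord + tol))

def second_derivative(f, x, h=1e-4):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

square = lambda x: x**2   # convex everywhere: f''(x) = 2
cubic = lambda x: x**3    # not convex on the whole line: f''(x) = 6x < 0 for x < 0

convex_chord = chord_above_curve(square, -1.0, 3.0)   # chord stays above the parabola
cubic_chord = chord_above_curve(cubic, -2.0, 0.0)     # chord dips below the cubic here

print(convex_chord, cubic_chord)
print(second_derivative(square, 0.5), second_derivative(cubic, -1.0))
```

The cubic fails both checks on the negative half of the line, which is where it bends downward.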

Picture a smooth valley, or the inside of a bowl, and drop a marble anywhere along it. No matter where it starts, the marble always rolls down to the single lowest point and settles there. That is exactly what convexity buys you: one valley, no false bottoms, so any downhill path leads to the one true minimum.
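The marble picture can be sketched as gradient descent on a bowl. This is a rough illustration under stated assumptions: the bowl f(x) = (x − 3)², the function name `gradient_descent`, the step size 0.1, and the starting points are all hypothetical choices, and the guarantee only holds when the step size is small enough for the function at hand.

```python
def gradient_descent(grad, x0, lr=0.1, steps=200):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Hypothetical convex bowl f(x) = (x - 3)^2, whose derivative is 2 * (x - 3).
bowl_grad = lambda x: 2 * (x - 3)

starts = [-10.0, 0.0, 2.5, 40.0]
finishes = [gradient_descent(bowl_grad, x0) for x0 in starts]

# Every start ends up very close to the minimiser x = 3.
print(finishes)
```

Each step here shrinks the distance to x = 3 by a constant factor, so the starting point affects only how long the marble takes, not where it settles.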

Where this lives in ML

Convexity is the dividing line in ML. Linear/logistic regression and SVMs have convex losses: every local minimum is global, so training is reliable and reproducible. Deep networks have wildly non-convex losses, with countless local minima and saddles, which is why different random initialisations land in different solutions, why the learning rate matters so much, and why there's no single "the" optimum.…
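A one-dimensional toy shows how the starting point can decide the answer once convexity is lost. This is only a sketch: the double-well function f(x) = x⁴ − 2x², the name `gradient_descent`, the step size, and the two starting points are assumptions made for illustration, and a real network's loss surface is far higher-dimensional.

```python
def gradient_descent(grad, x0, lr=0.05, steps=500):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

# Hypothetical non-convex loss f(x) = x^4 - 2x^2: two valleys, at x = -1 and x = +1,
# separated by a bump at x = 0. Its derivative is 4x^3 - 4x.
double_well_grad = lambda x: 4 * x**3 - 4 * x

left_finish = gradient_descent(double_well_grad, -0.5)   # settles in the left valley, near -1
right_finish = gradient_descent(double_well_grad, 0.5)   # settles in the right valley, near +1

print(left_finish, right_finish)
```

Two starts only one unit apart settle in different valleys, which is the toy version of different initialisations landing in different solutions.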
