Convexity

Multivariate calculus from first principles

Some optimization problems are easy and some are hard, and one property draws the line: convexity. A convex function has a single bowl shape with no false bottoms, so finding a place where the gradient is zero means you've found the global minimum. No saddles, no local traps.

The defining picture: a function is convex if the straight chord between any two points on its graph lies above (or on) the graph itself. The function never bulges above its own shortcuts.

Compare a smooth salad bowl with a bumpy egg carton. The bowl has one true bottom: roll a marble in from anywhere and it always settles at the same low point. The egg carton is full of little traps, each a false bottom that catches the marble short of the lowest one. A convex function is the salad bowl, and that single guaranteed minimum is what makes it easy to optimize.

Where this lives in MLThe convex/non-convex split explains a lot of ML. Linear and logistic regression are convex, so gradient descent provably reaches the global optimum and any two runs agree. Deep networks are wildly non-convex, full of critical points, with results that shift with initialization and randomness. That gap is why classical ML feels dependable and deep learning feels finicky. Jensen's inequality,…
▶ Convexity
← Critical Points in RⁿConstrained Optimization →