Functions f: Rⁿ → R

Multivariate calculus from first principles

A function f: Rⁿ → R takes a vector in and returns a single number. The example that drives machine learning is the loss: feed in every weight of the network, get back one number that says how badly it is doing. The whole of training is a hunt for a low point of this function, ideally the lowest.
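To make "vector in, one number out" concrete, here is a minimal sketch of a loss for a two-weight linear model. The data set `DATA` and the function name `loss` are made up for illustration; a real network has far more weights, but the shape of the function is the same.

```python
from typing import Sequence

# Hypothetical toy data: three (inputs, target) pairs, invented for this sketch.
DATA = [
    ((1.0, 2.0), 5.0),
    ((2.0, 0.0), 2.0),
    ((0.0, 1.0), 2.0),
]


def loss(w: Sequence[float]) -> float:
    """Mean squared error of the linear model w . x over DATA: a function R^2 -> R."""
    total = 0.0
    for x, target in DATA:
        prediction = sum(wi * xi for wi, xi in zip(w, x))
        total += (prediction - target) ** 2
    return total / len(DATA)


print(loss([0.0, 0.0]))  # one weight vector in, one number out
print(loss([1.0, 2.0]))  # these weights fit the toy data exactly, so the loss is 0.0
```

Every choice of weights gives exactly one loss value, which is all that "f: Rⁿ → R" is saying.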

For two inputs you can actually picture it: z = f(x, y) is a surface, a landscape of hills and valleys floating above the xy-plane. The height at each (x, y) is the function's value.
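The surface picture can be sketched by sampling heights on a small grid. The bowl-shaped function below is an assumption chosen only because it is easy to check by hand; any f(x, y) would do.

```python
def f(x: float, y: float) -> float:
    """An example surface z = x^2 + y^2 (a bowl); chosen for illustration only."""
    return x ** 2 + y ** 2


xs = [-1.0, 0.0, 1.0]
ys = [-1.0, 0.0, 1.0]

# heights[i][j] is the height of the surface above the point (xs[j], ys[i]).
heights = [[f(x, y) for x in xs] for y in ys]

for row in heights:
    print(row)
```

The smallest entry sits in the middle of the grid, above (0, 0): that is the bottom of the valley. A plotting library would draw the same table of heights as a surface or contour plot.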

Imagine the air in a room: stand at any spot on the floor and a thermometer reads exactly one temperature. That is a function f: R² → R in disguise. A position (x, y) goes in, and a single number (the warmth there) comes out. The whole room becomes a landscape of warm and cool patches, higher near the radiator, lower by the window.

Where this lives in ML

When you watch a loss curve tick downward during training, you are watching a walk across one of these surfaces. The loss L(w₁, …, wₙ) is a function Rⁿ → R over weight space, with n in the millions or billions, and the curve on your screen is just a one-dimensional shadow of that walk. The 'flat vs. sharp minima' pictures researchers argue about are literally contour and surface plots of this…
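As a rough sketch of the "one-dimensional shadow" idea: take a walk through weight space and record the loss at each step. Everything here is assumed for illustration, including the two-weight bowl-shaped `loss` and the hand-picked straight-line path; a real optimizer chooses its own path.

```python
def loss(w):
    """Toy loss over two weights, lowest at (1, 2); illustrative only."""
    return (w[0] - 1.0) ** 2 + (w[1] - 2.0) ** 2


start = (3.0, -2.0)  # hypothetical initial weights
end = (1.0, 2.0)     # the bottom of this toy bowl
steps = 4

# A hand-picked straight-line walk through weight space, one point per step.
path = [
    tuple(s + (e - s) * k / steps for s, e in zip(start, end))
    for k in range(steps + 1)
]

# The "loss curve": one number per step, however many weights there are.
curve = [loss(w) for w in path]
print(curve)  # [20.0, 11.25, 5.0, 1.25, 0.0]
```

The path lives in weight space (here two-dimensional), while the curve is just a list of numbers indexed by step. That list is what a training dashboard plots.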
