The Jacobian

Multivariate calculus from first principles

When the output is a vector too, a function f: Rⁿ → Rᵐ, one gradient isn't enough. You need the partial of every output with respect to every input. Stack them all into a matrix and you get the Jacobian J, the full first derivative of a vector-valued map.
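Written out entry by entry, the definition looks like this. The component names f_1, …, f_m and x_1, …, x_n are notation assumed here for illustration:

```latex
J \;=\;
\begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_n}
\end{pmatrix},
\qquad
J_{ij} = \frac{\partial f_i}{\partial x_j}.
```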

Row i of J is just the gradient of the i-th output. So the Jacobian is a stack of gradients, one per output coordinate. Its shape is m × n: as many rows as outputs, as many columns as inputs.
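To make the shape concrete, here is a minimal sketch that approximates a Jacobian with central differences, using only the standard library. The map `f` and the helper `numerical_jacobian` are hypothetical names chosen for this example. In practice an autodiff library would compute the same matrix exactly.

```python
import math

def f(x):
    """Hypothetical example map f: R^2 -> R^3, chosen only for illustration."""
    x1, x2 = x
    return [x1 * x2, math.sin(x1), x2 ** 2]

def numerical_jacobian(func, x, h=1e-6):
    """Approximate J[i][j] = d f_i / d x_j with central differences."""
    m = len(func(x))   # number of outputs -> rows
    n = len(x)         # number of inputs  -> columns
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        # Nudge only input j, in both directions.
        x_plus = list(x)
        x_plus[j] += h
        x_minus = list(x)
        x_minus[j] -= h
        f_plus = func(x_plus)
        f_minus = func(x_minus)
        # Column j holds the response of every output to input j.
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J

J = numerical_jacobian(f, [1.0, 2.0])
for row in J:
    print(row)  # each row approximates the gradient of one output
```

For this `f` the analytic Jacobian at (x1, x2) is [[x2, x1], [cos(x1), 0], [0, 2·x2]], so the printed rows should be close to [2, 1], [cos(1), 0], and [0, 4]: three rows for three outputs, two columns for two inputs.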

Think of a sound engineer's mixing desk, where every output channel responds to every input knob. The Jacobian is that sensitivity table written out: each entry says how much one output moves when you nudge one input knob. Read across a row to see everything that drives a single output; read down a column to see everything one knob controls.
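The mixing-desk reading can be sketched in a few lines. The sensitivity values and the helper names below are invented for illustration. The last function applies the first-order rule that a small nudge dx to the knobs changes the outputs by roughly J·dx.

```python
# Hypothetical desk with 2 output channels and 3 knobs.
# J[i][j] = how much output i moves per unit change of knob j (made-up numbers).
J = [
    [0.8, 0.0, 0.3],  # output 0
    [0.1, 0.9, 0.3],  # output 1
]

def drivers_of_output(J, i):
    """Read across row i: every knob's influence on output i."""
    return list(J[i])

def reach_of_knob(J, j):
    """Read down column j: knob j's influence on every output."""
    return [row[j] for row in J]

def predicted_change(J, dx):
    """First-order estimate of the output change for a small knob nudge dx."""
    return [sum(J[i][j] * dx[j] for j in range(len(dx))) for i in range(len(J))]

row0 = drivers_of_output(J, 0)
col2 = reach_of_knob(J, 2)
change = predicted_change(J, [0.0, 0.0, 0.01])  # nudge only knob 2
print(row0, col2, change)
```

Nudging only knob 2 picks out column 2 of the table, scaled by the size of the nudge.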

Where this lives in ML

A layer's Jacobian says how a small perturbation of its input changes its output: the local stretch-and-squeeze of that layer. Backpropagation amounts to multiplying these per-layer Jacobians together via the chain rule (next module). When people worry about vanishing or exploding gradients, they are worrying about that product of layer Jacobians shrinking to nothing or blowing up.
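A toy sketch of that shrinking or blowing up, under a strong simplifying assumption: every layer has the same fixed 2 × 2 Jacobian. In a real network each layer's Jacobian differs and depends on the input. The matrices and helper names are made up for illustration.

```python
def matmul(A, B):
    """Plain-Python matrix product A @ B."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]

def frobenius_norm(A):
    """Square root of the sum of squared entries."""
    return sum(v * v for row in A for v in row) ** 0.5

def chained_jacobian(layer_jacobian, depth):
    """Multiply `depth` copies of one layer Jacobian together (chain rule)."""
    n = len(layer_jacobian)
    product = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(depth):
        product = matmul(layer_jacobian, product)
    return product

# Assumed per-layer Jacobians: one contracts slightly, one expands slightly.
contracting = [[0.5, 0.1], [0.0, 0.5]]
expanding = [[1.5, 0.1], [0.0, 1.5]]

vanishing_norm = frobenius_norm(chained_jacobian(contracting, 30))
exploding_norm = frobenius_norm(chained_jacobian(expanding, 30))
print(vanishing_norm, exploding_norm)
```

After 30 layers the first product is tiny and the second is huge, even though each individual layer only scales its input by a modest factor.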
