Multivariate calculus from first principles
When the output is a vector too, a function f: Rⁿ → Rᵐ, one gradient isn't enough. You need the partial of every output with respect to every input. Stack them all into a matrix and you get the Jacobian J, the full first derivative of a vector-valued map.
Row i of J is just the gradient of the i-th output. So the Jacobian is a stack of gradients, one per output coordinate. Its shape is m × n: as many rows as outputs, as many columns as inputs.
Think of a sound engineer's mixing desk, where every output channel responds to every input knob. The Jacobian is that sensitivity table written out: each entry says how much one output moves when you nudge one input knob. Read across a row to see everything that drives a single output; read down a column to see everything one knob controls.