A network starts knowing nothing — random weights, a meaningless wash across the plane. Then it learns: for every weight, backprop asks which way nudges the error down, and gradient descent takes one small step that way. Do it a few thousand times and a decision boundary carves itself out of the dots — intelligence as nothing but data and the slope of a loss. The whole story is the learning rate η: get it right and the loss glides down; push it too hard and the descent overshoots and explodes.
Each neuron takes a weighted sum of the layer below plus a bias, then bends it through a non-linearity. Stack two such layers and the network can carve curved, disconnected regions — not just a single straight cut.
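The neuron described above can be sketched in a few lines. This is a minimal illustration, not the instrument's own code: the text names no particular non-linearity, so tanh is an assumption here, and the function names, weights, and input point are all made up for the example.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the layer below plus a bias, bent through tanh (assumed)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(z)

def layer(inputs, weight_rows, biases):
    """One neuron per row of weights, all reading the same inputs."""
    return [neuron(inputs, row, b) for row, b in zip(weight_rows, biases)]

# Hypothetical numbers: a 2-D point through a 3-unit hidden layer, then 1 output.
point = [0.5, -1.0]
hidden = layer(point, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.5])
output = layer(hidden, [[1.0, -1.0, 0.5]], [0.0])
```

Stacking is just feeding one layer's outputs in as the next layer's inputs; the second `layer` call is what lets the boundary curve rather than stay a single straight cut.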
How wrong the network is, averaged over every point. Squared so that big misses dominate and the surface is smooth — a landscape in weight space the optimiser can roll down.
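A small sketch of that loss as a plain mean of squared misses. The predictions and targets are invented for illustration, and some texts carry an extra factor of one half that is left out here.

```python
def mean_squared_error(predictions, targets):
    """Average of the squared misses over every point."""
    n = len(targets)
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n

targets = [1, 0, 1, 0]
# Four small misses of 0.1 each (total absolute error 0.4).
small_misses = mean_squared_error([0.9, 0.1, 0.9, 0.1], targets)
# One big miss of 0.4 (the same total absolute error).
one_big_miss = mean_squared_error([1.0, 0.0, 1.0, 0.4], targets)
```

Both cases are off by the same total amount, yet the single large miss costs about four times as much once squared, which is the sense in which big misses dominate.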
The whole of learning. The gradient ∇L points uphill, so step the opposite way, scaled by the learning rate η: w ← w − η∇L. Repeat. There is no cleverer secret underneath modern AI than this line.
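The step-and-repeat loop can be sketched on a toy loss whose gradient is known in closed form. The bowl L(w) = (w0 − 3)² + (w1 + 1)², the starting point, and η = 0.1 are all assumptions chosen for illustration; a real network gets its gradient from backpropagation instead.

```python
def grad(w):
    """Gradient of the toy loss L(w) = (w0 - 3)^2 + (w1 + 1)^2."""
    return [2 * (w[0] - 3), 2 * (w[1] + 1)]

def descend(w, eta, steps):
    """Repeat: step against the gradient, scaled by the learning rate eta."""
    for _ in range(steps):
        g = grad(w)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w

# Start at the origin; the minimum of the bowl sits at (3, -1).
w_final = descend([0.0, 0.0], eta=0.1, steps=100)
```

With this η each step shrinks the distance to the minimum by a constant factor, so `w_final` ends up very close to (3, −1).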
How to get ∇L cheaply: the chain rule, run backwards. The error at the output is propagated layer by layer, and each weight’s gradient is its incoming activation times the error flowing back through it.
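A sketch of that backward pass for the smallest interesting case: one hidden layer of tanh units, a linear output, and the loss ½(y − t)² on a single point. The architecture, the numbers, and every name below are assumptions for illustration. The finite-difference check at the end is there only to show that the chain-rule gradient agrees with a brute-force estimate.

```python
import math

def forward(x, W1, b1, W2, b2):
    """Hidden tanh layer followed by a linear output unit."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sum(w * hj for w, hj in zip(W2, h)) + b2
    return h, y

def loss(x, t, W1, b1, W2, b2):
    _, y = forward(x, W1, b1, W2, b2)
    return 0.5 * (y - t) ** 2

def backward(x, t, W1, b1, W2, b2):
    """Chain rule run backwards: output error first, then the hidden layer."""
    h, y = forward(x, W1, b1, W2, b2)
    delta_out = y - t                                  # dL/dy
    grad_W2 = [delta_out * hj for hj in h]             # activation * error
    grad_b2 = delta_out
    # Error flowing back through each output weight and the tanh slope (1 - h^2).
    delta_hidden = [delta_out * w * (1 - hj * hj) for w, hj in zip(W2, h)]
    grad_W1 = [[d * xi for xi in x] for d in delta_hidden]
    grad_b1 = delta_hidden
    return grad_W1, grad_b1, grad_W2, grad_b2

# Hypothetical point, target, and weights.
x, t = [0.5, -1.0], 1.0
W1, b1 = [[0.3, -0.2], [0.1, 0.4]], [0.0, 0.1]
W2, b2 = [0.7, -0.5], 0.2

grad_W1, grad_b1, grad_W2, grad_b2 = backward(x, t, W1, b1, W2, b2)

# Brute-force check on one weight: nudge it both ways and difference the loss.
eps = 1e-6
W1_plus = [row[:] for row in W1]
W1_plus[0][1] += eps
W1_minus = [row[:] for row in W1]
W1_minus[0][1] -= eps
numeric = (loss(x, t, W1_plus, b1, W2, b2)
           - loss(x, t, W1_minus, b1, W2, b2)) / (2 * eps)
```

The brute-force route needs two forward passes per weight, while the backward pass yields every gradient at roughly the cost of one extra sweep. That is the "cheaply" in the text.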
Why one hidden layer is, in principle, enough: a wide enough network of these neurons can approximate any continuous function. Capacity is the catch — Spiral needs more units to bend the boundary far enough.
The same descent, pushed too hard, is the route to chaos. Raise η and the step overshoots the minimum, then overshoots back — a period-2 oscillation that doubles into divergence, exactly the bifurcation The Cascade draws.
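The overshoot is easiest to see on a one-weight quadratic, L(w) = ½·λ·w², where each step multiplies w by (1 − ηλ). The sketch below assumes λ = 1 and three hand-picked learning rates; the names are illustrative. It shows only the overshoot-and-diverge part of the story: on a pure quadratic the oscillation flips sign every step and grows once η exceeds 2/λ, without a period-doubling cascade.

```python
def run(eta, curvature=1.0, w0=1.0, steps=20):
    """Gradient descent on L(w) = 0.5 * curvature * w^2; returns every w visited."""
    path = [w0]
    for _ in range(steps):
        gradient = curvature * path[-1]
        path.append(path[-1] - eta * gradient)
    return path

glide = run(eta=0.5)    # eta < 1/curvature: slides down one side of the bowl
ringing = run(eta=1.5)  # 1/curvature < eta < 2/curvature: overshoots, still settles
blow_up = run(eta=2.5)  # eta > 2/curvature: each overshoot is larger than the last
```

Printing the first few entries of each path makes the three regimes visible: `glide` stays positive and shrinks, `ringing` alternates in sign while shrinking, and `blow_up` alternates in sign while growing.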
This is the rack’s learning instrument: the question of where intelligence comes from, reduced to its mechanism. There is no understanding inside the network, only weights and the slope of an error, yet from data and gradient steps alone a structure emerges that generalises. It is the twin of The Lens (INST·25): both pull signal from data, but the Lens infers a hidden state with a known model while the Descent learns the model itself. Its failure mode belongs to The Cascade (INST·02), where too large a step turns smooth convergence into period-doubling chaos, and its honest, noisier cousin is The Walk (INST·19): stochastic gradient descent is this same downhill roll with the loss estimated from a random handful of points each step. It is The Perception Engine’s “explain the present” with the model fitted, not given.