State-Space Models (Mamba)
State-space models are a transformer alternative that process sequences by carrying a compact running state forward, rather than comparing every token to every other token. They scale linearly with sequence length instead of quadratically — cheaper on long inputs — with Mamba the best-known example.
Also known as: Mamba, SSM, linear attention
The transformer’s attention compares every token to every other token, so its cost grows with the square of the sequence length — fine for short inputs, painful for very long ones. State-space models take a different approach borrowed from signal processing: they sweep through the sequence carrying a fixed-size running state, which makes their cost grow linearly. Mamba is the architecture that brought this approach to competitive language modeling.
The appeal is efficiency on long sequences; the open question is whether they match transformers’ quality across the board, since the global “look at everything at once” property of attention is useful. The frontier today is largely hybrid — combining attention with state-space-style layers — and it’s part of the broader “beyond transformers” push by teams like Liquid AI to keep the strengths while cutting the scaling cost.