Test-Time Compute
Test-time compute is the processing a model spends while answering, rather than during training. Letting a model 'think' longer at answer time — exploring and checking more before it commits — raises accuracy on hard problems without retraining the model.
Also known as: inference-time compute, test time compute, inference-time scaling
For years, the way to get a better answer was a bigger model trained on more data. Test-time compute is the other axis: keep the model fixed and let it spend more effort while answering — generating a longer chain of thought, exploring several approaches, or checking its own work before responding. On math, code, and multi-step logic, more thinking measurably beats a snap answer.
This is the idea underneath reasoning models: they’re trained to use test-time compute well, turning extra inference into accuracy. DeepSeek’s breakthrough made the trade-off concrete — strong reasoning from heavier inference rather than a larger base model, which we unpacked on the show.
The cost is direct and worth budgeting: more test-time compute means more tokens, higher latency, and a bigger bill per answer. So it’s a dial, not a default — spend it on the hard problems where being right matters, and keep it low for simple ones. See the AI reasoning overview for how it fits with chain-of-thought and reasoning models.