Temperature
Temperature is the setting that controls how random a model's output is. A low temperature makes the model pick the most likely next token almost every time (focused, repeatable); a high temperature spreads probability across more candidate tokens (varied, creative, less predictable). It's the main dial between consistency and creativity.
When a model generates text, it assigns a probability to each possible next token. Temperature reshapes those probabilities before one token is sampled. Near zero, the model almost always takes the top choice, so output is focused and close to repeatable. At higher values, the distribution flattens and lower-probability tokens get picked more often, so output is more varied and surprising.
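The reshaping described above can be sketched in a few lines. This is a minimal illustration, assuming the common formulation in which raw token scores (logits) are divided by the temperature before a softmax turns them into probabilities. The function name apply_temperature and the three example logits are hypothetical, chosen only for the demonstration.

```python
import math


def apply_temperature(logits, temperature):
    """Return softmax probabilities for logits scaled by 1 / temperature.

    Assumption: temperature must be strictly positive here. Real systems
    usually treat a temperature of 0 as "always take the top token" rather
    than dividing by zero.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.0]

low = apply_temperature(logits, 0.2)   # sharply peaked on the top token
base = apply_temperature(logits, 1.0)  # the unmodified softmax
high = apply_temperature(logits, 2.0)  # flatter, more weight on the others

for name, probs in [("low", low), ("base", base), ("high", high)]:
    print(name, [round(p, 3) for p in probs])
```

With these example scores, the top token's share shrinks as the temperature rises, which is the flattening effect in action. The ranking of the tokens never changes; only how strongly the top choice dominates.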
The practical guidance is task-driven: use low temperature for anything that needs consistency and correctness (extraction, classification, structured output, tool-calling), and raise it for brainstorming or creative writing where variety helps. It's not a quality knob; cranking it up doesn't make the model smarter, just less predictable. Note that low temperature reduces randomness but doesn't guarantee identical outputs: other sources of variation, such as floating-point nondeterminism on parallel hardware, can still change the result.
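One way to apply this guidance in code is a small lookup from task type to a default temperature. The sketch below is illustrative only: the task names mirror the examples above, but the specific numbers, the TASK_TEMPERATURE table, and the temperature_for helper are all assumptions, not values recommended by any particular model provider.

```python
# Illustrative defaults only. The numbers are assumptions for this sketch,
# not recommendations from any model provider.
TASK_TEMPERATURE = {
    # Tasks that need consistency and correctness: keep randomness low.
    "extraction": 0.0,
    "classification": 0.0,
    "structured_output": 0.0,
    "tool_calling": 0.0,
    # Tasks where variety helps: allow more randomness.
    "brainstorming": 0.9,
    "creative_writing": 1.0,
}


def temperature_for(task, default=0.2):
    """Return the default temperature for a task, or a cautious fallback."""
    return TASK_TEMPERATURE.get(task, default)


print(temperature_for("extraction"))
print(temperature_for("creative_writing"))
print(temperature_for("summarization"))  # not in the table, uses the fallback
```

Falling back to a low value for unknown tasks reflects the idea that predictability is usually the safer default; the right numbers for a given model and workload are best found by experiment.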