Tokenization
Tokenization splits text into the chunks a model actually processes — tokens, which are roughly word-pieces, not whole words. It's why model limits and pricing are counted in tokens, and why 'a few paragraphs' is a fuzzy unit but 'tokens' is exact.
Also known as: tokens, tokenizer
A model doesn’t read characters or whole words; it reads tokens — sub-word pieces produced by a tokenizer. Common words may be a single token, while rare words, code, or other languages split into several. As a rough rule, a token is about three-quarters of an English word, but it varies a lot by content.
Tokens are the unit that actually matters in practice. Context windows are measured in tokens, API pricing is per token, and latency scales with how many tokens go in and come out. That’s why trimming a prompt or compacting context is a real cost and speed lever, and why text that looks short can be expensive if it tokenizes badly — dense code, unusual formatting, or non-English text all use more tokens than their length suggests.