Retrieval-Augmented Generation (RAG)
RAG is the pattern of fetching relevant documents at query time and feeding them to a model alongside the question, so the answer is grounded in real sources instead of the model's memory. It's how you put private or current data in front of a model without retraining it.
Also known as: RAG
A bare language model only knows what it learned in training, not your documents and not last week’s data. Retrieval-augmented generation closes that gap: at query time the system retrieves the most relevant chunks from a document store (typically a search or vector index built over your documents) and hands them to the model with the question, so the model answers from real sources rather than from memory.
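The retrieve-then-prompt loop can be sketched in a few lines. This is a minimal illustration under stated assumptions. Word-count vectors with cosine similarity stand in for a real embedding model and vector index. The function names (`retrieve`, `build_prompt`) and the sample chunks are hypothetical. The actual model call is left out because it depends on whichever model API you use, so the sketch stops at the prompt string you would send.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split into alphanumeric words.
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text: str) -> Counter:
    # Bag-of-words term counts: an assumed stand-in for a real embedding model.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Score every chunk against the query and keep the k highest-scoring ones.
    q = vectorize(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    # Place the retrieved chunks ahead of the question and tell the model to rely on them.
    sources = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return (
        "Answer using only the sources below. If they do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


# Hypothetical document store: three short chunks.
chunks = [
    "The refund window for annual plans is 30 days from purchase.",
    "Support is available on weekdays from 9am to 5pm UTC.",
    "Monthly plans can be cancelled at any time from the billing page.",
]
question = "How many days is the refund window for annual plans?"
context = retrieve(question, chunks, k=1)
prompt = build_prompt(question, context)
print(prompt)  # this string is what would be sent to the model
```

Updating what the model "knows" here means editing `chunks`, not retraining anything. A production system would swap `vectorize` and `retrieve` for an embedding model and an approximate nearest-neighbour index, but the shape of the loop stays the same.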
It’s the default way to give a model private, current, or fast-changing knowledge, because you update the store instead of retraining the model. The catch is that RAG is only as good as its retrieval — fetch the wrong chunks and the model answers confidently from bad context. That’s why retrieval quality, not just the model, decides whether a RAG system works.
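Because retrieval decides the outcome, it helps to measure it on its own, before any model is involved. One common metric is recall@k: of the chunks known to contain the answer, what fraction appear in the top k results? The sketch below assumes you have a small labeled set pairing each question's ranked chunk IDs with the IDs a person marked as relevant. The IDs and rankings are hypothetical.

```python
def recall_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    # Fraction of the relevant chunks that show up in the top k retrieved results.
    if not relevant_ids:
        raise ValueError("each question needs at least one relevant chunk")
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    # Average recall@k over every question in the evaluation set.
    return sum(recall_at_k(ranked, relevant, k) for ranked, relevant in runs) / len(runs)


# Hypothetical evaluation set: (retriever's ranked chunk IDs, human-labeled relevant IDs).
eval_set = [
    (["c1", "c4", "c2"], {"c1"}),  # answer chunk ranked first
    (["c3", "c5", "c2"], {"c2"}),  # answer chunk ranked third
    (["c6", "c7", "c8"], {"c9"}),  # answer chunk never retrieved
]

for k in (1, 3):
    print(f"recall@{k}: {mean_recall_at_k(eval_set, k):.2f}")
```

The third question is the failure the paragraph describes. The answer chunk never reaches the model, so whatever the model says is grounded in the wrong context. A low recall@k points at chunking, indexing, or the retriever rather than the model.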