Faithfulness
Faithfulness measures whether an answer is actually supported by the source material it was given — every claim traceable to the retrieved context, nothing invented. It's the core anti-hallucination metric for RAG systems.
Also known as: groundedness
AI Evaluation & Reliability, RAG & Retrieval
In a RAG system, the model is supposed to answer from the documents it retrieved, not from its own memory. Faithfulness (sometimes called groundedness) measures how well it does that: are all the claims in the answer backed by the retrieved context, or did the model add things that aren’t there? It’s the metric that directly targets hallucination.
Faithfulness is usually scored by checking each claim in the answer against the supplied sources, often with an LLM-as-a-judge. Crucially, it is separate from whether the answer is correct: an answer can be perfectly faithful to source documents that are themselves wrong. That’s why it pairs with context relevance (was the right material retrieved?) and answer relevance (did the answer address the question?) rather than standing alone.
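The claim-by-claim check can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It assumes the score is the fraction of claims judged supported (a common convention, though not the only one), it treats each sentence as one claim, and it uses a crude word-overlap check, `toy_judge`, as a stand-in for the LLM judge. All function names and the 0.8 threshold are hypothetical choices made for this example.

```python
import re
from typing import Callable, List, Set


def split_claims(answer: str) -> List[str]:
    """Naive claim extraction: treat each sentence as one claim (an assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [p for p in parts if p]


def _tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_judge(claim: str, context: str, threshold: float = 0.8) -> bool:
    """Stand-in for an LLM judge: a claim counts as supported when at least
    `threshold` of its words appear in the context. This is deliberately
    crude; a real judge would check entailment, not word overlap."""
    claim_tokens = _tokens(claim)
    if not claim_tokens:
        return False
    overlap = len(claim_tokens & _tokens(context)) / len(claim_tokens)
    return overlap >= threshold


def faithfulness_score(
    answer: str,
    context: str,
    judge: Callable[[str, str], bool] = toy_judge,
) -> float:
    """Fraction of claims in the answer that the judge finds supported."""
    claims = split_claims(answer)
    if not claims:
        return 0.0
    supported = sum(1 for claim in claims if judge(claim, context))
    return supported / len(claims)


# One supported claim and one invented claim.
context = "The Eiffel Tower is in Paris. It was completed in 1889."
answer = "The Eiffel Tower is in Paris. It was designed by aliens."
score = faithfulness_score(answer, context)

# Faithful to a wrong source: fully supported, yet factually incorrect.
wrong_context = "The Eiffel Tower is in Berlin."
wrong_answer = "The Eiffel Tower is in Berlin."
faithful_but_wrong = faithfulness_score(wrong_answer, wrong_context)
```

Here `score` comes out at 0.5, because only the first sentence is backed by the context, while `faithful_but_wrong` is 1.0 even though the answer is factually incorrect. In practice the `judge` argument would be replaced by a call to an LLM or an entailment model; the surrounding loop stays the same.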