AI Glossary

Explainability

Explainability is how well you can understand why an AI system produced a given output. It matters most where decisions need to be justified, such as lending, hiring, and healthcare. It is hard for large models: being able to narrate a plausible-sounding rationale does not make their reasoning transparent.

Also known as: interpretability, explainable AI

· Chain of Thought

AI Evaluation & Reliability

Explainability is the ability to answer “why did the model do that?” in terms a human can check. For simple models you can often trace the decision to specific features. For large language models it’s much harder — billions of parameters don’t yield a clean story, and the field of interpretability is actively working on it.
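As a rough illustration of tracing a decision to specific features, here is a minimal sketch of a logistic scoring model whose output decomposes into per-feature contributions. The feature names, weights, and bias are hypothetical, chosen only for the example, and do not come from any real lending system.

```python
import math

# Hypothetical weights for an illustrative loan-scoring model (assumption, not a real system).
WEIGHTS = {"income_k": 0.04, "debt_ratio": -3.0, "late_payments": -0.8}
BIAS = -1.0


def contributions(features):
    """Per-feature contribution to the logit: weight * value."""
    return {name: WEIGHTS[name] * value for name, value in features.items()}


def score(features):
    """Approval probability: sigmoid of the bias plus all contributions."""
    z = BIAS + sum(contributions(features).values())
    return 1.0 / (1.0 + math.exp(-z))


applicant = {"income_k": 50, "debt_ratio": 0.5, "late_payments": 2}
contribs = contributions(applicant)

# List the features by how strongly they pushed the decision, in either direction.
for name, value in sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:>14}: {value:+.2f}")
print(f"approval probability: {score(applicant):.2f}")
```

Because the model is linear in its inputs, each contribution is exactly the amount that feature adds to or subtracts from the logit, so a human can check the explanation against the arithmetic. No comparable decomposition falls out of a model with billions of parameters.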

The trap is mistaking a model’s self-explanation for a real one. Ask an LLM why it answered something and it will produce a fluent rationale, but that narration is generated after the fact and may not reflect the actual computation: it is a plausible story, not a verified cause. Genuine explainability needs methods that probe the model’s behavior and, for high-stakes decisions, system design that keeps a human able to understand and override the output. In some domains explainability is a regulatory requirement, not just a nicety.
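One simple way to probe behavior rather than trust a narrated rationale is occlusion: reset one input at a time to a baseline value and measure how much the output moves. The sketch below assumes a hypothetical black_box function standing in for an opaque model, and the baseline values are arbitrary choices for illustration. It shows the idea only, not a production attribution method.

```python
def black_box(x):
    """Stand-in for an opaque model; the probe only calls it and never reads its logic."""
    # Hypothetical rule with an interaction: late payments matter only when debt is high.
    risk = 0.2
    if x["debt_ratio"] > 0.4:
        risk += 0.3 + 0.1 * x["late_payments"]
    if x["income_k"] < 30:
        risk += 0.2
    return min(risk, 1.0)


def occlusion_attribution(model, x, baseline):
    """Change in output when each feature alone is reset to its baseline value."""
    full = model(x)
    effects = {}
    for name in x:
        probe = dict(x)  # copy so the original input is left untouched
        probe[name] = baseline[name]
        effects[name] = full - model(probe)
    return effects


applicant = {"income_k": 50, "debt_ratio": 0.5, "late_payments": 2}
baseline = {"income_k": 50, "debt_ratio": 0.2, "late_payments": 0}

effects = occlusion_attribution(black_box, applicant, baseline)
for name, delta in effects.items():
    print(f"{name:>14}: {delta:+.2f}")
```

The measured effects come from the model's actual outputs, so they can be checked by rerunning the probe. They still depend on the chosen baseline, and single-feature occlusion can understate interactions between features, which is one reason such probes support human review rather than replace it.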