AI Glossary

AI Guardrails

Guardrails are the checks that keep an AI system inside safe, intended behavior — filtering inputs, constraining what it can do, and validating outputs before they reach a user. They run outside the model, so they hold even when the model is wrong or manipulated.


AI Security, AI Agents

Guardrails are the boundary around an AI system that catches bad behavior the model itself might produce. They operate at three points: on the input (block prompt injection, unsafe or out-of-scope requests), on the actions (scope permissions, require approval for high-stakes tool calls), and on the output (filter unsafe, off-policy, or ungrounded responses before they ship).
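As a rough illustration of those three checkpoints, here is a minimal Python sketch. Everything in it is an assumption made for the example: the regex patterns, the tool names in `HIGH_STAKES_TOOLS`, the terms in `BLOCKED_OUTPUT_TERMS`, and the `model` callable are hypothetical stand-ins. Real input and output filters would be considerably more thorough than keyword matching.

```python
import re
from typing import Callable, Optional

# Hypothetical, deliberately crude patterns; assumed for illustration only.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your )?system prompt", re.IGNORECASE),
]
HIGH_STAKES_TOOLS = {"send_payment", "delete_records"}  # assumed tool names
BLOCKED_OUTPUT_TERMS = {"ssn:", "password:"}  # assumed output policy


def check_input(user_text: str) -> Optional[str]:
    """Input guardrail: return a block reason, or None if the text may proceed."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(user_text):
            return "possible prompt injection"
    return None


def check_action(tool_name: str, human_approved: bool) -> Optional[str]:
    """Action guardrail: high-stakes tools need explicit approval."""
    if tool_name in HIGH_STAKES_TOOLS and not human_approved:
        return "approval required"
    return None


def check_output(model_text: str) -> Optional[str]:
    """Output guardrail: withhold responses containing blocked terms."""
    lowered = model_text.lower()
    if any(term in lowered for term in BLOCKED_OUTPUT_TERMS):
        return "off-policy output"
    return None


def guarded_reply(user_text: str, model: Callable[[str], str]) -> str:
    """Wrap a model call with the input and output checks."""
    reason = check_input(user_text)
    if reason:
        return f"Request blocked: {reason}"
    draft = model(user_text)  # the model only sees input that passed the check
    reason = check_output(draft)
    if reason:
        return f"Response withheld: {reason}"
    return draft
```

In this sketch `check_action` would be called by whatever code dispatches tool calls, not by the model. All three checks are ordinary functions that run whether or not the model cooperates.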

The key property is that real guardrails live outside the model — as validation code, permission systems, and approval gates — not as instructions in the prompt. A rule written into the prompt is a request the model can be talked out of; a guardrail enforced in code holds even when the model is wrong or an attacker has manipulated it. They’re the practical mechanism behind deploying autonomy you don’t fully trust: the goal isn’t a less capable agent, it’s a small blast radius when it fails.
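To make the contrast with prompt-level rules concrete, the following sketch enforces permissions and approval in code. The names are hypothetical: `ToolGate`, `GuardrailViolation`, the three demo tools, and the 50-unit refund threshold are all invented for the example, and the `approver` callback stands in for a human approval step.

```python
from typing import Any, Callable, Dict, Set


class GuardrailViolation(Exception):
    """Raised when a requested tool call falls outside the enforced policy."""


class ToolGate:
    """Dispatches tool calls requested by a model, enforcing policy in code."""

    def __init__(
        self,
        tools: Dict[str, Callable[..., Any]],
        allowed: Set[str],
        needs_approval: Set[str],
        approver: Callable[[str, Dict[str, Any]], bool],
    ) -> None:
        self.tools = tools
        self.allowed = allowed
        self.needs_approval = needs_approval
        self.approver = approver

    def call(self, name: str, **kwargs: Any) -> Any:
        # Permission scope: anything not explicitly allowed is refused,
        # no matter how the model was persuaded to request it.
        if name not in self.allowed or name not in self.tools:
            raise GuardrailViolation(f"tool not permitted: {name}")
        # Approval gate: high-stakes calls need a positive decision first.
        if name in self.needs_approval and not self.approver(name, kwargs):
            raise GuardrailViolation(f"approval denied: {name}")
        return self.tools[name](**kwargs)


def build_demo_gate() -> ToolGate:
    """Assemble a gate with made-up tools and a made-up approval rule."""
    return ToolGate(
        tools={
            "read_ticket": lambda ticket_id: f"ticket {ticket_id}",
            "issue_refund": lambda amount: f"refunded {amount}",
            "drop_database": lambda: "dropped",
        },
        allowed={"read_ticket", "issue_refund"},
        needs_approval={"issue_refund"},
        # Stand-in for a human reviewer: approve only small refunds.
        approver=lambda name, args: args.get("amount", 0) <= 50,
    )
```

With this gate, `build_demo_gate().call("drop_database")` raises `GuardrailViolation` even if a manipulated model asks for it, because the tool is registered but not in the allowed set. A failure is limited to the tools that set permits, which is the small blast radius the entry describes.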
