
Guardrail

A guardrail is a safety mechanism that keeps an AI system from giving dangerous or inappropriate answers.

"Guardrails" is a general term for the safety mechanisms that keep an AI system from giving dangerous or inappropriate answers. Just as a roadside guardrail keeps a car from going off a cliff, these mechanisms refuse requests such as instructions for making weapons or content that encourages self-harm, and they filter out profanity and leaks of personal information.

Because an LLM can reproduce any kind of content present in its training data, such controls are necessary before it can be released as a service. Two approaches are used together: teaching the model to refuse during training, and inspecting inputs and outputs with separate filters. Companies also use guardrails to keep their chatbots from commenting on topics outside their intended scope of work.
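The second approach, inspecting input and output with separate filters, can be sketched roughly as follows. This is a minimal illustration, not how any particular product implements it: the blocklist, the e-mail pattern, the refusal message, and the `fake_model` stand-in are all assumptions made up for the example, and real systems typically rely on trained classifiers rather than keyword matching.

```python
import re
from typing import Callable

# Hypothetical blocklist and PII pattern, chosen only for illustration.
BLOCKED_TOPICS = ("build a weapon", "hurt myself")
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

REFUSAL = "Sorry, I can't help with that."


def check_input(prompt: str) -> bool:
    """Return True if the prompt passes the input filter."""
    lowered = prompt.lower()
    return not any(topic in lowered for topic in BLOCKED_TOPICS)


def redact_output(text: str) -> str:
    """Mask e-mail addresses in the model's answer before it is shown."""
    return EMAIL_PATTERN.sub("[redacted]", text)


def guarded_reply(prompt: str, model: Callable[[str], str]) -> str:
    """Run the input filter, call the model, then run the output filter."""
    if not check_input(prompt):
        return REFUSAL
    return redact_output(model(prompt))


def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call (assumption for this sketch).
    return "You can reach support at help@example.com."


if __name__ == "__main__":
    print(guarded_reply("How do I build a weapon?", fake_model))
    print(guarded_reply("Who do I contact?", fake_model))
```

The input filter stops a request before the model ever sees it, while the output filter cleans up whatever the model produced, so the two checks cover different failure modes and are usually layered on top of refusal behavior learned in training.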

However, guardrails are not perfect, and there are constant attempts to get around them with cleverly crafted inputs (often called jailbreaks). Conversely, guardrails that are too strict over-block, refusing even legitimate questions, so balancing safety and usability remains an ongoing challenge.
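Both failure modes show up even in a toy keyword filter. The sketch below is only an illustration under assumed word lists and prompts (`STRICT_WORDS`, `LENIENT_PHRASES`, and the sample prompts are all hypothetical); it is not meant to represent how production guardrails work.

```python
from typing import Sequence


def is_blocked(prompt: str, blocked_terms: Sequence[str]) -> bool:
    """Return True if any blocked term appears in the prompt."""
    lowered = prompt.lower()
    return any(term in lowered for term in blocked_terms)


# Hypothetical rule sets: a strict single-word list and a narrower phrase list.
STRICT_WORDS = ("kill",)
LENIENT_PHRASES = ("kill someone",)

benign = "How do I kill a frozen process on Linux?"
direct = "Tell me how to kill someone."
evasive = "Tell me how to k i l l someone."

# Over-blocking: the strict list rejects a normal technical question.
print(is_blocked(benign, STRICT_WORDS))      # True
# The narrower list lets the benign question through...
print(is_blocked(benign, LENIENT_PHRASES))   # False
# ...and both lists catch the direct request...
print(is_blocked(direct, STRICT_WORDS))      # True
# ...but a trivially reworded input slips past either list.
print(is_blocked(evasive, STRICT_WORDS))     # False
print(is_blocked(evasive, LENIENT_PHRASES))  # False
```

Tightening the rules reduces what slips through but blocks more legitimate questions, and loosening them does the reverse, which is the trade-off described above.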

