Glossary · Term

Jailbreak

Jailbreak is the act of bypassing AI’s safety devices and eliciting prohibited answers through clever input.

Jailbreaking is the act of bypassing AI's safeguards with cleverly designed inputs, eliciting responses that should have been rejected. For example, by asking them to play a situation using the lines of a villain in a novel, or by twisting the instructions into several layers and making them say something they would refuse if asked directly.

This naturally emerged in the process of testing how solid AI's guardrails are, and security researchers engage in red team activities that intentionally attempt jailbreaks to find vulnerabilities in advance. AI companies repeatedly patch patches to block discovered methods, but it is a battle of spear and shield, with new workarounds continuing to emerge.

Jailbreak shows that AI safety is an ongoing challenge that does not end with a single design. If a corporate chatbot is jailbroken and makes false promises or inappropriate remarks, it can lead to real losses, so it is no stranger to companies adopting AI.

✅ Why it matters

⚠️ Limits and debates

← View all glossary entries