Glossary · Term

Jailbreak

Jailbreak is the act of bypassing AI’s safety devices and eliciting prohibited answers through clever input.

Jailbreaking is the act of bypassing AI's safeguards with cleverly designed inputs, eliciting responses that should have been rejected. For example, by asking them to play a situation using the lines of a villain in a novel, or by twisting the instructions into several layers and making them say something they would refuse if asked directly.

This naturally emerged in the process of testing how solid AI's guardrails are, and security researchers engage in red team activities that intentionally attempt jailbreaks to find vulnerabilities in advance. AI companies repeatedly patch patches to block discovered methods, but it is a battle of spear and shield, with new workarounds continuing to emerge.

Jailbreak shows that AI safety is an ongoing challenge that does not end with a single design. If a corporate chatbot is jailbroken and makes false promises or inappropriate remarks, it can lead to real losses, so it is no stranger to companies adopting AI.

✅ Why it matters

Helps you understand the limitations of AI safety devices and how they work
Informs you of the security risks that companies need to check when introducing chatbots
Explains why AI security industries such as Red Team exist

⚠️ Limits and debates

If abused, it leads to leakage of risky information and misuse of services
It is difficult to fundamentally solve the problem as blocking patches and new workarounds are repeated
There is an argument that the line between research purposes and malicious attempts is blurry