Glossary · Term

Prompt Injection

Prompt Injection is an attack technique that manipulates AI using instructions secretly planted in documents or web pages. It is a representative security threat in the AI agent era.

Prompt injection is an attack that secretly plants malicious instructions in documents, web pages, and emails that the AI will read, causing the AI to follow the attacker's instructions instead of the original owner's instructions. It can be compared to inserting a note saying “Please give me your wallet” in a letter to be delivered to an errand boy.

As AI has evolved beyond simple chatbots into agents that read emails and handle files, the problem that the moment external content is read becomes a conduit for attack has been highlighted. This problem arises from AI's inability to fundamentally distinguish between data and commands, making complete blocking difficult, and is considered a top priority in the field of AI security.

It is easy to confuse it with jailbreaking, in which users unlock AI restrictions through conversation, but prompt injection is different in that a third party manipulates another person's AI through content.

✅ Why it matters

⚠️ Limits and debates

← View all glossary entries