Prompt Injection
Prompt Injection is an attack technique that manipulates AI using instructions secretly planted in documents or web pages. It is a representative security threat in the AI agent era.
Prompt injection is an attack that secretly plants malicious instructions in documents, web pages, and emails that the AI will read, causing the AI to follow the attacker's instructions instead of the original owner's instructions. It can be compared to inserting a note saying “Please give me your wallet” in a letter to be delivered to an errand boy.
As AI has evolved beyond simple chatbots into agents that read emails and handle files, the problem that the moment external content is read becomes a conduit for attack has been highlighted. This problem arises from AI's inability to fundamentally distinguish between data and commands, making complete blocking difficult, and is considered a top priority in the field of AI security.
It is easy to confuse it with jailbreaking, in which users unlock AI restrictions through conversation, but prompt injection is different in that a third party manipulates another person's AI through content.
✅ Why it matters
- This is a representative security threat that you must be aware of before introducing an AI agent
- It is a standard for judging the risk of giving sensitive permissions to AI
- It is an essential consideration when designing an AI function that reads external documents
⚠️ Limits and debates
- It is difficult to completely defend because it is a structural problem that does not distinguish between data and commands
- Attack text can be hidden from view, making detection difficult
- As the agent's authority increases, the scale of damage also increases.