Red Team
Red Team is a verification activity that detects risks in advance by deliberately attacking AI’s loopholes.
Red Team comes from a military term meaning a team that takes on the role of a virtual enemy attacking its allies. In the AI field, it refers to verification activities that elicit dangerous responses by intentionally attacking and deceiving models before release. It's the same principle as when a bank hires a mock hacker to break into a safe to check security.
Because you can only know how AI reacts to providing dangerous information, biased remarks, inducing jailbreak, etc. by actually probing it, major AI companies operate internal and external red team verification as a standard procedure before releasing models. It is also frequently mentioned as a risk assessment tool in discussions on AI regulations in each country.
However, the red team cannot find all loopholes. Since users continue to discover new workarounds after launch, it should be viewed as an ongoing process rather than a one-time verification.
✅ Why it matters
- Dangerous reactions and loopholes in AI can be discovered before launch
- It has become the industry standard procedure for AI safety verification
- These are the basic concepts needed to understand AI regulations and safety news
⚠️ Limits and debates
- It is impossible to discover all attack paths in advance
- Verification results are often not transparently disclosed to the outside world
- Fixing discovered problems may have the side effect of diminishing other capabilities