Glossary · Term

Red Team

Also known as: red teaming

Red Team is a verification activity that detects risks in advance by deliberately attacking AI’s loopholes.

Red Team comes from a military term meaning a team that takes on the role of a virtual enemy attacking its allies. In the AI field, it refers to verification activities that elicit dangerous responses by intentionally attacking and deceiving models before release. It's the same principle as when a bank hires a mock hacker to break into a safe to check security.

Because you can only know how AI reacts to providing dangerous information, biased remarks, inducing jailbreak, etc. by actually probing it, major AI companies operate internal and external red team verification as a standard procedure before releasing models. It is also frequently mentioned as a risk assessment tool in discussions on AI regulations in each country.

However, the red team cannot find all loopholes. Since users continue to discover new workarounds after launch, it should be viewed as an ongoing process rather than a one-time verification.

✅ Why it matters

⚠️ Limits and debates

← View all glossary entries