Adversarial Testing

red-team-an-agent Β· drill

Aim

Learn to probe a system for failure, attacking it with adversarial prompts, to produce a hardened agent.

Operation
branch
Deliverable
hardened-agent

In one line

Break your bot, then fix it β€” attack JARVIS with adversarial prompts and patch the cracks.

Stage run

#StageTypeAimMin
1Adversarial TestingarcadeClassify real, attributed jailbreak prompts into the OWASP LLM Top-10 2025 attack buckets against a per-attack countdown, scoring speed times accuracy on a live leaderboard, so students activate the schema they already hold before being taught how the attacks work.
2Adversarial Testingdossier
3Red-Team the AgentworkbenchLearn to attack your own agent to find its failure modes, to harden your penpal before strangers do, in the context of producing the bug log for your build.
4The Hardened AgentlarpHow do you design novel attacks that expose failure modes in your own agent, then harden it before users find the cracks?

Evidence artefact

Test log + hardened agent