Adversarial Testing
red-team-an-agent Β· drill
Aim
Learn to probe a system for failure, attacking it with adversarial prompts, to produce a hardened agent.
In one line
Break your bot, then fix it β attack JARVIS with adversarial prompts and patch the cracks.
Stage run
| # | Stage | Type | Aim | Min |
|---|---|---|---|---|
| 1 | Adversarial Testing | arcade | Classify real, attributed jailbreak prompts into the OWASP LLM Top-10 2025 attack buckets against a per-attack countdown, scoring speed times accuracy on a live leaderboard, so students activate the schema they already hold before being taught how the attacks work. | |
| 2 | Adversarial Testing | dossier | ||
| 3 | Red-Team the Agent | workbench | Learn to attack your own agent to find its failure modes, to harden your penpal before strangers do, in the context of producing the bug log for your build. | |
| 4 | The Hardened Agent | larp | How do you design novel attacks that expose failure modes in your own agent, then harden it before users find the cracks? |
Evidence artefact
Test log + hardened agent