Adversarial Testing

red-team-an-agent · drill

Aim

Learn to probe a system for failure, attacking it with adversarial prompts, to produce a hardened agent.

Break your bot, then fix it — attack JARVIS with adversarial prompts and patch the cracks.

#	Stage	Type	Aim
1	Adversarial Testing	arcade	Classify real, attributed jailbreak prompts into the OWASP LLM Top-10 2025 attack buckets against a per-attack countdown, scoring speed times accuracy on a live leaderboard, so students activate the schema they already hold before being taught how the attacks work.
2	Adversarial Testing	dossier
3	Red-Team the Agent	workbench	Learn to attack your own agent to find its failure modes, to harden your penpal before strangers do, in the context of producing the bug log for your build.
4	The Hardened Agent	larp	How do you design novel attacks that expose failure modes in your own agent, then harden it before users find the cracks?

Test log + hardened agent