← Coding JARVIS Β· all lessons
2

Adversarial Testing

The Guardrails

AimLearn to probe a system for failure, attacking it with adversarial prompts, to produce a hardened agent.

Shape4 stages Β· 9 scenes
ArtefactHardened Agent + Test Log
β–Ά Start Lesson
2Skill Β· quantified
Skill
Measured by
Data-type family
Rubric
3AI Γ— skill β†’ tasks
AI feature
Handoff
4Artefact β†’ capstone
Deliverable Hardened Agent + Test Log
Brief t1_stage_larp.a1_artefacts.briefA patched system prompt plus a test log of ten attacks you threw at it, each with the input, the response, and a before-and-after verdict.
Lesson stages 4 stages Β· 9 scenes
  1. 1 arcade
  2. 2 dossier
  3. 3 workbench
  4. 4 larp
Stage 1 β€’arcade arcade
Connections
hook
Red Team Race
hook ~18m
The Literal Genie
Stage 2 β€’dossier dossier
Scenes
Stage 3 β€’workbench workbench
Scenes
Stage 4 β€’larp larp
Cold Open
Brief
Rubric
Roles
Execution
Boss Fight