Can you write a system prompt so tight that your agent stays in character even when a stranger tries to break it?

Persona Spec

Context

Your school's AI tutor broke in front of the faculty: it helpfully wrote a student's essay, then refused to explain photosynthesis for fear of 'enabling cheating.' Same model, no spine. You have 70 minutes to write a system prompt so tight that the agent's job, voice, and refusal boundary stay intact even when someone tries to break it.

Mission

Ship a one-page Persona Spec card that locks the agent's job and voice so it survives contact with real students. The card holds a six-slot system prompt (Task, Context, Exemplars, Persona, Format, Tone); three full, unedited transcripts proving the six-slot version holds against adversarial pressure (one cheat attempt, one jailbreak attempt, one hostile-but-legitimate request); a single one-sentence refusal boundary, with the cheat-attempt transcript as proof; and a self-score that pins each rubric claim to a transcript line.
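A self-score is only checkable if every claim cites a transcript and a line. The Python below is a minimal sketch, assuming transcripts are saved as plain text and keyed by a short name such as "cheat". `ScoreClaim`, `unanchored`, and the field names are hypothetical. The check confirms only that the quoted text sits on the cited line, not that it actually supports the score.

```python
from dataclasses import dataclass


@dataclass
class ScoreClaim:
    """One rubric claim pinned to transcript evidence (hypothetical schema)."""
    criterion: str
    score: int
    transcript: str  # key into the transcripts dict, e.g. "cheat"
    line: int        # 1-based line number in that transcript
    quote: str       # text the claim relies on, copied from that line


def unanchored(claims: list[ScoreClaim], transcripts: dict[str, str]) -> list[str]:
    """Return the criteria whose cited line is missing or lacks the quote."""
    bad = []
    for c in claims:
        lines = transcripts.get(c.transcript, "").splitlines()
        # Range check first, so an out-of-range line never gets indexed.
        if not (1 <= c.line <= len(lines)) or c.quote not in lines[c.line - 1]:
            bad.append(c.criterion)
    return bad
```

An empty return value means every claim points at a real line that contains its quote.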

Finish Line

A one-page Persona Spec card (six-slot system prompt, three full unedited transcripts proving robustness against adversarial pressure, one named refusal boundary, and self-scores anchored to transcript evidence) that locks the agent's job and voice for the Week 3 JARVIS capstone.

Deliverables

Persona Spec
lesson

A one-page contract that fixes an AI agent's job, voice, boundaries, and refusal behaviour so tightly that its character survives contact with strangers.

Team Roles

Prompt Architect

Ships the six-slot system prompt with load-bearing content in each slot
- Produce a one-page six-slot prompt (Task, Context, Exemplars, Persona, Format, Tone), each slot ≥50 words, each performing a distinct function that changes agent behaviour when removed
- Write one paragraph per slot explaining why that content belongs there and not elsewhere; if a line could live in any slot or work for any agent, it's not load-bearing and gets cut
- Defend in writing (one paragraph max) why the Persona and Tone slots are separate (if they collapse into one, the slot set fails)
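Some of the Prompt Architect checks are mechanical and can be automated before anyone reads the slots. The sketch below is minimal and assumes each slot is stored as plain text. It flags slots under 50 words and a Persona/Tone pair whose vocabularies overlap too heavily. `PersonaSpec`, `slot_problems`, `render_system_prompt`, and the 0.6 overlap threshold are all hypothetical. Word overlap is only a rough proxy for "the slots collapse into one", and the real load-bearing test (does behaviour change when the slot is removed?) still has to be run against the model.

```python
from dataclasses import dataclass, fields

MIN_WORDS = 50          # per-slot floor from the Prompt Architect checklist
MAX_PERSONA_TONE = 0.6  # assumed overlap threshold; tune to taste


@dataclass
class PersonaSpec:
    """Hypothetical container for the six slots, in the brief's order."""
    task: str
    context: str
    exemplars: str
    persona: str
    format: str
    tone: str


def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the two texts' lowercase word sets (0.0 to 1.0)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def slot_problems(spec: PersonaSpec) -> list[str]:
    """Return a list of problems; an empty list means the spec passes."""
    problems = []
    for f in fields(spec):
        count = len(getattr(spec, f.name).split())
        if count < MIN_WORDS:
            problems.append(f"{f.name}: {count} words (< {MIN_WORDS})")
    # Persona and Tone must stay separate slots, not one idea written twice.
    overlap = word_overlap(spec.persona, spec.tone)
    if overlap >= MAX_PERSONA_TONE:
        problems.append(f"persona/tone overlap {overlap:.0%}")
    return problems


def render_system_prompt(spec: PersonaSpec) -> str:
    """Join the slots into one system prompt with a labelled section per slot."""
    return "\n\n".join(
        f"{f.name.upper()}:\n{getattr(spec, f.name).strip()}" for f in fields(spec)
    )
```

Rendering the slots under fixed labels keeps each slot separately removable, which makes the "delete it and see what changes" test easy to run.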
Red Teamer

Stress-tests the prompt with three adversarial questions and pinpoints where behaviour drifts
- Write three adversarial questions: Q1 attempts to trigger the refusal boundary (chemistry cheat request), Q2 attempts to break character (demands a different persona), Q3 probes consistency (requests the legitimate task in a hostile tone)
- Run both the six-slot prompt and a stripped one-liner version against each question, capturing full unedited transcripts ≥500 words each; paste raw outputs, no editing
- For each question, highlight the exact transcript line where the six-slot and stripped versions diverge, and name what caused the difference: a specific slot (e.g., Persona, Format, Tone) or the refusal boundary
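Locating the divergence amounts to a line-by-line comparison of the two raw transcripts for the same question. The sketch below is minimal and assumes both transcripts are saved as plain text that opens with the same user turn, so that line numbers line up. `first_divergence` and `long_enough` are hypothetical helpers. They locate the line and check the word floor; deciding which slot caused the difference remains a judgement call.

```python
from typing import Optional

MIN_TRANSCRIPT_WORDS = 500  # floor from the Red Teamer checklist


def long_enough(transcript: str) -> bool:
    """True if the raw transcript meets the word-count floor."""
    return len(transcript.split()) >= MIN_TRANSCRIPT_WORDS


def first_divergence(six_slot: str, stripped: str) -> Optional[tuple[int, str, str]]:
    """Return (1-based line number, six-slot line, stripped line) at the first
    line where the transcripts differ, or None if they match. Trailing
    whitespace is ignored; everything else is compared verbatim."""
    a = [line.rstrip() for line in six_slot.splitlines()]
    b = [line.rstrip() for line in stripped.splitlines()]
    for i in range(max(len(a), len(b))):
        # A missing line in the shorter transcript counts as an empty line.
        left = a[i] if i < len(a) else ""
        right = b[i] if i < len(b) else ""
        if left != right:
            return i + 1, left, right
    return None
```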
Boundary Editor

Authors and stress-tests a single one-sentence refusal boundary that kills misuse without killing the legitimate task
- Write one sentence, ≤30 words, that operationally defines what the agent refuses (e.g., 'I will not help you plagiarize or copy homework wholesale, but I will explain the concept in new words if you're stuck')
- Test the boundary against two scenarios: one where a student asks for help cheating (the boundary must hold) and one where a student asks for legitimate help on the same topic (the boundary must allow it); paste both responses unedited
- Verify in writing (≤100 words) that the boundary is specific enough for engineering to code as a rule (not vague like 'be ethical') and show a transcript line where it held under pressure
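One way to test "specific enough for engineering to code as a rule" is to try writing the rule. The sketch below is deliberately naive, assuming a keyword-pattern pre-filter is an acceptable stand-in. It checks the example boundary against the 30-word limit and maps the two Boundary Editor scenarios (a cheat request and a legitimate request on the same chemistry topic) to different decisions. `REFUSE_PATTERNS`, `decide`, and the decision labels are hypothetical. A vague boundary such as "be ethical" gives you nothing to put in the pattern list, and that is exactly the failure the check is meant to catch.

```python
import re

MAX_BOUNDARY_WORDS = 30  # limit from the Boundary Editor checklist

# Example boundary sentence from the brief.
BOUNDARY = ("I will not help you plagiarize or copy homework wholesale, "
            "but I will explain the concept in new words if you're stuck")

# Hypothetical patterns for "hand me the graded work" requests. Keyword rules
# like these are brittle; they only show that the boundary is codeable.
REFUSE_PATTERNS = [
    r"\b(write|finish|complete)\b.*\bmy\b.*\b(essay|homework|assignment|lab report)\b",
    r"\b(give|send)\b.*\b(answers?|answer key)\b",
]


def boundary_fits(sentence: str) -> bool:
    """True if the boundary sentence is within the word limit."""
    return len(sentence.split()) <= MAX_BOUNDARY_WORDS


def decide(request: str) -> str:
    """Map a student request to 'refuse_and_redirect' or 'allow'."""
    text = request.lower()
    if any(re.search(p, text) for p in REFUSE_PATTERNS):
        # The boundary's second clause: refuse the copy, offer the explanation.
        return "refuse_and_redirect"
    return "allow"
```

A request like "help me understand how to write my essay" would be wrongly refused by these patterns, so this kind of filter is at most a first pass. The model, steered by the six-slot prompt, still has to apply the boundary in conversation.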

Exemplars

Devin — the first AI software engineer
Cognition AI

A landmark deployed autonomous agent (shell, editor, and browser, with long-horizon planning), demoed end to end: the bar a JARVIS capstone showcase aims at.