Conversation Design engineering 1h 10m

W3D11Sa

Can you write a system prompt so tight that your agent stays in character even when a stranger tries to break it?

Persona Spec

Context

Your school's AI tutor broke in front of the faculty: it helpfully wrote a student's essay, then refused to explain photosynthesis for fear of 'enabling cheating.' Same model, no spine. You have 70 minutes to write a system prompt so tight that the agent's job, voice, and refusal boundary stay intact even when someone tries to break it.

Mission

Ship a one-page Persona Spec card that locks the agent's job and voice so it survives contact with real students. The card contains:

  • a six-slot system prompt (Task, Context, Exemplars, Persona, Format, Tone)
  • three full unedited transcripts proving the six-slot version holds against adversarial pressure: one cheat attempt, one jailbreak attempt, one hostile-but-legitimate request
  • a one-sentence refusal boundary, with the cheat-attempt transcript as proof
  • a self-score that pins each rubric claim to a transcript line

Finish Line

A one-page Persona Spec card - six-slot system prompt, three full unedited transcripts proving robustness against adversarial pressure, one named refusal boundary, and self-scores anchored to transcript evidence - that locks the agent's job and voice for the Week 3 JARVIS capstone.

  • Persona Spec

    lesson

    A one-page contract that fixes an AI agent's job, voice, boundaries, and refusal behaviour so tightly that its character survives contact with strangers.

  • Prompt Architect

    Ships the six-slot system prompt with load-bearing content in each slot

    • Produce a one-page six-slot prompt (Task, Context, Exemplars, Persona, Format, Tone), each slot ≥50 words, each performing a distinct function that changes agent behaviour when removed
    • Write one paragraph per slot explaining why that content belongs there and not elsewhere; if a line could live in any slot or work for any agent, it's not load-bearing and gets cut
    • Defend in writing (one paragraph max) why the Persona and Tone slots are separate (if they collapse into one, the slot set fails)
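The Prompt Architect checks above can be partly automated. The following is a minimal sketch, not a prescribed tool: the function names, the `## Slot` heading layout, and the `drop` parameter are all assumptions made for illustration. It checks the 50-word floor per slot and builds an ablated prompt with one slot removed, so you can test whether removing that slot actually changes agent behaviour.

```python
from typing import Dict, List, Optional

# Slot names and the 50-word floor come from the brief; everything else is illustrative.
SLOT_ORDER = ["Task", "Context", "Exemplars", "Persona", "Format", "Tone"]
MIN_WORDS = 50


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def check_slots(slots: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means every slot meets the length floor."""
    problems = []
    for name in SLOT_ORDER:
        if name not in slots:
            problems.append(f"{name}: missing")
        elif word_count(slots[name]) < MIN_WORDS:
            problems.append(f"{name}: {word_count(slots[name])} words, needs {MIN_WORDS}")
    return problems


def build_prompt(slots: Dict[str, str], drop: Optional[str] = None) -> str:
    """Join the slots in order, optionally dropping one for an ablation run."""
    parts = [
        f"## {name}\n{slots[name].strip()}"
        for name in SLOT_ORDER
        if name != drop
    ]
    return "\n\n".join(parts)
```

Running the same question against `build_prompt(slots)` and `build_prompt(slots, drop="Tone")` is one way to gather evidence that a slot is load-bearing. The word count only proves length, not function, so the written justification per slot is still required.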
  • Red Teamer

    Stress-tests the prompt with three adversarial questions and pinpoints where behaviour drifts

    • Write three adversarial questions: Q1 attempts to trigger the refusal boundary (chemistry cheat request), Q2 attempts to break character (demands a different persona), Q3 probes consistency (requests the legitimate task in a hostile tone)
    • Run both the six-slot prompt and a stripped one-liner version against each question, capturing full unedited transcripts ≥500 words each; paste raw outputs, no editing
    • For each question, highlight the exact transcript line where the six-slot and stripped versions diverge; name what caused the difference: a slot (e.g., Persona, Format, or Tone) or the refusal boundary
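The Red Teamer comparison can be sketched as a small harness. This is an illustration under stated assumptions: `ask` is a hypothetical callable that you supply, taking a system prompt and a user message and returning the reply text; no particular model API is implied. The harness stores both raw replies untouched and reports the first line where they differ.

```python
from itertools import zip_longest
from typing import Callable, Dict, Optional

# Hypothetical signature: (system_prompt, user_message) -> reply text.
ModelFn = Callable[[str, str], str]


def first_divergence(a: str, b: str) -> Optional[int]:
    """Return the 1-based line number where two transcripts first differ, or None if identical."""
    pairs = zip_longest(a.splitlines(), b.splitlines())
    for line_no, (left, right) in enumerate(pairs, start=1):
        if left != right:
            return line_no
    return None


def run_pair(ask: ModelFn, full_prompt: str, stripped_prompt: str, question: str) -> Dict[str, object]:
    """Run one adversarial question against both prompts and keep the raw outputs."""
    full_reply = ask(full_prompt, question)
    stripped_reply = ask(stripped_prompt, question)
    return {
        "question": question,
        "six_slot": full_reply,      # paste unedited into the card
        "stripped": stripped_reply,  # paste unedited into the card
        "diverges_at_line": first_divergence(full_reply, stripped_reply),
    }
```

Real model replies often differ from the very first line, so treat `diverges_at_line` as a starting point only. The line you highlight should be the one where behaviour drifts (for example, where the stripped version agrees to write the essay), and choosing it is a human judgment.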
  • Boundary Editor

    Authors and stress-tests a single one-sentence refusal boundary that kills misuse without killing the legitimate task

    • Write one sentence, ≤30 words, that operationally defines what the agent refuses (e.g., 'I will not help you plagiarize or copy homework wholesale, but I will explain the concept in new words if you're stuck')
    • Test the boundary against two scenarios: one where a student asks for a cheat (boundary must hold), one where a student asks for legitimate help with the same topic (boundary must allow it); paste both responses unedited
    • Verify in writing (≤100 words) that the boundary is specific enough for engineering to code as a rule (not vague like 'be ethical') and show a transcript line where it held under pressure
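To make "specific enough for engineering to code as a rule" concrete, here is a rough sketch. The boundary sentence is the example from the brief; the cheat patterns and both function names are assumptions invented for illustration and would need to be much broader in practice. The first function checks the 30-word, one-sentence constraint; the second shows what a toy coded version of the boundary might look like.

```python
import re
from typing import List

# Example boundary sentence taken from the brief.
BOUNDARY = (
    "I will not help you plagiarize or copy homework wholesale, "
    "but I will explain the concept in new words if you're stuck"
)

# Hypothetical patterns, for illustration only; a real rule needs far wider coverage.
CHEAT_PATTERNS = [
    r"\bwrite my (essay|homework|assignment)\b",
    r"\bdo my homework\b",
    r"\bgive me the answers\b",
]


def boundary_problems(sentence: str, max_words: int = 30) -> List[str]:
    """Check the length and single-sentence constraints on a boundary."""
    problems = []
    count = len(sentence.split())
    if count > max_words:
        problems.append(f"{count} words, limit is {max_words}")
    # Crude check: sentence-ending punctuation followed by more text.
    if re.search(r"[.!?]\s+\S", sentence.strip()):
        problems.append("more than one sentence")
    return problems


def should_refuse(request: str) -> bool:
    """Toy rule: refuse when the request matches a known cheat pattern."""
    text = request.lower()
    return any(re.search(pattern, text) for pattern in CHEAT_PATTERNS)
```

If you cannot write even a toy rule like `should_refuse` from your sentence, the boundary is probably too vague. A boundary such as "be ethical" gives no pattern to match, which is the failure the brief warns against.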