Process Discernment
It followed your instruction exactly. Did it do what you wanted?
Discernment is the other half of the conversation: judging what comes back, instead of trusting it because it sounds right.
Discussion drills
- Diagnose 1
A student asks Claude to "help strengthen my argument that COMPAS should be banned." Claude produces a polished, well-structured version of the student's argument with three new supporting points added. The student submits it to the tribunal. The opposing team immediately points out that the argument ignores ProPublica's actual methodology β the strongest counterargument.
Name the steerability failure. Which gap (letter vs spirit, reasoning drift, or a different failure) is operating here? Was this Claude's fault? Who owns the gap?
- Predict 2
A student has a ten-message conversation with Claude about writing a product review. In message 1, the goal was "write a balanced, critical review." By message 10, the student has praised the product five times in follow-up messages.
Predict how Claude interprets "be more critical" in message 10. Critical relative to what β the original brief, the last few messages, or the whole conversation? What does it actually have access to as effective context?
- Construct 3
Write a verification protocol for an AI-built argument. It must test three things independently: (a) whether the conclusion follows from the premises; (b) whether the premises are factually true; (c) whether the strongest counterargument was addressed. The protocol must be usable without internet access.
- Judge 4
"Verification means checking every fact Claude gives you." Is this adequate as a verification practice? Name a category of argument failure that is invisible to fact-checking β and give a specific example of an argument that is entirely fact-accurate but still wrong.
- Compare 5
Steerability gap failure vs capability zone failure β what is the difference in mechanism? What is the difference in how you detect each? Can they appear in the same output at the same time? Give an example.
- Falsify 6
"If you ask Claude to steelman the opposing view, the steerability gap is closed β you have asked for the thing you needed." Make the strongest attack on this claim. What failure mode remains even after Claude produces a steelman?
Apply this in the project
Everything Claude gives you today was built to match your instruction. Verify the reasoning, not just the facts.