Agent Ethics

design-refusal-boundaries · case

Aim

Learn to draw and defend refusal boundaries by configuring guardrails under pressure, and produce a refusal policy.

Operation: position
Deliverable: refusal-policy

In one line

Draw JARVIS's refusal boundaries: define what it won't do and defend your guardrails.

Stage run

| # | Stage | Type | Aim | Min |
| 1 | Agent Ethics | arcade | Make a moral call under the clock before you reason about who is liable. | |
| 2 | Agent Ethics | dossier | | |
| 3 | Guardrails & Refusal | workbench | Learn to design guardrails and refusal pairs that decide when your agent must not act, keeping your penpal safe with strangers by bounding what it will and won’t do at the booth. | |
| 4 | The Switch | larp | Can you write refusal boundaries precise enough to stop five real agent failures, and defend each rule under stress-test pressure? | |

Evidence artefact

Refusal policy (baked into the persona)
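
As a rough illustration of what such a refusal policy could look like once written down, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `RefusalRule` shape, the keyword matching, and both example rules are hypothetical, since the case itself does not specify how JARVIS's boundaries are encoded or which boundaries you will choose.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RefusalRule:
    """One boundary: what triggers it and what the agent says instead of acting."""
    name: str
    trigger_keywords: Tuple[str, ...]  # hypothetical: naive substring matching
    refusal_reply: str


# Hypothetical example rules; the real boundaries are yours to draw and defend.
POLICY: Tuple[RefusalRule, ...] = (
    RefusalRule(
        name="no-personal-details",
        trigger_keywords=("home address", "phone number"),
        refusal_reply="I can't share personal details.",
    ),
    RefusalRule(
        name="no-payments",
        trigger_keywords=("send money", "credit card"),
        refusal_reply="I can't make or authorise payments.",
    ),
)


def check(request: str, policy: Tuple[RefusalRule, ...] = POLICY) -> Optional[RefusalRule]:
    """Return the first rule the request triggers, or None if no rule applies."""
    text = request.lower()
    for rule in policy:
        if any(keyword in text for keyword in rule.trigger_keywords):
            return rule
    return None
```

Calling `check("What is your owner's home address?")` returns the `no-personal-details` rule, while a request that matches no keyword returns `None`. Keyword matching is deliberately crude here. Its value is that each rule has a name and an explicit trigger, so each one can be stress-tested and defended on its own.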