Capabilities & Limits

Platform Awareness

You wouldn't trust a colleague you'd never tested

It's extraordinary at some things, unreliable at others, and it sounds exactly the same either way. Today you learn to tell the two apart.

Discuss

Name one task you'd happily hand it, and one you wouldn't trust it with. What's the difference?

Teacher note

Source: 3b-transcript. Open on the 'new colleague' frame, which runs through the whole lesson. Let the tension sit before the body of the lesson resolves it.

Discussion drills

  1. Diagnose

    Claude produces this: "Kim et al. (2023) found that 67.4% of Korean high school students experienced AI-related academic dishonesty in the 2022–23 academic year." The paper does not exist. The statistic is fabricated.

    Diagnose this failure. Which of the three failure types is it? Why did the percentage look real: why 67.4% and not "most"? Why did the authors sound real? Trace it to the token-level mechanism.

  2. Predict

    You ask Claude about an event that happened three months ago. It gives a detailed, confident answer. Name the two independent mechanisms that could make this answer wrong. Can you distinguish between them from the outside? What would tell you which one is operating?

  3. Construct

    Design a three-step verification protocol for checking a factual claim from Claude. Constraint: the protocol must work without internet access. Build it. Then state its limits: what class of error does it still miss?

  4. Judge

    Someone argues: "For creative writing, capability zones don't matter β€” it's all made up anyway." Evaluate this. Name at least one case where a capability zone failure materially affects creative output quality, and one where the claim holds.

  5. Falsify

    "If you ask Claude to say 'I don't know' when it is uncertain, the limitation zone problem is solved." Make the strongest attack on this claim.

  6. Transfer

    You are reviewing an AI-drafted legal contract. Three clauses have been inserted by Claude: (a) a standard indemnification clause for a UK commercial contract; (b) a data processing clause referencing GDPR Article 28; (c) a clause about liability for AI-generated content under "Section 47 of the Digital Services Act 2024."

    Apply the capability zone model to each clause. For each one, state your prior on coverage quality and what you would do before relying on it. Which clause demands the most immediate verification and why?
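
Teacher aid for the first drill (the fabricated citation): if the group stalls on "token-level mechanism", a toy sampler can make the idea concrete. The sketch below is an illustration only. The probability table NEXT_TOKEN_PROBS and both helper functions are hypothetical, and the numbers are invented rather than measured from Claude or any real model. It shows one thing: when a precise-looking figure is the more probable continuation in an academic-sounding sentence, sampling will usually produce one, and no step checks whether a source exists.

```python
import random

# Hypothetical next-token probabilities after the context
# "...found that" in an academic-sounding sentence. These numbers are
# invented for illustration; they are not measured from any real model.
NEXT_TOKEN_PROBS = {
    "67.4%": 0.30,
    "72.1%": 0.25,
    "58.9%": 0.20,
    "most": 0.15,
    "I don't know": 0.10,
}


def sample_next_token(probs, rng):
    """Pick one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def looks_precise(token):
    """True when the token is a specific percentage, not a vague word."""
    return token.endswith("%")


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_next_token(NEXT_TOKEN_PROBS, rng) for _ in range(1000)]
precise_share = sum(looks_precise(t) for t in samples) / len(samples)
print(f"Share of precise-looking continuations: {precise_share:.0%}")
```

Note that nothing in the sampler consults a database of papers. Plausibility in context is the only criterion, which is why the output sounds equally confident whether or not the figure is real.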

Apply this in the project

Every claim you rely on in today's brief is either in the capability zone or the limitation zone. The brief does not tell you which.