Vibe Coding: Building with AI Codegen
Vibe Coding
"Vibe coding" (the practice of describing what you want and letting AI write the implementation) has generated both the most optimistic and most alarmed coverage in tech journalism this year. The reality documented in controlled studies and post-mortems is more complex: dramatic productivity gains for experienced engineers, and significant security and correctness problems for novice users who cannot audit the output.
GitHub Report: Developers Using Copilot Ship 55% More Code Per Week; Security Audit Finds 40% Higher Vulnerability Rate
GitHub's annual Copilot productivity report showed users merging 55% more pull requests per week; an independent Carnegie Mellon audit of a matched cohort's codebases found a 40% higher rate of OWASP Top 10 vulnerabilities compared to non-Copilot developers with equivalent experience. Both institutions confirmed both numbers.
"More code. More problems. The ratio is roughly the same."
Stack Overflow Traffic Down 48% Year-Over-Year; CEO Attributes Decline Entirely to AI Coding Assistants
Stack Overflow's monthly traffic report showed a 48% year-over-year decline in question views, with the CEO's investor letter directly attributing the fall to AI coding assistants substituting for search-based problem-solving. Stack Overflow announced a partnership to use its content to train AI systems, the same systems credited with its traffic collapse.
"The company is training the thing that is replacing it."
US NIST Issues Advisory: AI-Generated Code Should Not Be Used in Critical Infrastructure Without Formal Verification
The US National Institute of Standards and Technology issued guidance advising against the use of AI-generated code in critical infrastructure systems (power grids, water treatment, financial clearing) without formal static analysis and penetration testing. The advisory followed three documented incidents where AI-generated authentication code contained exploitable vulnerabilities.
16-Year-Old Builds Working Bank Application Using Only Claude; Security Researcher Finds 7 Critical Vulnerabilities in 20 Minutes
A profile of a teenager who built a functioning online banking demo using Claude without prior programming experience was followed by an appendix from a security researcher who, given access to the same codebase, identified seven critical vulnerabilities including SQL injection, unsalted password storage, and an unauthenticated admin endpoint.
"It works. Seven ways it doesn't."
If AI-assisted coding tools dramatically increase developer productivity but also increase the rate of security vulnerabilities shipped, is the net effect positive or negative, and for whom?
Reading
The Spec-to-Ship Loop
Shipping a feature means turning words into code. Most junior devs think this happens in one direction: write spec → AI generates code → ship it. That's a fantasy. The real loop has three moves (specify, generate, correct), and the cycle often repeats.
Here's the hard truth: your spec is always incomplete. Not because you're bad at writing, but because software lives in contradictions. Every feature has unstated assumptions: edge cases, interaction details, performance tradeoffs, integration points with systems you haven't fully understood yet. The AI can fill some of these in. It will also invent things. Your job is to catch both.
Why specs fail
Take a real example: "Add a toggle to show/hide the leaderboard." Sounds simple. Here's what's unstated:
- Where does the toggle live? (Top bar? Settings? Context menu?)
- What's the default state? (Shown or hidden?)
- When you hide the leaderboard, do score notifications still fire?
- If a user has never seen the leaderboard, should the toggle text say "Show" or "Reveal"?
- Does hiding it clear the live-update websocket subscription?
An AI will pick sensible defaults for every one. They might not be your defaults. The spec → generation → correction cycle is how you align them.
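One way to make those defaults visible is to write each open question down as an explicit field and compare what you meant with what came back. The sketch below is illustrative only: the `ToggleDecisions` record, its field names, and the sample values are hypothetical, and the "generated" values are invented rather than taken from any real model output.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ToggleDecisions:
    """Hypothetical record of the choices the one-line spec leaves open."""
    placement: str                 # "top_bar", "settings", or "context_menu"
    default_visible: bool          # is the leaderboard shown on first load?
    notify_while_hidden: bool      # do score notifications still fire?
    label_when_hidden: str         # "Show" or "Reveal"
    unsubscribe_when_hidden: bool  # drop the live-update websocket subscription?


def mismatches(intended, generated):
    """Return {field: (intended, generated)} for every decision that differs."""
    return {
        f.name: (getattr(intended, f.name), getattr(generated, f.name))
        for f in fields(intended)
        if getattr(intended, f.name) != getattr(generated, f.name)
    }


# Invented values: what you meant vs. what a model might plausibly pick.
intended = ToggleDecisions("top_bar", False, True, "Show", False)
generated = ToggleDecisions("settings", True, True, "Show", True)

diff = mismatches(intended, generated)
for name, (want, got) in diff.items():
    print(f"{name}: wanted {want!r}, got {got!r}")
```

Each entry in `diff` is a candidate correction to send back in the next round.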
The three moves
Move 1: Specify. Write enough detail that an AI can start. Don't exhaustively document; you can't. Instead, pick the critical constraints: what data flows through this feature, what triggers it, what the success metric is. For the leaderboard toggle: "Add a toggle in the top-right nav bar, default hidden. When toggled, show/hide the leaderboard view. Keep the websocket alive." That takes about 20 seconds to write, and it settles three of the five open questions listed above: placement, default state, and the websocket.
Move 2: Generate. Run the spec through an AI. Tell it your tech stack, your file structure, any libraries you're using. Ask it to generate the code and explain its assumptions: what defaults it picked, why it put the toggle where it did, what it skipped. This is the critical move: read the assumptions, don't just paste the code.
Move 3: Correct. For every assumption that doesn't match your intent, tell the AI. "Actually, the toggle should default to shown, not hidden." "The notifications should pause when the leaderboard is hidden." Each correction spawns a new generation. You're not writing code; you're refining a spec, one correction at a time.
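The three moves can be sketched as plain text assembly: the spec, the project context, any corrections so far, and a standing request for the model's assumptions. This is a minimal sketch, not a recommended prompt format; `build_prompt`, the stack, and the file path are assumptions made up for illustration, and no particular AI tool or API is implied.

```python
def build_prompt(spec, stack, files, corrections=()):
    """Assemble one generation request from the spec, context, and corrections."""
    lines = [
        "Feature spec:",
        spec,
        "",
        f"Tech stack: {', '.join(stack)}",
        "Relevant files:",
        *[f"- {path}" for path in files],
    ]
    if corrections:
        # Move 3: each round's corrections are appended, numbered in order.
        lines += ["", "Corrections to the previous attempt:"]
        lines += [f"{i}. {text}" for i, text in enumerate(corrections, start=1)]
    # Move 2: always ask for the assumptions, not just the code.
    lines += [
        "",
        "Generate the code, then list every assumption you made:",
        "which defaults you picked, why, and what you skipped.",
    ]
    return "\n".join(lines)


spec = (
    "Add a toggle in the top-right nav bar, default hidden. "
    "When toggled, show/hide the leaderboard view. Keep the websocket alive."
)
prompt = build_prompt(
    spec,
    stack=["TypeScript", "React"],        # assumed stack, not from the text
    files=["src/components/NavBar.tsx"],  # hypothetical path
    corrections=["The notifications should pause when the leaderboard is hidden."],
)
print(prompt)
```

Keeping the corrections as a growing list means the final prompt doubles as a record of how the spec changed.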
When to stop
You stop when either:
- The generated code matches your intent well enough to land it (ship it to staging, test it, merge it).
- The corrections have become so intricate that it's faster to hand-code the piece yourself.
Neither of these is failure. Both are victories: you've clarified what you want and you've shipped something.
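The two stopping conditions can be sketched as a loop with a round budget. This is a rough illustration under stated assumptions: `generate` and `review` are placeholders for the model call and for your own reading of its output, and the fixed `max_rounds` limit is only a stand-in for the judgment call that corrections have become slower than hand-coding.

```python
def spec_to_ship(spec, generate, review, max_rounds=4):
    """Run specify -> generate -> correct until the draft lands or the budget runs out.

    generate(spec, corrections) returns a draft; review(draft) returns a list
    of corrections, empty when the draft matches your intent.
    """
    corrections = []
    for round_number in range(1, max_rounds + 1):
        draft = generate(spec, corrections)
        new_corrections = review(draft)
        if not new_corrections:
            return "land", draft, round_number
        corrections.extend(new_corrections)
    return "hand-code", draft, max_rounds


# Stand-ins so the sketch runs without a model: the fake generator starts
# from the wrong default and flips it once a correction mentions "shown".
def fake_generate(spec, corrections):
    default = "shown" if any("shown" in c for c in corrections) else "hidden"
    return {"default": default}


def fake_review(draft):
    if draft["default"] != "shown":
        return ["Actually, the toggle should default to shown, not hidden."]
    return []


outcome, draft, rounds = spec_to_ship("Add a leaderboard toggle.", fake_generate, fake_review)
print(outcome, draft, rounds)
```

Both return values count as a finished loop: "land" hands you a draft to test and merge, "hand-code" hands you a clarified spec to implement yourself.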
The vibe check
All three moves depend on one thing: you have to read what the AI generated and compare it to what you meant. That comparison is the whole skill. The AI doesn't know your team's code style, your user's unspoken expectations, or the difference between a sensible default and a trap. You do. Generating code is the easy part. Vibe-coding is the reading-and-correcting part, and that's where the feature actually ships.
DOSSIER: VIBE CODING
Code at the Speed of Thought
Ship a feature with AI codegen without shipping lies.
By the end of this dossier you will be able to write a spec that an AI can start from, catch the assumptions baked into its output, correct them one by one, and land a working feature, because you read what it generated instead of just pasting it.
# Example answer

Assumption | AI's Likely Default | Outcome if wrong
---|---|---
Where does the search bar go? | Top of the page, before other content | Users might expect it in a nav bar or sidebar; confuses information hierarchy
Does it search as you type, or on Enter? | Probably on Enter (fewer requests) | Users expect instant results like Google; feels laggy and broken
What happens if no results match? | Shows empty state with "No activities found" | Might need special messaging for zero state (first-time users), confuses if search has bugs
Does search filter *all* activities or only visible ones? | Searches all (broader default) | Could expose activities that shouldn't be discoverable yet; breaks progression or permissions
Does it remember recent searches? | Probably not | Users expect browser/device memory; feels dumb
Write Spec
Write a Vibe-Code Spec
Pick a small feature from a real app you use (or imagine one). Write a spec that an AI could use to generate code. Your spec should:
- Name the feature in one sentence. (e.g., "Add a button that toggles dark mode.")
- List 3 to 5 key constraints: the data that flows through, what triggers it, what the user sees. Don't try to exhaustively document; just close the biggest ambiguities.
- Note one assumption you're making. What default will an AI probably pick that you might need to correct? Why might it matter?
- State the success metric. How will you know this feature is done and working?
Keep it to ~150 to 200 words. Remember: a spec is a starting point, not a blueprint. It's done when an AI could generate something you'd want to correct, not when every detail is decided.
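If it helps to check a draft against the brief, the four parts and the word budget can be sketched as a small self-check. This is only an illustration: the `VibeSpec` record and `check` function are hypothetical names, and the word limits are loose guides taken from the "~150 to 200 words" target rather than hard rules.

```python
from dataclasses import dataclass


@dataclass
class VibeSpec:
    feature: str         # one-sentence name of the feature
    constraints: list    # 3 to 5 key constraints
    assumption: str      # one default the AI may pick that you may need to correct
    success_metric: str  # how you will know the feature is done and working

    def text(self):
        return "\n".join(
            [self.feature, *self.constraints, self.assumption, self.success_metric]
        )


def check(spec, min_words=150, max_words=200):
    """Return a list of problems; an empty list means the spec fits the brief."""
    problems = []
    if not 3 <= len(spec.constraints) <= 5:
        problems.append(f"expected 3-5 constraints, found {len(spec.constraints)}")
    for name in ("feature", "assumption", "success_metric"):
        if not getattr(spec, name).strip():
            problems.append(f"{name} is empty")
    words = len(spec.text().split())
    if not min_words <= words <= max_words:
        problems.append(f"expected roughly {min_words}-{max_words} words, found {words}")
    return problems


# A deliberately thin first draft, to show what the check reports.
draft = VibeSpec(
    feature="Add a button that toggles dark mode.",
    constraints=[
        "The button lives in the settings page.",
        "The choice persists across sessions.",
    ],
    assumption="",
    success_metric="Toggling switches the theme without a page reload.",
)
problems = check(draft)
for problem in problems:
    print(problem)
```

A clean result only means the spec has the right shape; whether it closes the biggest ambiguities is still a reading job.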