It was the first Monday of the quarter and I was doing the thing carpenters do in January, cleaning the bench. In my case, the bench is a file called prompts.md, and after eighteen months of using vibe coding prompts daily it had grown to four hundred and twelve lines of half-remembered incantations, abandoned experiments, and one very specific prompt I wrote at 3 AM that just said "please stop renaming my variables."
I sat down with coffee and started cutting. Not adding. Cutting. By lunch I had thirty-seven left. By the end of the week, after a handful earned their way back in, I had fifty. And here is the part that humbled me: when I cross-referenced those fifty vibe coding prompts against my actual git history for the previous quarter, eight of them accounted for roughly 80% of the real work I had shipped. The rest were variations, edge cases, or vanity.
That is the Pareto epiphany this whole post is built around. Most of the vibe coding prompts you actually need are small, repeatable, and boring. The flashy ones are mostly performance art.
So here are the fifty prompts I kept. These are the vibe coding prompts I run weekly, sometimes daily, against Claude Code and the various agents I keep on rotation. Each one is field-tested. Each one earned its place by surviving real production work. None of them are clever. All of them work.
If you want the broader Claude Code mental model first, my 50 Claude Code tips post is the right warmup. If you are still figuring out what vibe coding even is, start there and come back. And if you have not yet written a real CLAUDE.md, my partner Sofia and I have very strong opinions in the definitive guide to CLAUDE.md rules files. Context plus prompts is the whole game. Prompts without context is just yelling.
How to read this post: each prompt has three parts. The template itself, in a fenced block, with [BRACKETS] for variables you replace. Why it works, in one or two sentences. And what you actually get back, which is the honest description of typical output and where it tends to fail. Copy what works. Ignore what does not.
A note before we start. Anthropic's prompt engineering docs are still the best primer if you want the theory. What follows is the field manual.
Scaffolding (Prompts 1-8)
Scaffolding is where most people start, and it is also where most people get the worst results. If you are brand new to this workflow, our beginner's guide to building your first app with vibe coding walks through the full loop before you worry about prompt templates. The reason most people get bad results is that they ask the model to "build a thing" and the model, eager to please, builds a thing, just not theirs. Every prompt in this section trades freedom for specification. Boring is the point.
I keep my scaffolding prompts close to my persistent context setup. The CLAUDE.md handles taste. The prompt handles intent. They meet in the middle.
Prompt 1: New project
Scaffold a new [LANGUAGE/FRAMEWORK] project called [NAME].
Stack: [STACK CHOICES, e.g., Next.js 16, Tailwind v4, Prisma, Postgres].
Constraints: minimal dependencies, no boilerplate, src/ layout, strict tsconfig.
Output a single tree of files with paths and full contents.
Include: README with run/build/test commands, .gitignore, .env.example, CI stub.
Do NOT install anything. Print only.
Why it works: The "print only" rule is load-bearing. It forces the model to commit to a complete picture before any side effects. You read, you adjust, then you say "go." It treats the first pass like a blueprint, not a build.
What you get back: A neat file tree, usually 8 to 14 files, with surprisingly sane defaults. Common failure: it sometimes invents package versions that do not exist. Always run npm view [package] on anything unfamiliar before installing.
Prompt 2: New component
Add a new [FRAMEWORK] component called [NAME] in [PATH].
Purpose: [ONE SENTENCE OF WHAT IT DOES].
Props: [LIST OR "INFER FROM USAGE"].
Style: match the existing components in [SIBLING PATH].
Include: TypeScript types, default props, a basic story or example, no tests yet.
Show me the diff before writing files.
Why it works: "Match the existing components in [SIBLING PATH]" is the single most underrated phrase in vibe coding. It anchors the model to your codebase's conventions instead of a generic StackOverflow average.
What you get back: A component that mostly looks like its neighbors. Common failure: it imports from react even when your codebase uses a custom re-export. Diff review catches this in five seconds.
Prompt 3: New endpoint
Add a new [METHOD] [ROUTE] endpoint in [FRAMEWORK].
Input: [SCHEMA OR EXAMPLE PAYLOAD].
Output: [SCHEMA OR EXAMPLE RESPONSE].
Auth: [PUBLIC | SESSION | API KEY].
Validation: use [LIBRARY, e.g., Zod] with strict parsing.
Errors: return structured JSON with an error code, never leak stack traces.
Write the route handler, the validation schema, and one happy-path test.
Why it works: Every constraint here is a security cliff someone has already fallen off. The prompt encodes those lessons so you do not have to remember them at 11 PM on a Friday.
What you get back: A handler that respects your envelope and validates inputs. Common failure: it sometimes forgets to await an async validator. Read the diff.
Prompt 4: New database schema
Add a new table called [NAME] to my [DATABASE] schema.
Fields: [LIST WITH TYPES AND CONSTRAINTS].
Relations: [PARENT/CHILD TABLES].
Indexes: [QUERY PATTERNS THAT NEED TO BE FAST].
Migrations: generate a forward and a reverse migration.
Naming: snake_case columns, plural table names, [ID STRATEGY: cuid | uuid | bigserial].
Do not modify any other table without asking first.
Why it works: The "do not modify any other table" line is the seatbelt. Without it, models cheerfully cascade-rename half your schema in the name of consistency.
What you get back: Clean migrations with a reverse path. Common failure: occasionally generates an index on a column you said was nullable, which silently does the wrong thing in some engines. Spot check.
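For the "forward and reverse" requirement, this is the pair shape I review for. Everything here is illustrative: the table, columns, and SQL are made up, and your migration tool will have its own wrapper around the same idea.

```typescript
// Illustrative forward/reverse migration pair. The invariant to check in
// review: `down` exactly undoes `up`, and `up` stays compatible with the
// currently deployed app version.

const migration = {
  up: `
    CREATE TABLE invoices (
      id BIGSERIAL PRIMARY KEY,
      customer_id BIGINT NOT NULL REFERENCES customers(id),
      total_cents BIGINT NOT NULL,
      created_at TIMESTAMPTZ NOT NULL DEFAULT now()
    );
    CREATE INDEX invoices_customer_id_idx ON invoices (customer_id);
  `,
  down: `
    DROP INDEX invoices_customer_id_idx;
    DROP TABLE invoices;
  `,
};
```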
Prompt 5: New test file
Create a test file for [MODULE PATH] using [TEST FRAMEWORK].
Cover: the public API only, not internals.
Style: AAA (Arrange, Act, Assert), one assertion per test, descriptive names.
Mocks: only mock the boundary I name: [BOUNDARY, e.g., the Stripe SDK, the database].
Start with a single failing test for the most important behavior. We will add more after I review.
Why it works: "Start with a single failing test" is TDD smuggled inside a prompt. The model wants to write twelve tests at once. You want it to write the most important one.
What you get back: One focused test that fails for the right reason. Common failure: it sometimes mocks too aggressively and ends up testing the mock. Re-read the test as if you were a stranger.
Prompt 6: New feature flag
Add a feature flag called [FLAG_NAME] for [WHAT IT GATES].
Default: off in all environments.
Surface: [SERVER ONLY | CLIENT ONLY | BOTH].
Reader: read it through our existing [FLAG SERVICE OR ENV VAR PATTERN].
Cleanup TODO: add a comment with today's date and a 30-day expiry note.
Wire it into [TARGET CODE PATH] with the smallest possible change.
Why it works: Feature flags rot. The expiry comment is a forcing function. Older me has thanked younger me for this exactly seventeen times.
What you get back: A flag wired into one code path with a TODO that future you will actually find with grep. Common failure: forgets to default-off in test environments. Read the test config.
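Here is a sketch of the flag-reader shape the prompt produces when it behaves, assuming an env-var pattern rather than a flag service. The flag name, env var convention, and handler are all hypothetical.

```typescript
// TODO(2025-01-06): FLAG_CHECKOUT_V2 expires 2025-02-05; delete the flag
// and the v1 path once v2 is stable. (Names and dates are illustrative.)

type Env = Record<string, string | undefined>;

// Default off everywhere: only an explicit "true" or "1" enables the flag,
// so test and CI environments stay off with no extra config.
function isFlagEnabled(name: string, env: Env): boolean {
  const raw = env[`FLAG_${name.toUpperCase()}`];
  return raw === "true" || raw === "1";
}

// Smallest possible change to the target code path: one branch at the entry.
function checkoutHandler(env: Env): "v1" | "v2" {
  return isFlagEnabled("checkout_v2", env) ? "v2" : "v1";
}
```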
Prompt 7: New migration
Generate a [DATABASE] migration that does the following:
[PLAIN ENGLISH DESCRIPTION OF THE CHANGE].
Constraints:
- Backwards-compatible with the currently deployed app version.
- No locking operations that block writes for more than 1 second.
- Include a reverse migration.
- If the change is destructive, split into a multi-step plan and pause for my approval between steps.
Why it works: The "pause between steps" line saves data. Migration prompts that go full speed ahead are how outages happen.
What you get back: Either a clean migration or a multi-step plan with a "ready for step 2?" prompt. Common failure: it sometimes ignores the lock budget on Postgres ALTER TABLE operations. Check which ALTER TABLE forms take an ACCESS EXCLUSIVE lock before running anything against production.
Prompt 8: New microservice
Scaffold a new microservice called [NAME].
Responsibility: [ONE SENTENCE, NO MORE].
Interface: [HTTP | GRPC | EVENT BUS] with this contract: [SCHEMA].
Dependencies: [DBs, queues, other services].
Observability: structured logs, a /health endpoint, basic Prometheus counters.
Deployment: a Dockerfile, a docker-compose entry, and a README.
No business logic in the scaffold. Just the skeleton.
Why it works: "Responsibility: one sentence, no more" is the test. If you cannot write one sentence, you do not have a service. You have a feature pretending to be a service.
What you get back: A skeleton that boots and responds to /health. Common failure: invents a port collision with another service in your compose file. Check your docker-compose.yml.
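The observability floor from this prompt can be sketched as two pure functions, which keeps them testable without booting a server. The service name and log fields are illustrative.

```typescript
// Sketch of the /health payload and a structured log line.

interface HealthResponse {
  status: "ok";
  service: string;
  uptime_s: number;
}

// Pure function: the HTTP layer just serializes whatever this returns.
function healthCheck(service: string, startedAtMs: number, nowMs: number): HealthResponse {
  return { status: "ok", service, uptime_s: Math.floor((nowMs - startedAtMs) / 1000) };
}

// One JSON object per log event: greppable, parseable, no string templates.
function logLine(level: "info" | "error", msg: string, fields: Record<string, unknown> = {}): string {
  return JSON.stringify({ level, msg, ...fields });
}
```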
Debugging (Prompts 9-16)
Debugging prompts are different from scaffolding prompts. With scaffolding, you give the model a clear destination. With debugging, you give it a clear gap: here is what I expected, here is what I got, here is the evidence. The bigger the gap you can articulate, the smaller the search space the model has to walk.
I learned this the hard way after spending a week debugging an off-by-one error that turned out to live in a regex I had not even read yet. My prompts now front-load evidence ruthlessly.
Prompt 9: Reproduce a bug
I have a bug. Help me build the smallest possible reproduction.
Symptom: [WHAT THE USER SEES].
Conditions: [ENV, BROWSER, INPUT, TIMING].
Logs: [PASTE ANY RELEVANT LOG LINES].
Hypothesis: [MY BEST GUESS, OR "NONE YET"].
Walk me through a 5-step plan to reproduce in a unit test or a runnable script.
Do not fix anything yet.
Why it works: "Do not fix anything yet" is the leash. Models will jump to fixes before they understand the bug. Reproduction first, fix later.
What you get back: A short plan and usually a runnable script. Common failure: skips a step and assumes you have a fixture file you do not have. Ask for it explicitly.
Prompt 10: Narrow a stack trace
Here is a stack trace:
[PASTE FULL TRACE].
Codebase context: [LINK OR FILE PATHS THE TRACE TOUCHES].
What I changed recently: [LAST 1-3 COMMITS OR PR DESCRIPTIONS].
Tell me:
1. The most likely root cause, with the exact line you suspect.
2. The next 3 lines I should add `console.log` or breakpoints to.
3. Anything in the surrounding code I should read to confirm or rule out the hypothesis.
Why it works: Forcing the model to commit to a line, not a file, is how you avoid wishy-washy answers. It either nails it or you learn it does not have enough context.
What you get back: A specific suspect line and three places to instrument. Common failure: when the trace crosses a vendor boundary, it sometimes blames the vendor. Trust but verify.
Prompt 11: Why is this slow
This [FUNCTION | ENDPOINT | PAGE] is slow.
Measured: [BASELINE], [CURRENT], [TARGET].
Code: [PASTE OR REFERENCE PATH].
Inputs at the slow case: [DESCRIBE].
Walk through the suspected hot path. Identify the top 3 candidates for the bottleneck.
For each, propose a measurement (not a fix) I can run to confirm.
Why it works: Measurement before mutation. The temptation is to optimize first and measure later, which is how you make code uglier and slower. Reverse the order.
What you get back: A short triage with three measurements you can actually run. Common failure: picks micro-optimizations over algorithmic ones. Push back if it suggests you "use a for loop instead of map" before checking your N.
Prompt 12: This test is flaky
This test is flaky:
[PASTE TEST NAME AND CODE].
Failure rate: [ROUGH %].
Failure modes I have seen: [LIST].
Walk through the possible sources of nondeterminism in this test.
Rank them most-to-least likely. Suggest the smallest change that would prove or disprove the top suspect.
Why it works: Flaky tests almost always come from a small list of culprits: time, network, ordering, shared state. Ranking forces the model to think about the list rather than guess.
What you get back: A ranked list with a one-line experiment for the top suspect. Common failure: forgets that test ordering matters when you use beforeAll improperly. Mention your runner.
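When the top suspect is time, the smallest proving change is usually to inject the clock instead of calling Date.now() inside the code under test. A minimal sketch, with illustrative names:

```typescript
// The "time" culprit made deterministic: the clock is a parameter,
// so a test can freeze it.

type Clock = () => number;

function isExpired(expiresAtMs: number, now: Clock = Date.now): boolean {
  return now() >= expiresAtMs;
}

// In a test, a fixed clock replaces wall-clock time entirely:
const frozenAtMs = 1_000_000;
const fixedClock: Clock = () => frozenAtMs;
```

If the flake disappears under the fixed clock, you have proved the suspect; if not, move down the ranked list.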
Prompt 13: Explain this error
I do not understand this error:
[PASTE ERROR].
Context: I was trying to do [GOAL] in [LANGUAGE/FRAMEWORK].
Explain the error in plain English first.
Then list the 3 most common causes, ordered from "almost always this" to "rarely but possible."
Then ask me one clarifying question before suggesting a fix.
Why it works: The "ask me one clarifying question" line forces dialogue. Models that immediately suggest a fix often suggest the wrong one.
What you get back: A plain-English explanation that doubles as documentation. Common failure: explanations sometimes drift toward generic textbook causes. Push for specificity.
Prompt 14: Race condition hunt
I suspect a race condition in [FILE OR FEATURE].
Symptom: [INTERMITTENT BEHAVIOR].
Concurrency model: [THREADS | EVENT LOOP | WORKERS | DISTRIBUTED].
Walk through every shared mutable state in this file.
For each, ask: who reads, who writes, what enforces ordering?
Flag the ones with no clear answer. Do not propose a fix until I confirm the suspect.
Why it works: Race conditions are won or lost in the audit. Forcing the "who reads, who writes, what enforces ordering" question on every shared piece of state is the whole technique.
What you get back: A table or list of shared state with risk flags. Common failure: misses races that cross file boundaries. Be explicit about which other files share the state.
Prompt 15: Memory leak hunt
This [PROCESS | PAGE | LONG-LIVED OBJECT] leaks memory.
Evidence: [HEAP SNAPSHOT, RSS GROWTH, OOM KILL].
Suspected start: [TIME OR EVENT].
Walk through the lifecycle of every long-lived reference in [SCOPE].
Specifically check: event listeners, timers, caches, closures, observer subscriptions.
Suggest the 3 most likely leaks, with the exact file and line if possible.
Why it works: Memory leaks live in a handful of categories. Naming the categories explicitly turns a vague hunt into a checklist.
What you get back: A ranked shortlist with likely files. Common failure: in JavaScript, it sometimes blames closures that are perfectly fine. Verify with a real heap snapshot.
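The "timers" category from the checklist, in its most common shape: an interval held by a long-lived object with no teardown. The class is illustrative.

```typescript
// A timer leak and its fix. Without stop(), every Poller keeps its callback
// (and everything the callback closes over) reachable for the life of the process.

class Poller {
  private timer: ReturnType<typeof setInterval> | null = null;

  start(intervalMs: number, tick: () => void): void {
    this.timer = setInterval(tick, intervalMs);
  }

  stop(): void {
    if (this.timer !== null) {
      clearInterval(this.timer);
      this.timer = null;
    }
  }

  isRunning(): boolean {
    return this.timer !== null;
  }
}
```

Event listeners and observer subscriptions leak the same way: the fix is always a paired teardown that actually runs.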
Prompt 16: Regression bisect
A regression appeared between [GOOD COMMIT] and [BAD COMMIT].
Symptom: [DESCRIBE].
Suspected scope: [FILES OR FEATURE AREA].
Help me run a manual or automated `git bisect`.
Suggest a one-line repro command I can use as the bisect test.
For each commit it lands on, summarize the diff in plain English so I can decide good or bad.
Why it works: Bisect is one of the most underused tools in software. Wrapping it in a prompt removes the friction of remembering the syntax.
What you get back: A one-line repro and a willingness to summarize each commit. Common failure: it sometimes proposes a repro command that has side effects. Read before running.
Refactoring (Prompts 17-23)
Refactoring is woodworking. You are not building a new piece of furniture. You are sanding, re-jointing, replacing a worn dovetail. The goal is the same shape with better bones. Refactor prompts that do not respect the existing shape always end in tears.
Every refactor prompt I keep starts the same way: state the invariant. What must remain true after this change? If you cannot name the invariant in one sentence, you are not refactoring. You are rewriting.
Prompt 17: Extract function
Extract a function from [FILE]:[LINE RANGE].
Name: [PROPOSED NAME, OR "SUGGEST 3 OPTIONS"].
Invariant: behavior must be identical for all current callers.
Constraints: no new dependencies, same module, pure if possible.
Show me the diff. Then list any tests I should add or update before I commit.
Why it works: Extracting is mechanical, but the naming is everything. Asking for three options forces a small taste decision instead of a default.
What you get back: A clean diff with a sensible name. Common failure: occasionally widens function signatures when it should narrow them. Watch for unused parameters.
Prompt 18: Rename across codebase
Rename [OLD NAME] to [NEW NAME] across the codebase.
Scope: [SYMBOL TYPE: variable, function, type, class, file, route].
Affected files: [LIST OR "FIND THEM YOURSELF AND REPORT BACK"].
Constraints:
- Update imports, exports, tests, docs, and config.
- Skip strings unless they are clearly identifiers.
- Show me the file list before you make changes.
Why it works: The "skip strings unless they are clearly identifiers" rule prevents the classic catastrophic rename that touches every comment and changelog entry.
What you get back: A pre-flight file list, then a clean rename. Common failure: misses dynamic imports or string-keyed lookups. Search those manually.
Prompt 19: Kill duplication
I have duplication in [FILE A] and [FILE B] (and possibly elsewhere).
Identify the shared pattern.
Propose 2 options:
1. Extract to a shared module (where, with what name).
2. Leave it as is and tell me why duplication might be the right call here.
Do not refactor yet. Recommend one option with a one-paragraph justification.
Why it works: Sometimes duplication is right. Forcing the "leave it" option to be on the table prevents premature DRYing, which I have personally regretted more times than I have benefited from.
What you get back: A reasoned recommendation, often with a name for the new abstraction. Common failure: it sometimes invents a "utility" home that does not match your project's structure. Suggest the directory.
Prompt 20: Untangle a god class
[CLASS NAME] in [FILE] has grown to [N] methods and [N] responsibilities.
Help me untangle it without breaking callers.
Step 1: List its responsibilities, grouped by cohesion.
Step 2: Propose 2-3 smaller classes or modules with names and responsibilities.
Step 3: For each, show the migration path: which methods move, which stay, which become a shim.
Do not write code yet. Just the plan.
Why it works: God classes are political. The plan-first approach lets you negotiate with yourself (or your team) before any code moves.
What you get back: A staged migration plan you can review. Common failure: sometimes proposes to split along technical lines (data vs. logic) when domain lines would be better. Push back.
Prompt 21: Decompose a long file
[FILE] is [N] lines long and is doing too much.
Identify natural seams (groups of related functions, types, or constants).
Propose a decomposition into 2-4 files.
For each, suggest a name, the contents that move there, and the imports that need updating.
Show me the plan as a tree before any file is created.
Why it works: "Show me the plan as a tree" makes the abstract decomposition concrete and reviewable. You can spot a bad name at a glance.
What you get back: A small file tree with content lists. Common failure: occasionally proposes a utils.ts catchall, which is just a god class with extra steps. Reject and ask for a specific name.
Prompt 22: Modernize legacy patterns
This code uses [LEGACY PATTERN, e.g., callbacks, class components, var].
Modernize it to [TARGET PATTERN, e.g., async/await, function components, const/let].
Constraints:
- Behavior must be identical.
- No new dependencies.
- Keep the public API stable.
- Highlight any cases where the legacy pattern was actually doing something the modern one cannot.
Show me a diff.
Why it works: The "highlight cases where the legacy pattern was doing something the modern one cannot" line is the safety net. Sometimes var hoisting or callback ordering was load-bearing.
What you get back: A modernized diff with a few flagged "watch out" notes. Common failure: silently changes error handling semantics. Read the catch blocks.
Prompt 23: Simplify a complex condition
This conditional is hard to read:
[PASTE THE CONDITION OR THE WHOLE FUNCTION].
Refactor for clarity. Options:
- Early returns (guard clauses).
- Named boolean variables.
- Lookup tables / strategy pattern.
- Extracted predicate functions.
Recommend one approach with a justification, then show the refactored code with tests if any exist.
Why it works: Listing the menu of refactor moves up front teaches the prompter and the model. It also avoids the "make it shorter" instinct that often makes things worse.
What you get back: One readable version of the same logic. Common failure: sometimes hides the early returns behind a wrapper function for no reason. Push for the simplest form.
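Two of the menu options applied to the same toy rule, so you can see what "recommend one approach" tends to produce. The shipping domain and rates are made up.

```typescript
// Before: nesting hides the one special case.
function shippingCentsBefore(country: string, totalCents: number): number {
  if (country === "US") {
    if (totalCents > 5000) {
      return 0;
    } else {
      return 500;
    }
  } else if (country === "CA") {
    return 800;
  } else {
    return 1500;
  }
}

// After: the special case becomes a guard clause, the rest becomes data.
const BASE_RATE_CENTS: Record<string, number> = { US: 500, CA: 800 };
const DEFAULT_RATE_CENTS = 1500;

function shippingCentsAfter(country: string, totalCents: number): number {
  if (country === "US" && totalCents > 5000) return 0; // free-shipping guard
  return BASE_RATE_CENTS[country] ?? DEFAULT_RATE_CENTS;
}
```

The invariant check is mechanical: run both versions over the same inputs and diff the outputs before deleting the old one.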
Sofia here: the test prompts that saved me twice
Hey, Sofia jumping in. James has the carpentry analogies. I keep reaching for the orchestra. Tests are the rehearsal, not the performance, and the prompts in this section are the ones that turned my rehearsal process from "play it loud and hope" into something closer to actual stagecraft.
I came to this from product management, which means I learned to write tests after I learned to write features, which is exactly the wrong order. The prompts below are the ones that finally taught me the right order. Two of them have saved my bacon in production, and I will tell you which ones as we go.
Prompt 24: Write the failing test first
I want to add this behavior to [MODULE]:
[ONE-PARAGRAPH DESCRIPTION].
Before any implementation, write the smallest test that would fail today and pass once the behavior exists.
Constraints:
- Test the behavior, not the implementation.
- One assertion if possible.
- Use the same testing style as [NEIGHBORING TEST FILE].
Run it and confirm it fails. Then stop and wait for me.
Why it works: "Stop and wait for me" turns the prompt into a duet. The model writes the failing note. I get to decide if that is the right note before we play the next one.
What you get back: A failing test and a polite pause. Common failure: it sometimes writes the implementation anyway because it cannot help itself. Revert and re-prompt.
Prompt 25: Generate edge cases
Here is a function:
[PASTE].
Generate a checklist of edge cases I should test.
Include:
- Empty inputs.
- Boundary values (0, 1, max, max-1, negative).
- Unicode and locale-sensitive cases.
- Concurrent or repeated calls.
- Unusual but plausible real-world inputs.
Do NOT write the tests yet. I want the list first.
Why it works: The list before the tests is the point. Reviewing a checklist of edge cases is faster than reviewing twenty written tests, and you will catch the missing categories.
What you get back: A bulleted edge case list. Common failure: sometimes invents implausible cases ("what if the input is a Date object?" when it cannot be). Cull.
Prompt 26: Fixture builder
I keep writing the same setup code in [TEST FILE].
Build me a fixture helper that:
- Creates a default [ENTITY] with sensible defaults.
- Accepts overrides for any field.
- Returns a typed object.
Show me 3 example usages from existing tests rewritten to use the helper.
Why it works: Forcing three rewritten usages proves the helper is actually ergonomic. If the rewrites look uglier than the originals, the helper is wrong.
What you get back: A small builder and three before/after examples. Common failure: builders that grow into mini-ORMs. Keep them dumb.
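Here is the builder shape the prompt is aiming for, sketched with a made-up User entity: sensible defaults, typed overrides, unique ids per call.

```typescript
// Fixture builder sketch: Partial<T> overrides on top of boring defaults.

interface User {
  id: string;
  email: string;
  name: string;
  isAdmin: boolean;
}

let userSeq = 0;

function buildUser(overrides: Partial<User> = {}): User {
  userSeq += 1;
  return {
    id: `user-${userSeq}`,              // unique per call so tests never collide
    email: `user${userSeq}@example.com`,
    name: "Test User",
    isAdmin: false,                     // the default is always the boring case
    ...overrides,                       // any field can be pinned by the test
  };
}
```

A good call site reads like the intent of the test: `buildUser({ isAdmin: true })` says "an admin" and nothing else.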
Prompt 27: Snapshot test setup
Set up snapshot testing for [COMPONENT OR OUTPUT].
Constraints:
- Snapshots live next to the test files, not in a global folder.
- Snapshot what the user sees, not internal state.
- Diff format must be human-readable in PR review.
Write one snapshot test for [SPECIFIC CASE], and explain when I should NOT use snapshot testing.
Why it works: The "when I should NOT use snapshot testing" question is the safety brief. Snapshots quietly become noise if you snapshot everything. The prompt forces the conversation up front.
What you get back: One snapshot test plus a short anti-pattern guide. Common failure: snapshot files that are too large to review meaningfully. Trim the snapshotted surface.
Prompt 28: Integration test scaffold
Scaffold an integration test for [FEATURE] that:
- Spins up [REAL DEPENDENCY: db, queue, http server] using [DOCKER, TESTCONTAINERS, etc.].
- Seeds minimal data.
- Tests the happy path end-to-end through the real boundaries.
- Tears down cleanly even on failure.
First, list the dependencies and the setup steps. Wait for my approval before writing code.
Why it works: Integration tests are expensive. Listing the dependencies before writing them is how you avoid spinning up a Postgres container to test a function that does not even touch the database.
What you get back: A dependency list, then a runnable test. Common failure: leaks containers when teardown fails. Add a global teardown hook.
Prompt 29: Mock the right boundary
I need to test [FUNCTION] without hitting [EXTERNAL DEPENDENCY: Stripe, Twilio, our own auth service].
Where exactly should the mock live?
Options:
1. At the SDK boundary (mock the library).
2. At our wrapper boundary (mock our internal client).
3. At the network boundary (mock fetch/http).
Recommend one with a justification. Then implement it.
Why it works: This is the prompt that saved me the first time. I had been mocking the SDK for months, which meant I was testing the wrong layer. The "recommend one with a justification" forced a conversation that exposed the mistake.
What you get back: A reasoned recommendation and an implementation. Common failure: mocks that drift from real behavior over time. Add a contract test against the real dependency in CI.
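Option 2 sketched, since that is the one the prompt usually lands on: production code depends on an internal client interface, and tests inject a fake at that boundary. The PaymentClient shape is made up.

```typescript
// Mocking at the wrapper boundary: business logic never imports the SDK.

interface PaymentClient {
  charge(amountCents: number, customerId: string): Promise<{ id: string }>;
}

async function checkout(
  client: PaymentClient,
  customerId: string,
  amountCents: number
): Promise<string> {
  if (amountCents <= 0) throw new Error("amount must be positive");
  const result = await client.charge(amountCents, customerId);
  return result.id;
}

// The fake lives at our boundary, so SDK upgrades cannot silently break tests.
const fakeClient: PaymentClient = {
  async charge(amountCents, customerId) {
    return { id: `fake-${customerId}-${amountCents}` };
  },
};
```

The real SDK gets wrapped once, in one file, and that wrapper is the only thing the contract test in CI has to exercise.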
Prompt 30: Coverage gap analysis
Run [COVERAGE TOOL] against [MODULE] and report.
Then:
- List the top 5 uncovered branches by likely impact, not by line count.
- For each, propose one test that would close the gap.
- Flag any uncovered code that you suspect is dead and should be deleted instead of tested.
Do not write any tests yet. Show me the analysis.
Why it works: "Likely impact, not line count" reframes coverage from a metric to a judgment call. And the "delete instead of test" option is the one most people forget exists.
What you get back: A short, prioritized analysis. Common failure: misjudges impact when the code is in a rarely-used path. Add usage data if you have it.
Documentation (Prompts 31-36)
I used to think docs were the chore you did at the end. Then I watched a junior engineer onboard onto a project with no docs and spend three weeks doing what should have taken three days. Now I treat docs as part of shipping, and I have prompts for the kinds of docs I write most often.
The hardest thing about doc prompts is that the model has a strong default voice (cheerful, evangelical, slightly hollow) that you have to prompt away from. Specificity beats enthusiasm.
Prompt 31: README starter
Write a README.md for this project.
Project: [ONE SENTENCE].
Audience: [WHO IS READING THIS].
Sections, in order:
1. What it does (3 sentences max).
2. Why you would use it (3 bullets).
3. Quick start (copy-pasteable, runnable).
4. Configuration (env vars, with defaults).
5. Common tasks (run, build, test, deploy).
6. Where to ask for help.
No marketing language. No emojis. No "Welcome!" anywhere.
Why it works: The bans are the prompt. Without them, the model writes a brochure.
What you get back: A scannable README. Common failure: sneaks in a "Features" section anyway. Delete it.
Prompt 32: API docs from code
Generate API documentation for [MODULE OR ROUTE FILE].
Source of truth: the code itself. Do not invent fields.
For each endpoint or function:
- One-sentence purpose.
- Inputs with types.
- Outputs with types.
- Errors with codes.
- One realistic example request and response.
Format: [MARKDOWN | OPENAPI | MDX].
Flag anything in the code that is unclear and needs a developer comment.
Why it works: "Do not invent fields" is the single most important instruction. Models will hallucinate API surfaces if you let them.
What you get back: Accurate docs and a list of "I cannot tell" flags. Common failure: example values that look real but are not (e.g., a UUID format that does not match yours). Sanity check.
Prompt 33: ADR drafting
Draft an Architecture Decision Record for the following:
Decision: [ONE SENTENCE].
Context: [WHY ARE WE DECIDING THIS NOW].
Options I considered: [LIST WITH ONE-LINE PROS AND CONS].
Decision: [WHICH ONE].
Consequences: [WHAT BREAKS, WHAT GETS EASIER, WHAT IS NOW HARDER].
Status: [PROPOSED | ACCEPTED | SUPERSEDED].
Use the Michael Nygard ADR template. Keep it under one page.
Why it works: Naming the template (Michael Nygard) anchors the model to a known shape instead of inventing one.
What you get back: A short, scannable ADR. Common failure: vague consequences. Push for "what breaks" specifics.
Prompt 34: Changelog from commits
Generate a changelog for the range [FROM TAG]..[TO TAG].
Group by: [Added | Changed | Fixed | Deprecated | Removed | Security].
Source: git log.
Rules:
- One line per user-visible change. Internal refactors collapse into "Internal."
- Plain English. No commit hashes in the changelog itself.
- Flag any breaking changes at the top.
Show me the draft. I will edit before publishing.
Why it works: The "user-visible change" filter is the whole game. A good changelog is a story for users, not a transcript for engineers.
What you get back: A clean changelog draft. Common failure: misclassifies a fix as a change or vice versa. Eyeball the categories.
Prompt 35: Onboarding doc
Write a 1-day onboarding doc for a new engineer joining [PROJECT].
Goal: by end of day, they have the project running locally and have made one trivial PR.
Sections:
- Accounts they need.
- Tools to install (with exact commands).
- How to clone, install, and run.
- The first issue they should pick up (link or describe).
- Who to ping when stuck (use placeholders).
No history lessons. They do not need to know about the great refactor of 2023 yet.
Why it works: The history-ban keeps the doc focused on Day 1 outcomes. History goes in a separate doc.
What you get back: A practical Day 1 plan. Common failure: assumes tools are installed that are not. Have a junior engineer try it.
Prompt 36: Postmortem template
Generate a blameless postmortem template for [INCIDENT TYPE].
Sections:
- Summary (one paragraph, plain English, no jargon).
- Timeline (UTC timestamps, what happened and what we did).
- Impact (users affected, duration, blast radius).
- Root cause (the technical why, not "human error").
- What went well.
- What did not go well.
- Action items (each with an owner and a date).
Tone: factual, kind, focused on systemic causes.
Why it works: The "no human error" line is the cultural prompt. Postmortems that blame people teach the team to hide; postmortems that blame systems teach the team to learn.
What you get back: A reusable template. Common failure: vague action items. Push for owners and dates.
Deployment (Prompts 37-43)
Deployment is where vibe coding meets reality. The model can scaffold beautifully and refactor elegantly, but the moment you push the wrong env var to prod, none of that matters. Deploy prompts are the most paranoid in my collection. They should be.
I lean heavily on the Claude Code documentation for the model side, and on GitHub's Actions docs for the CI side. Both are honestly pretty good as primary sources. The prompts below sit on top of those.
Prompt 37: Dockerfile starter
Write a Dockerfile for [APP TYPE: Node, Python, Go, etc.] that:
- Uses a [SPECIFIC BASE IMAGE] (no `latest` tags).
- Multi-stage build: builder stage, runtime stage.
- Runtime stage runs as a non-root user.
- Copies only what is needed at runtime (no dev deps).
- Sets a healthcheck.
Show me the Dockerfile and a one-line `docker build` command.
Explain any line that is not obvious.
Why it works: The "no latest tags" rule prevents one of the most common production fires. Pinned tags are non-negotiable.
What you get back: A reasonable Dockerfile and a build command. Common failure: copies too much in the builder stage. Use a .dockerignore.
Prompt 38: GitHub Actions CI
Write a GitHub Actions workflow for this repo.
Triggers: push to main, pull request to main.
Jobs:
- lint
- typecheck
- test (unit + integration)
- build
Constraints:
- Use [specific action versions, no @latest].
- Cache dependencies.
- Fail fast.
- Upload coverage as an artifact.
Show me the YAML and explain any non-obvious step.
Why it works: "No @latest" again. The lesson generalizes: pin everything you depend on, including actions.
What you get back: A working YAML. Common failure: caches the wrong directory for your package manager. Verify the cache hit rate after one run.
Prompt 39: Env var audit
Audit the env vars used in this codebase.
Output a single table with columns:
- Variable name
- Where it is read (file:line)
- Required vs optional
- Default value (if any)
- Description (best guess from context)
- Risk level if leaked (low/medium/high/critical)
Flag any that are read but not documented in [.env.example | docs].
Why it works: Env vars are one of the most commonly leaked categories of secrets, and most teams cannot even list theirs from memory. The audit prompt makes the unknown known.
What you get back: A table you can paste into a doc or a spreadsheet. Common failure: misses dynamic env reads (e.g., process.env[name]). Search for those patterns specifically.
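To catch the dynamic reads the audit tends to miss, a small scanner helps. This is a hedged sketch, assuming a JS/TS plus Python codebase; the regexes and the `find_dynamic_env_reads` helper are my own invention, so extend the patterns for your stack.

```python
import re
from pathlib import Path

# Regexes for dynamic env reads that a static audit tends to miss.
# These patterns are assumptions for a JS/TS + Python codebase.
DYNAMIC_ENV_PATTERNS = [
    re.compile(r"process\.env\[(?![\"'])"),   # process.env[name] in JS/TS
    re.compile(r"os\.environ\[(?![\"'])"),    # os.environ[name] in Python
    re.compile(r"os\.getenv\((?![\"'])"),     # os.getenv(name) in Python
]

def find_dynamic_env_reads(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, line text) for each suspicious read."""
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in {".js", ".ts", ".py"}:
            continue
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if any(p.search(line) for p in DYNAMIC_ENV_PATTERNS):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Paste the hits into the audit prompt as extra context so the model's table covers them too.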
Prompt 40: Infra-as-code stub
Generate a [TERRAFORM | PULUMI | CDK] stub for [INFRASTRUCTURE: a Postgres instance, an S3 bucket, a Lambda].
Provider: [CLOUD].
Constraints:
- No secrets in the file.
- Use a remote state backend.
- Include tags for environment, owner, and cost center.
- Smallest viable size and cheapest viable tier.
Walk me through the plan output before I apply.
Why it works: The "smallest viable size and cheapest viable tier" line is the budget guardrail. Defaults in IaC are often production-grade and production-priced.
What you get back: A small, tagged resource definition. Common failure: forgets backups for stateful resources. Always ask "what happens if this disappears?"
Prompt 41: Blue/green plan
I want to deploy [SERVICE] using a blue/green strategy.
Walk me through the plan:
- What "blue" and "green" mean in our environment.
- How traffic is shifted (load balancer? DNS? service mesh?).
- How we verify the green deployment before cutover.
- How we roll back if green is bad.
- What state (database, caches, queues) needs to be compatible across both.
Show me the steps as a runbook, not a script.
Why it works: Asking for a runbook, not a script, forces the model to think in terms of people deploying, which is what actually happens at 3 AM.
What you get back: A numbered runbook with verification steps. Common failure: skips the database compatibility step. Make sure migrations are forward and backward compatible.
Prompt 42: Rollback playbook
Write a rollback playbook for [SERVICE].
Trigger conditions: [WHAT MAKES US ROLL BACK, e.g., error rate > 2% for 5 min].
Steps:
1. Detect (who notices, how).
2. Decide (who calls it, in what channel).
3. Execute (exact commands or UI clicks).
4. Verify (what tells us we are recovered).
5. Communicate (status page, internal channels, customer comms).
Tone: calm, numbered, copy-pasteable. Assume the reader is panicked.
Why it works: "Assume the reader is panicked" is the prompt. Calm, numbered, simple. No prose. No "considerations." Just steps.
What you get back: A short, executable playbook. Common failure: assumes the dashboard URL is memorable. Include the actual link.
Prompt 43: Monitoring + alerts
Propose a monitoring and alerting setup for [SERVICE].
Coverage:
- Golden signals: latency, traffic, errors, saturation.
- Key business metrics: [LIST 1-3].
- Specific failure modes I have seen: [LIST].
For each metric:
- What I should track.
- What threshold should page me vs. just log.
- Where the alert should route (oncall, channel, ticket).
Suggest the smallest possible alert set that covers the most likely outages.
Why it works: "Smallest possible alert set" fights alert fatigue. A noisy oncall rotation is worse than no oncall rotation.
What you get back: A trimmed, prioritized alert set. Common failure: pages on warnings that should just log. Be ruthless.
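One way I sanity-check "page vs. log" is to write the alert set down as plain data before wiring up any tooling. A hedged sketch, with invented metric names and thresholds; the rule of thumb it encodes is: page on symptoms users feel, log causes for later investigation.

```python
# A "smallest possible alert set" as plain data.
# Metric names and thresholds are placeholders, not recommendations.
ALERTS = [
    {"metric": "http_error_rate", "threshold": 0.02, "window_min": 5,  "action": "page"},
    {"metric": "p99_latency_ms",  "threshold": 1500, "window_min": 10, "action": "page"},
    {"metric": "cpu_saturation",  "threshold": 0.85, "window_min": 15, "action": "log"},
]

def route(metric: str, value: float) -> str:
    """Decide what a breached (or healthy) metric should do."""
    for alert in ALERTS:
        if alert["metric"] == metric and value > alert["threshold"]:
            return alert["action"]
    return "ok"
```

If the table has more than a handful of "page" rows, the alert set is probably not the smallest one.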
Code review (Prompts 44-50)
Code review prompts are the closest thing I have to a second pair of eyes when nobody is around. They are also the prompts that have most changed how I write code in the first place. If I know the security-review prompt is going to find injection bugs, I write fewer of them up front.
The trick to a good code-review prompt is focus. A prompt that asks for "all the issues" gets you a wall of generic suggestions. A prompt that asks for one specific kind of issue gets you something useful. If you want the full discipline behind these patterns, our 15 rules for vibe coding best practices covers the broader workflow that makes prompts like these land consistently.
Prompt 44: Security review
Review this diff for security issues only.
Diff: [PASTE OR REFERENCE].
Focus areas:
- Injection (SQL, command, prompt).
- AuthZ/AuthN bypasses.
- Secrets in code or logs.
- Unsafe deserialization.
- SSRF / open redirects.
- Rate limiting.
- Input validation at boundaries.
Output:
- Severity (critical/high/medium/low).
- Exact line.
- Why it is a problem.
- Smallest fix.
Be ruthless. False positives are fine. Missed positives are not.
Why it works: "False positives are fine. Missed positives are not." flips the model's bias from polite to paranoid, which is exactly what security review needs.
What you get back: A prioritized list with line-level pointers. Common failure: misses logic bugs that enable security issues (e.g., a missing tenant check). Pair with the API design review.
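To make "smallest fix" concrete, here is a self-contained sketch of the kind of finding this prompt should surface: an injection bug and its one-line parameterized fix. The sqlite3 in-memory database is purely for illustration; the same shape applies to any SQL driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'member')")

# FINDING (critical): user input interpolated straight into SQL.
def lookup_unsafe(name: str):
    query = f"SELECT role FROM users WHERE name = '{name}'"  # injection point
    return conn.execute(query).fetchall()

# SMALLEST FIX: bind the value; the driver handles quoting.
def lookup_safe(name: str):
    return conn.execute("SELECT role FROM users WHERE name = ?", (name,)).fetchall()

payload = "x' OR '1'='1"
assert len(lookup_unsafe(payload)) == 2  # every row leaks
assert len(lookup_safe(payload)) == 0    # payload treated as a literal name
```

A good security-review answer looks exactly like those two comments: severity, exact line, why, smallest fix.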
Prompt 45: Performance review
Review this code for performance only.
Focus on:
- N+1 queries.
- Unnecessary loops over large collections.
- Missing indexes implied by query patterns.
- Synchronous I/O on the hot path.
- Memory allocations inside loops.
Ignore style and naming. Output:
- Estimated impact (High/Medium/Low).
- Exact line.
- Suggested change.
- A measurement I can run to confirm.
Why it works: "A measurement I can run to confirm" is the only way to keep performance reviews honest. Suggestions without measurements are guesses.
What you get back: A short list of suspect lines with measurement hooks. Common failure: misses I/O hidden behind helper functions. Show the model the helpers.
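For the N+1 item specifically, this is the shape of finding and fix I expect back. A hedged sketch with an invented schema, using sqlite3 so it runs anywhere:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER)")
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(i, f"author{i}") for i in range(100)])
conn.executemany("INSERT INTO posts VALUES (?, ?)",
                 [(i, i % 100) for i in range(1000)])

# N+1: one query for the posts, then one query per post for its author.
def author_names_n_plus_one():
    posts = conn.execute("SELECT author_id FROM posts ORDER BY id").fetchall()
    return [conn.execute("SELECT name FROM authors WHERE id = ?", (a,)).fetchone()[0]
            for (a,) in posts]  # 1,000 extra round trips

# Fix: one join. Same result, one query.
def author_names_joined():
    rows = conn.execute(
        "SELECT a.name FROM posts p JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    ).fetchall()
    return [name for (name,) in rows]
```

The measurement to confirm: time both under `timeit` against a real database, where each round trip pays network latency and the gap gets dramatic.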
Prompt 46: Find the bug
There is a bug somewhere in this code:
[PASTE].
Context: [WHAT IT IS SUPPOSED TO DO].
Symptom: [WHAT IT ACTUALLY DOES].
Do not refactor. Do not add features. Do not improve style.
Find the bug. Cite the line. Explain in one paragraph.
Then propose the smallest possible fix.
Why it works: The negation list is the prompt. Models love to volunteer extra work; this prompt tells them not to.
What you get back: One line, one explanation, one fix. Common failure: sometimes finds a bug instead of the bug. Push back if the fix does not match your symptom.
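A toy example of what a good answer to this prompt looks like. The `tail` helper is hypothetical, invented for illustration:

```python
# Context: return the last n items of a list.
# Symptom: tail(xs, 0) returns the whole list instead of an empty one.
def tail_buggy(items, n):
    return items[-n:]  # the bug: -0 is just 0, so [-0:] slices from the start

# Smallest possible fix: guard the zero case, change nothing else.
def tail_fixed(items, n):
    return items[-n:] if n else []
```

One line cited, one paragraph of explanation (the comment), one minimal fix. No refactor, no renames, no bonus features.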
Prompt 47: API design review
Review the API design for [ROUTE OR MODULE].
Schema: [PASTE OR REFERENCE].
Focus on:
- Naming consistency (with the rest of our API).
- Idempotency (especially for POST/PUT).
- Versioning strategy.
- Pagination defaults.
- Error envelope consistency.
- Backward compatibility risk.
Compare against our existing endpoints in [REFERENCE FILE].
Output one paragraph per issue. Rank by severity.
Why it works: "Compare against our existing endpoints" is the prompt that turns abstract API design feedback into something specific to your codebase.
What you get back: Ranked, comparative feedback. Common failure: sometimes invents conventions you do not have. Verify against the reference file.
Prompt 48: Naming review
Review the names in this code:
[PASTE].
Focus only on naming.
For each name (variable, function, type, file), ask:
- Does it match our naming convention?
- Does it describe behavior or just type?
- Would a new engineer understand it without context?
Output:
- Old name → suggested name → one-sentence reason.
Do not change the code yet. Just propose. Maximum 10 suggestions, ordered by impact.
Why it works: "Maximum 10 suggestions, ordered by impact" forces ranking, which prevents the wall-of-renames that is unreviewable.
What you get back: A short, prioritized rename list. Common failure: suggests renames that conflict with framework conventions (e.g., renaming getServerSideProps). Sanity check.
Prompt 49: Accessibility review
Review this UI code for accessibility only.
Focus areas:
- Semantic HTML (correct elements for the role).
- Keyboard navigation (tab order, focus traps, escape behavior).
- Screen reader support (ARIA roles, labels, live regions).
- Color contrast (note where I should manually verify).
- Form errors (associated with inputs, announced to screen readers).
For each issue, cite the line, describe the user impact, and propose the fix.
Cite WCAG 2.2 AA criteria where relevant.
Why it works: "Cite WCAG 2.2 AA criteria where relevant" anchors the model to a real standard instead of vibes.
What you get back: A list of issues with WCAG references. Common failure: misses dynamic states (modal focus traps, route changes). Test with a keyboard.
Prompt 50: Test quality review
Review the tests in this file for quality, not coverage:
[PASTE].
For each test ask:
- Does it test behavior or implementation?
- Is the failure message useful?
- Would this test catch a real regression, or only a refactor?
- Is the setup hiding important context?
Rank the tests from "high signal" to "noise" and tell me which ones I should delete or rewrite.
Why it works: This is the second prompt that saved me. I had a test suite of 800 tests and a CI that took 22 minutes. After running this prompt across the suite, I deleted roughly 280 tests, the suite ran in 9 minutes, and we caught more bugs in the next month, not fewer. High-signal beats high-count every time.
What you get back: A ranked list with delete/rewrite recommendations. Common failure: occasionally suggests deleting a test that is the only thing guarding a critical edge case. Use judgment.
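If "behavior vs. implementation" feels abstract, this is the distinction the prompt is hunting for. A hedged sketch; `slugify` is a hypothetical helper, not from my codebase:

```python
import re
from unittest import mock

# Toy function under test.
def slugify(title: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

# Noise: tests *how* slugify works. Rewrite it without re.sub and this
# fails with zero change in behavior -- it can only catch refactors.
def test_slug_uses_re_sub():
    with mock.patch("re.sub", wraps=re.sub) as sub:
        slugify("Hello World")
    assert sub.call_count == 1

# Signal: tests the contract callers depend on. It survives any refactor
# that preserves behavior and fails on any regression that breaks it.
def test_slug_is_url_safe():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaced  out  ") == "spaced-out"
```

The first test is exactly the kind this prompt tells you to delete; the second is the kind it tells you to keep.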
What I do with all of this
These fifty prompts live in a folder called prompts/, one Markdown file per prompt, named for what they do. I have shell aliases that pipe them into Claude Code with the variables filled in. The whole thing is maybe 200 lines of glue code. I am happy to share it if folks want, but honestly the glue is less important than the prompts themselves.
The thing I keep coming back to: vibe coding prompts are like jigs in a woodshop. (Sofia would say they are like bowing patterns. Both are right.) You build them once, you use them a thousand times, and they make the difference between work that looks neat and work that is neat. The prompts in this post are jigs. Take them, modify them, throw out the ones that do not fit your workflow. The point is to have the jig, not to have my jig.
A few honest caveats. None of these prompts will rescue a bad CLAUDE.md. None of them will replace the slow work of actually understanding the codebase you are operating in. And every single one of them works better when you have built up the persistent context that lets the model speak your codebase's dialect. Prompts are the verbs. Context is the grammar. Without grammar, the verbs do not mean much.
If you are just getting started, our complete beginner's tutorial for Claude Code is the right next step, and the Claude Code tutorial for beginners covers the setup and first-run experience in more detail. If you have a specific gap you wish was filled here, I would love to hear it. The prompts I trust most are the ones I stole from someone else's prompts.md, so the favor is mutual.
Which slot do you wish was filled? The one I am still working on is "plan a 12-month dependency upgrade for a long-running monolith without losing my mind." If you have nailed that one, I am all ears.