It is 11:47 pm on a Tuesday and I am staring at a Vercel deploy URL for what is, by my count, the thirtieth product I have shipped this way. Not the thirtieth feature. The thirtieth standalone, real-users-on-it, Stripe-keys-in-production project. So when people ask me what vibe coding is, I do not reach for the dictionary. I reach for that deploy URL.
This one is a small invoicing tool for freelance translators. Two screens, a database, a payment flow, an email reminder. The build took me four evenings. I have not typed a single curly brace.
That feels worth saying out loud, because the question I get asked most, by friends, by clients, by my mother-in-law, is some variant of "what is vibe coding, actually?" Not the textbook version. The lived version. What does it feel like, what does it cost, what does it break, what does it not break, when does it make me look like a magician and when does it make me look like a fool.
This is my answer. Or rather, our answer, because halfway through I am going to hand the keyboard to my friend Kai, who is grumpier about all of this than I am and whose skepticism I trust more than my own enthusiasm.
The honest practitioner's definition
The textbook version goes like this. Vibe coding is a way of building software where you describe what you want in natural language, an AI agent writes and modifies the code, and you steer the result through conversation rather than typing the code yourself.
That is fine as far as it goes. It is also a little like defining cooking as the application of heat to food. Technically correct, completely useless if you actually want to know what is going on in the kitchen.
The phrase was coined by Andrej Karpathy in a post on X in February 2025. He described a mode where you "fully give in to the vibes, embrace exponentials, and forget that the code even exists." Important historical context: he meant it about throwaway weekend projects. He was not prescribing a way to build production payments infrastructure. The phrase escaped the lab. By spring 2026 it has come to mean something much wider, and Karpathy himself has moved on to talking about agentic engineering as the next layer up.
So let me give you the version I would give a friend over coffee.
Vibe coding, in 2026, is a relationship between a human and one or more coding agents, where the human owns the intent and the judgment, and the agent owns the typing. How tight that relationship is, how much the human reads, how much the agent decides on its own, that varies wildly. It is not one thing. It is a spectrum.
That spectrum is the actual thing worth understanding, and it is the thing the corporate definitions out there miss completely.
The spectrum (or, the four ways to vibe code)
Think about cooking again. There is a continuum from a microwave dinner you peel the film off, to following a recipe with your kid, to riffing on a recipe you know cold, to running the line at a restaurant where you taste, adjust, plate, and never look at a written instruction. The food at every step is real food. People eat it. Some of it is delicious. Some of it gives you food poisoning. The skill required, the failure modes, and the type of human who succeeds at each are completely different.
Vibe coding has the same shape. Here are the four points on the spectrum I see most often, in the order I recommend graduating through them.
1. Accept all (raw vibes mode)
You type what you want, the agent generates a sprawl of files, you click Accept All without reading a line, and you run the thing. If it works, ship it. If it does not, paste the error back and ask the agent to fix it.
This is the mode most people first encounter, usually inside a browser-based builder. It feels like a magic trick the first time. It is also where the horror stories come from: deployed apps with hardcoded API keys, login forms that store passwords in plaintext, a database table called users that anyone on the internet can read.
Use case: a personal weekend project, a throwaway internal tool, a demo that will live for a week.
Risk: catastrophic if you are storing anything that matters. You do not know what you do not know.
Typical practitioner: a curious non-developer who has never opened a terminal, or a senior engineer prototyping at 1 am who has decided sleep matters more than rigor tonight.
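For readers who want to see what the non-horror-story version of two of those failures looks like, here is a minimal sketch in Python using only the standard library. Everything named here is my own illustration: the environment variable name, the function names, and the iteration count are assumptions, and a real app would more likely rely on its framework's auth layer than on hand-rolled helpers.

```python
import hashlib
import hmac
import os
import secrets

# Hypothetical variable name; use whatever your provider's docs call it.
API_KEY_ENV_VAR = "PAYMENTS_API_KEY"


def load_api_key(env=os.environ):
    """Read the secret from the environment instead of hardcoding it in source."""
    key = env.get(API_KEY_ENV_VAR)
    if not key:
        raise RuntimeError(f"{API_KEY_ENV_VAR} is not set; refusing to start")
    return key


def hash_password(password, *, iterations=600_000):
    """Return 'iterations$salt_hex$digest_hex' using salted PBKDF2-HMAC-SHA256.

    The default iteration count is a placeholder; check current guidance.
    """
    salt = secrets.token_bytes(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"{iterations}${salt.hex()}${digest.hex()}"


def verify_password(password, stored):
    """Recompute the hash with the stored salt and compare in constant time."""
    iterations_text, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        "sha256",
        password.encode("utf-8"),
        bytes.fromhex(salt_hex),
        int(iterations_text),
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

The point is not that you should write this yourself. The point is that if the diff you accepted contains a literal key or a password column holding readable text, you now know roughly what should have been there instead.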
2. Read every line (cautious mode)
The agent generates, you read every diff before accepting. You ask questions. You google the unfamiliar function. You sometimes reject and ask for a different approach. The agent moves slower because you are slower.
This is where most working software developers spend their first three months with a tool like Cursor. The tool is fast, but their internal trust dial is at zero, so they police every change. It feels less productive than vibe coding is supposed to feel. That is fine. It is the apprenticeship.
Use case: any codebase you intend to keep, especially when you are still learning the agent's failure modes.
Risk: low, but the productivity gains are also modest. You are basically using AI as a smart pairing partner you do not yet trust.
Typical practitioner: a working developer one to three months into adopting the tools, or anyone working in a regulated environment for the first time.
3. Direct and verify (the pair programming sweet spot)
You give the agent a clear chunk of work, big enough to matter but small enough to verify. Maybe a single feature. You read the diff at a chunk level, not a line level. You spot-check the unfamiliar parts. You run the tests. You poke the UI. You either ship or send the agent back with specific feedback.
This is where most of the productivity narrative happens. The work I shipped last night was mostly in this mode. I would say, "build me a Stripe checkout flow that works for both one-off invoices and saved customers, store the customer ID in the freelancer table, send a webhook to mark paid," and then read the four files it touched in maybe ninety seconds. If something looked weird, I asked. Most of the time, the diff was reasonable.
Use case: real product work, the bread and butter of professional vibe coding.
Risk: medium. You will miss things. Your tests had better be good, and you had better be in the loop on architecture.
Typical practitioner: a developer with three to twelve months of agent experience, or a non-developer who has earned scar tissue and now has reasonable instincts.
4. Agentic engineering (the conductor)
You stop typing prompts at all and start writing plans. You break a feature into a written spec, hand the spec to a planner agent, the planner produces a task list, the task list goes to one or more worker agents, and you review the result at the PR level rather than the diff level. You are the architect and the editor. You almost never touch the code yourself.
This is the mode Karpathy himself signaled when he started using the term agentic engineering and quietly stopped vibe coding in the original sense. It is what serious teams are converging on for non-trivial work. It demands that you write better specs than you ever had to write before, because the agent will faithfully build whatever you ask for, including the wrong thing if you asked for the wrong thing.
Use case: serious, sustained product work; long-lived codebases; multi-agent setups.
Risk: low if your spec discipline is good. Catastrophic if you treat plans as throwaway prompts.
Typical practitioner: a senior engineer or technical founder who has done their reps at level three and is ready to scale themselves across multiple parallel work streams.
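To make the spec-to-task-list handoff a little more concrete, here is a toy sketch in Python. It is not how any particular tool works: the `Spec` and `Task` shapes, the field names, and the `plan` function standing in for a planner agent are all hypothetical. What it tries to show is the discipline, namely that a plan with no acceptance criteria gets rejected before any agent starts building.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """A written plan handed to a planner agent. Field names are hypothetical."""
    goal: str
    acceptance_criteria: list[str] = field(default_factory=list)
    out_of_scope: list[str] = field(default_factory=list)


@dataclass
class Task:
    description: str
    done_when: str  # the acceptance criterion this task must satisfy


def spec_problems(spec: Spec) -> list[str]:
    """Return reasons the spec is not ready to hand off (empty list means ready)."""
    problems = []
    if not spec.goal.strip():
        problems.append("goal is empty")
    if not spec.acceptance_criteria:
        problems.append("no acceptance criteria, so nothing to review the PR against")
    return problems


def plan(spec: Spec) -> list[Task]:
    """Stand-in for the planner agent: one task per acceptance criterion."""
    problems = spec_problems(spec)
    if problems:
        raise ValueError("spec not ready: " + "; ".join(problems))
    return [
        Task(description=f"{spec.goal}: {criterion}", done_when=criterion)
        for criterion in spec.acceptance_criteria
    ]


# Made-up example spec, loosely modeled on an invoice reminder feature.
reminder_spec = Spec(
    goal="Email reminders for unpaid invoices",
    acceptance_criteria=[
        "a reminder is sent after the due date passes",
        "no reminder is sent once the invoice is marked paid",
    ],
    out_of_scope=["SMS reminders"],
)
tasks = plan(reminder_spec)
```

A real planner would do far more than map criteria to tasks one for one. The useful part of the sketch is the refusal: a throwaway prompt fails `spec_problems`, a real spec does not.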
The mistake almost everyone makes is to assume that vibe coding means level one and stops there. It does not. The whole point of getting good at this is moving up the spectrum without losing the speed that made you start in the first place.
This whole framing, by the way, is why the "everyone can code, nobody can ship" gap is real. Level one gets you a lot of code. Level four gets you something you can actually run for a year.
Okay. I have done the easy part: defining the thing and being enthusiastic about it. Time to bring in someone who is going to push back. Kai, you have the floor.
Kai here: when vibe coding actually wins
Hello. I am the friend Omar warned you about.
I have been a working engineer for fifteen years, mostly on backend systems where the cost of a bad night is measured in pages going off at 4 am and postmortems with the legal team. My natural posture toward any new methodology is suspicion. I have watched many silver bullets land flat. So when Omar started telling me, two years ago, that he was building real things this way, my instinct was to roll my eyes.
I do not roll my eyes anymore. I want to tell you, with the precision Omar tends to skip, where this approach actually wins.
It wins when the surface area of failure is small and the cost of being wrong is recoverable. That is the whole rule, in one sentence. Everything else flows from it.
A landing page that mis-renders on Safari for a day. Recoverable. Vibe code it.
An internal admin tool used by four people who all sit in the same chat channel and will yell at you in real time when something breaks. Recoverable. Vibe code it.
A side project that generates leads for a freelancer, where the worst case is two missed inquiries and an embarrassing apology email. Recoverable. Vibe code it.
A scraping script that pulls public data into a Google Sheet once a week. Recoverable. Vibe code it.
A B2B SaaS prototype going to fifty design partners who explicitly know they are using a beta and have your cell number. Recoverable. Vibe code it.
In all of these cases, the speed gain is the value, and the cost of a bug is bounded. I have watched Omar ship things in a long weekend that my old team would have quoted at three months. I have watched a non-engineer friend launch a paid product, on Stripe, with real customers, in two weeks, when the previous attempt had taken him a year of waiting on a contractor. The numbers, when you sit with them, are absurd. A typical agency build for the kind of MVP we are talking about runs $25k to $80k and three to six months. The same MVP, vibe coded by a competent person, costs maybe $200 in tool subscriptions and four to ten evenings. That is two orders of magnitude on price and one on time.
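Since I promised precision, here is the arithmetic behind that last sentence as a few lines of Python. The dollar and time figures are the ones quoted above. The conversion is my own assumption: 30 calendar days per month, and one evening counted as one elapsed day.

```python
import math

# Figures quoted in the paragraph above.
AGENCY_COST_USD = (25_000, 80_000)
VIBE_COST_USD = 200
AGENCY_MONTHS = (3, 6)
VIBE_EVENINGS = (4, 10)

# Assumption: 30 calendar days per month, and one evening counted as one day.
DAYS_PER_MONTH = 30


def ratio_range(big, small):
    """Smallest and largest ratio between two (low, high) ranges."""
    return big[0] / small[1], big[1] / small[0]


price_low, price_high = ratio_range(AGENCY_COST_USD, (VIBE_COST_USD, VIBE_COST_USD))
agency_days = tuple(months * DAYS_PER_MONTH for months in AGENCY_MONTHS)
time_low, time_high = ratio_range(agency_days, VIBE_EVENINGS)

print(f"price: {price_low:.0f}x to {price_high:.0f}x "
      f"({math.log10(price_low):.1f} to {math.log10(price_high):.1f} orders of magnitude)")
print(f"time:  {time_low:.0f}x to {time_high:.0f}x "
      f"({math.log10(time_low):.1f} to {math.log10(time_high):.1f} orders of magnitude)")
```

Under those assumptions the price gap comes out at 125x to 400x and the elapsed-time gap at 9x to 45x, which is where "two orders on price and one on time" comes from. If you compared working hours instead of elapsed days, the time gap would be wider.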
There is also a quieter win that Omar undersells. Vibe coding makes me, as a senior engineer, willing to start things I would otherwise have shelved. The activation energy for a side experiment used to be a Saturday of yak shaving. Now the activation energy is typing the first sentence. I have started seven small experiments this quarter that I would not have started a year ago. Three of them turned into something useful. That is a hit rate I never had before, because I never used to take that many at-bats.
Aviation has a concept called the flight envelope. It is the range of speeds, altitudes, and load factors inside which the aircraft is designed to operate safely. Inside the envelope, you trust the airframe. Outside the envelope, you find out how good your ejection seat is. Vibe coding has a flight envelope. I am happy to fly inside it. Now let me tell you what is outside it.
When vibe coding does not work
I will give you my list, in roughly descending order of seriousness of the mistake.
Long-lived enterprise codebases with deep cross-cutting constraints. A million-line Java monolith with twenty years of business rules encoded into it. The agent simply cannot hold the relevant context. Every change risks violating an invariant that exists only in someone's head and a deprecated wiki page. The senior engineers there are not slow because they cannot type fast. They are slow because they are the institutional memory of the system. Replacing them with an agent does not save time, it just defers the cost to a future incident.
Anything where the failure mode is catastrophic and recovery is expensive. Medical device firmware. Aircraft control systems. The matching engine of an exchange. The clearing logic at a bank. Cryptographic primitives. These domains require the kind of formal reasoning, adversarial review, and decades of accumulated paranoia that no current agent can fake. If your bug can kill someone or wipe out a fortune, you are not in the flight envelope. Get out.
Compliance-heavy domains where every line is auditable. HIPAA, PCI, SOC 2 in its strict reading, financial reporting code that has to satisfy auditors. The issue is not that the agent cannot write the code. It often can. The issue is that you have to be able to defend every line in a deposition, and I have never met someone who can do that for code they did not actively author at level three or above. If a regulator asks why a particular branch exists and your honest answer is "the agent suggested it and I accepted," you have a problem.
Performance-critical hot paths. The inner loop of a video codec. A trading strategy where microseconds matter. A database storage engine. These require the kind of obsessive measurement, profiling, and architectural taste that emerges from years of staring at flame graphs. Agents can write correct code in these areas. They very rarely write fast code. The two are not the same.
Greenfield architecture decisions on systems you intend to run for a decade. This is the subtle one. Agents are very good at choosing the most popular off-the-shelf option for any given problem. They are bad at saying, "actually, given the specifics of your business, you should not use Postgres for this, you should use something weirder." The default choice is usually fine. Sometimes it is not, and the cost of the wrong default does not show up for two years.
Anything where you do not yet know what you want. This is counterintuitive. People assume agents are perfect for the "I am not sure" phase. In my experience, the opposite is true. The agent will faithfully build whatever you ask for, which means if you do not know what you want, you will end up with a confident implementation of the wrong thing, and the plausibility of that wrong thing will fool you for longer than a blank page would have. A blank page is a good honest interlocutor. A working app that does the wrong thing is a liar.
If you are in any of those zones, I am not telling you to never touch an agent. I am telling you to operate at level three or four, with humans in the loop who can defend the code, and to be honest with yourself about which zone you are in.
Okay, back to Omar. He wants to talk about tools.
The 2026 toolset, briefly
Thanks, Kai. The tooling has stratified more than people realize. Rather than relitigate the full vibe coding tools comparison, let me give you the taxonomy I use when someone asks which one to pick.
Terminal-native agents. Claude Code, Aider, OpenCode. You stay in your terminal, the agent reads your repo, it makes changes, you review diffs. This is where the serious shipping happens. It is where I built the invoicing tool I mentioned at the top. If you are comparing the two heavyweights, I wrote a long Cursor versus Claude Code piece you can read.
IDE-native agents. Cursor, Windsurf, GitHub Copilot's agent mode. You work inside an editor that has been retrofitted, or built from scratch, to host the agent. The seam between writing and reviewing is much tighter. Great for level two and level three work. Less good for orchestrating multiple parallel agents.
App builders. Bolt, Lovable, v0, Replit's agent. You stay in a browser, you describe what you want, you get a deployed app. Best on-ramp for non-developers. Best place to do level one, if level one is what you actually need. The frontier of what they can build expanded a lot in the last year, but the failure modes Kai listed are very real here too.
Specialty tools. Codex CLI for OpenAI-flavored agentic work, the various enterprise IDE plugins, the experimental multi-agent harnesses people are building on top of all of the above.
The honest meta-advice: your first tool does not matter very much. The skill transfers. What matters is making the leap from level one to level three before the project you care about gets too big to recover. If you want a set of guardrails for that journey, our vibe coding best practices distill the hard lessons into 15 rules worth following.
If you want a current snapshot of what is working at the frontier across all of these, I keep a running list in 50 Claude Code tips, most of which generalize across tools.
Vibe coding versus the other words people throw around
People conflate four things. Let me separate them, briefly.
Prompt engineering is the craft of writing a single input to a model that gets a useful output. It is a skill that lives inside vibe coding. A vibe coder is, among other things, a prompt engineer, but vibe coding is a much wider activity that includes review, iteration, deployment, and judgment.
Agentic engineering is what level four on my spectrum becomes when you take it seriously. It is the discipline of writing specs and plans for agents to execute, with humans operating as architects and editors. Karpathy effectively retired vibe coding as his preferred term in favor of this one, for the reasons Kai outlined. They are not opposites; agentic engineering is vibe coding grown up.
No-code is a different animal entirely. Bubble, Webflow, Glide, Airtable. Visual builders that produce platform-locked applications, not source code. You are renting capability from the platform. With vibe coding, you own real files in a real repo, and you can hire any human developer in the world to take it from here. The two often look similar from the outside. They are very different from the inside.
Low-code is the cousin of no-code. Same spirit, slightly more escape hatches. Same fundamental constraint: you do not own the engine.
If you walk away with one disambiguation, let it be this: vibe coding produces a real codebase. Everything else either lives one level above the code (no-code, low-code) or one level inside it (prompt engineering, agentic engineering as a sub-discipline of vibe coding).
What we are still figuring out
Kai and I disagree about this list. We disagree about most things. But here is the union of our open questions, the things that, thirty projects in, I genuinely do not know the answer to.
Where does institutional knowledge live in a vibe-coded codebase? When the original author shipped it in a weekend and never had to defend any decision, and then leaves the company eighteen months later, who can pick it up? In a hand-written codebase the comments, the commit messages, the structure, all carry implicit memory. In a vibe-coded one, they often do not. We are fumbling toward conventions for this. Nobody has solved it.
What is the right level of test coverage? Tests are cheap to generate now. Are they meaningful when the same agent writes the test and the implementation? My honest answer: I do not fully trust agent-written tests for the same reason a student should not grade their own exam. But I do not yet have a clean alternative that scales.
How much should we worry about agents that quietly disagree with each other? When you run two agents on the same codebase, they sometimes converge on inconsistent assumptions. The bugs that result are weirder than human-author bugs. We have rough intuitions about when this happens, but no good tooling to detect it.
Where is the ceiling? Independent evaluation work, including METR's research on agent capabilities, suggests the gap between what an agent can do in a tightly scoped benchmark and what an agent can do in an open-ended real codebase over weeks is still very large. That gap is closing. How fast it closes will determine whether the spectrum I described is the steady state, or just the snapshot at one moment in a much faster evolution. I genuinely do not know.
I find myself at the end of this essay less certain about what vibe coding is than I was when I started writing it, which feels honest. The thing keeps changing under us. Each model release shifts the boundary of which level is appropriate for which task. Each new harness (the Claude Code release notes from Anthropic are a useful pulse) changes what level four even means.
So, where to from here
If you have read this far, you probably want to actually try this. Good. Try it. Pick something small enough that Kai would call it recoverable, and big enough that you will care if it works. If you want a guided first project you can finish in an afternoon, our vibe coding for beginners walkthrough builds a real app from scratch.
If you are starting from zero and have never opened a terminal, go read the complete beginner's tutorial for Claude Code and follow it end to end. For a shorter, more focused onboarding, the Claude Code tutorial for beginners is a solid alternative. If you already have a tool you like and want to get better, pull up 50 Claude Code tips and steal a workflow or two. If you are still tool shopping, the tools comparison lays out the trade-offs.
And if you ship something this week, real or scrappy, working or broken, I would love to hear what level of the spectrum you ended up at, and what you got wrong on the way there. The thirtieth project taught me different things than the third did, and I expect the sixtieth will teach me different things than the thirtieth. We are all early to this. The conversation continues.