Vibe Coding vs Traditional: Same App, Both Ways

Omar Hassan, DevTools Analyst
10 min read

I cleared two weekends, opened two empty folders on my laptop, and made a deal with myself: build the exact same app twice. Once the old way, every keystroke mine. Once with Claude Code as the agent, steering with prompts and reviews instead of typing every line. Same scope. Same database. Same deploy target. No cheating in either direction.

I shipped 30+ projects last year. I am not a beginner, and I am not a hype guy. I just got tired of arguments about vibe coding vs traditional coding that were powered entirely by vibes and zero by stopwatches. So I ran the experiment, logged the hours, paid the bills, and watched both versions in production for two weeks. This post is the receipt.

I am going to spoil one thing up front. The result is not a clean win for either side. It is much weirder than that.

The setup: an invoice generator I would actually use

I needed something modest enough to finish in a weekend each, and real enough that the numbers would mean something. So I built an invoice generator. The kind freelancers use to bill clients.

The spec was tight on purpose:

  • Email and password auth, nothing fancy
  • Create an invoice with line items, tax, and a client
  • Render the invoice as a PDF and let the user download it
  • A small admin panel to mark invoices paid or unpaid
  • Postgres on Neon. Hosting on Vercel. Next.js for both versions
  • A single test user account, seeded by a script (sketched just below)
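
For that last item, "seeded by a script" meant something like the sketch below. This is not my exact script, and it assumes Prisma plus bcryptjs with illustrative field names, but it is the shape of it: one upsert so re-running the seed never duplicates the account.

```ts
// prisma/seed.ts — a minimal sketch of the test-user seed, assuming Prisma + bcryptjs.
// Field names (email, passwordHash) are illustrative, not my exact schema.
import { PrismaClient } from "@prisma/client";
import { hash } from "bcryptjs";

const db = new PrismaClient();

async function main() {
  // upsert so the seed is idempotent: re-running it never creates a second test user
  await db.user.upsert({
    where: { email: "test@example.com" },
    update: {},
    create: {
      email: "test@example.com",
      passwordHash: await hash("test-password", 10),
    },
  });
}

main().finally(() => db.$disconnect());
```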

I wrote the requirements in a Google Doc the night before. I refused to let myself add features mid-build. No moving the goalposts. In construction, you do not change the floor plan after pouring the foundation. Same idea here.

The two paths were:

  1. Run 1 (Traditional): No AI in my editor. No autocomplete beyond what Next.js gives you out of the box. No Claude. No Copilot. I could read docs, search Stack Overflow, look at my own old projects.
  2. Run 2 (Vibe coding): Claude Code as the agent in a fresh repo. I steered with prompts and code review. I could read what it wrote, push back, and refactor. I avoided typing implementation code unless the agent was completely stuck.

Both runs got a fresh git repo, a fresh Vercel project, a fresh Neon database, and the same starting hour on a Saturday morning.

Run 1: doing it the way I learned

I started traditional first because I wanted my brain warm before the comparison, not tired.

Saturday 9:14 AM. Empty folder, npx create-next-app, the familiar little dopamine hit of a fresh project. I have done this maybe 200 times. Muscle memory took me through routing, Tailwind setup, and a basic auth layout in about 90 minutes. No friction. Pure flow.

Then the wheels touched the road.

Auth took longer than I expected. I picked Auth.js (the new NextAuth) and spent almost two hours fighting session callbacks. I knew the shape of the answer but not the exact API surface for the new version. I read the docs. I checked GitHub issues. I cursed at my screen. By 1:00 PM I had a working login.
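
For anyone who has not touched the new API, the part I was fighting looked roughly like this. It is a trimmed sketch, not my exact config: verifyCredentials is a stand-in helper, and I have left out the module augmentation that makes session.user.id typecheck cleanly.

```ts
// auth.ts — a trimmed sketch of the Auth.js (NextAuth v5) setup, not my exact file.
// verifyCredentials is a hypothetical helper: it looks up the user, checks the
// password hash, and returns the user record or null.
import NextAuth from "next-auth";
import Credentials from "next-auth/providers/credentials";
import { verifyCredentials } from "@/lib/users"; // hypothetical helper

export const { handlers, auth, signIn, signOut } = NextAuth({
  providers: [
    Credentials({
      credentials: { email: {}, password: {} },
      authorize: async (credentials) => {
        // returning null here rejects the sign-in
        return verifyCredentials(String(credentials?.email), String(credentials?.password));
      },
    }),
  ],
  callbacks: {
    // the two hours, roughly: the user id lives on the token, not the session,
    // so it has to be copied across in both callbacks or it silently disappears
    jwt({ token, user }) {
      if (user) token.id = user.id;
      return token;
    },
    session({ session, token }) {
      if (session.user) {
        // module augmentation omitted to keep the sketch short
        (session.user as { id?: string }).id = token.id as string;
      }
      return session;
    },
  },
});
```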

The invoice form went fast. Forms are forms. I have built dozens. The PDF rendering was the next pothole. I went with @react-pdf/renderer, hit a layout bug where the table broke across pages, and lost about 45 minutes to that. I shipped the first working invoice download at 4:30 PM.
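
The fix, once you know it, is basically one prop. Here is a stripped-down sketch of the table rows, not my actual invoice component, which has more columns and real styling:

```tsx
// InvoiceRows.tsx — a stripped-down sketch, not my production table.
// wrap={false} tells @react-pdf/renderer to keep a View on one page, so a
// line item moves to the next page whole instead of splitting mid-row.
import { View, Text, StyleSheet } from "@react-pdf/renderer";

const styles = StyleSheet.create({
  row: { flexDirection: "row", borderBottomWidth: 1, paddingVertical: 4 },
  cell: { flex: 1, fontSize: 10 },
});

type LineItem = { description: string; qty: number; unitPrice: number };

export function InvoiceRows({ items }: { items: LineItem[] }) {
  return (
    <View>
      {items.map((item, i) => (
        // the fix: keep each row atomic across page breaks
        <View key={i} style={styles.row} wrap={false}>
          <Text style={styles.cell}>{item.description}</Text>
          <Text style={styles.cell}>{item.qty}</Text>
          <Text style={styles.cell}>{(item.qty * item.unitPrice).toFixed(2)}</Text>
        </View>
      ))}
    </View>
  );
}
```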

The admin panel was Sunday morning work. About three hours, including a stupid bug where I forgot a where clause and the "mark paid" button was marking every invoice paid for the user. Caught it in manual testing before deploy. That one would have been embarrassing in production.
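
For the curious, the bug had the classic shape below. This is a reconstruction from memory rather than the literal diff, written Prisma-style, and it assumes an Invoice model with a status column.

```ts
// markInvoicePaid.ts — a reconstruction of the mark-paid bug and its fix, not the literal diff.
import { PrismaClient } from "@prisma/client";

// Buggy version: scoped to the owner but not to the invoice,
// so one click marked every invoice that user owned as paid.
export async function markPaidBuggy(db: PrismaClient, userId: string) {
  await db.invoice.updateMany({
    where: { userId },
    data: { status: "paid" },
  });
}

// Fixed version: constrain on both the invoice id and the owner,
// keeping userId in the filter so one user can never touch another's invoices.
export async function markPaid(db: PrismaClient, invoiceId: string, userId: string) {
  await db.invoice.updateMany({
    where: { id: invoiceId, userId },
    data: { status: "paid" },
  });
}
```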

I deployed Sunday at 1:45 PM. Total elapsed coding time, by my Toggl log: 11 hours and 12 minutes across the two days.

What surprised me about Run 1 was how much I enjoyed it. I had forgotten the specific satisfaction of typing a function and watching it work. There is a craftsman feeling, like sanding a piece of wood by hand instead of running it through a planer. Slower. More tactile. Quieter in your head.

Run 2: handing the pencil to the agent

The next weekend I started Run 2. Same Saturday morning slot, same coffee, same playlist. Fresh repo, fresh Neon database, fresh Vercel project. I opened Claude Code in the terminal and pasted in the spec from my Google Doc. If you have never used the tool, our Claude Code tutorial for beginners covers the setup and first run.

The first hour felt like cheating.

I typed something close to: "Set up a Next.js 16 app with Auth.js email/password, a Postgres schema for users, clients, invoices, and line items, and scaffold the routes for invoice CRUD." Claude Code asked clarifying questions about the auth provider I wanted, then went to work. I watched files appear. I watched migrations get written. I read every diff before approving it.

By 10:30 AM I had auth working, the database seeded, and a basic invoice form rendering. That is roughly four and a half hours of Run 1 work, done in about 75 minutes of mostly reading.

Then it went sideways.

The PDF rendering. Same library. The agent picked @react-pdf/renderer because that is what the docs and examples on the web pointed to, and it hit the same table-breaks-across-pages bug I had hit. The difference: I already knew the fix from the week before. Claude Code tried three different things, none of them right, before I stopped it and just told it the answer. Twenty minutes wasted on something that would have taken me five if I had been typing.

The admin panel was the next surprise. The agent built it cleanly and, importantly, did not introduce the missing-where-clause bug I had introduced manually the week before. It wrote a small test that caught the case before I even ran the app. I would not have written that test on my own at this stage. I would have written it after the bug bit me in production.
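
I did not keep the agent's exact test, but it had the shape of this sketch: seed two invoices for the same user, mark one paid, assert the other did not move. I am assuming Vitest here, and createTestDb and seedInvoices are hypothetical helpers standing in for whatever setup utilities your project has.

```ts
// markPaid.test.ts — the shape of the test the agent wrote, reconstructed, not its literal output.
// createTestDb and seedInvoices are hypothetical test helpers, not real project code.
import { describe, it, expect } from "vitest";
import { markPaid } from "./markInvoicePaid";
import { createTestDb, seedInvoices } from "./testUtils";

describe("markPaid", () => {
  it("marks only the targeted invoice, not every invoice the user owns", async () => {
    const db = await createTestDb();
    const [first, second] = await seedInvoices(db, { userId: "u1", count: 2 });

    await markPaid(db, first.id, "u1");

    const updated = await db.invoice.findUnique({ where: { id: first.id } });
    const untouched = await db.invoice.findUnique({ where: { id: second.id } });
    expect(updated?.status).toBe("paid");
    expect(untouched?.status).toBe("unpaid");
  });
});
```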

I deployed Run 2 on Saturday at 6:02 PM. Total elapsed time: 6 hours and 38 minutes, all on one day.

The numbers, prose-style

Here is the napkin math, with the receipts.

Time. Run 1 took 11 hours and 12 minutes across two days. Run 2 took 6 hours and 38 minutes in one. That is roughly a 41% reduction in wall-clock time for the vibe-coded run. Not a 10x miracle. Not nothing.

Cost. Run 1 cost me coffee and electricity, call it a few dollars. Run 2 cost me about $14.20 in Claude API spend across the session, plus the same coffee and electricity. The Vercel and Neon bills were identical, both on free tiers for this scale. So the agent run cost roughly $14 more in compute. If my time is worth even $30 an hour, the four and a half hours saved bought back that $14 about ten times over. If my time is worth what I bill clients, it bought back the cost in the first 20 minutes.

Lines of code. Run 1 ended at 1,847 lines (excluding lockfiles). Run 2 ended at 2,134 lines. The agent wrote about 15% more code, mostly in the form of small utility functions and tests it added without me asking. More code is not always better, but in this case the extras held up under review.

Bugs in production. Both versions ran for two weeks with me as the only user, plus a friend I asked to try it. Run 1 hit two production bugs: a date-formatting issue in the PDF and a session that did not refresh after password change. Run 2 hit one: an edge case in the tax calculation when a line item had zero quantity. Small N, but the totals are what they are.

Tools. Run 1 used Next.js, Auth.js, Postgres, @react-pdf/renderer, Tailwind, and my own brain. Run 2 used the same stack plus Claude Code, plus a small handful of test utilities the agent pulled in.

So, on the scoreboard: vibe coding shipped faster, cost a few dollars more, wrote slightly more code, and had one fewer production bug. That is the headline. But headlines are lies, and the interesting stuff is in the footnotes.

What surprised me, on both sides

I went into this expecting the vibe-coded run to be faster and sloppier. It was faster, but it was not sloppier. That genuinely surprised me.

What surprised me on the traditional side was how good it felt. I had been doing AI-assisted coding for about a year before this experiment, and going back to typing every line was like switching from an electric drill back to a hand screwdriver for an afternoon. The hand version is slower and your wrist hurts a little, but you notice the wood. You notice the threads. You notice yourself. I came out of Run 1 understanding the codebase in a way I did not come out of Run 2.

What surprised me on the vibe-coded side was the test discipline. I do not write enough tests. I know I do not. The agent wrote them by default, and the one bug it caught in development would have been a real production incident. That is not nothing.

What also surprised me was how much time I spent reviewing and debugging the vibe-coded output rather than writing from scratch. It is a different kind of labor, closer to editing than authoring. I was surprised, too, by how often I had to stop the agent in Run 2. Not because it was wrong, but because it was about to over-engineer. It wanted to add a queue for PDF generation. It wanted to build a second email-template system. I had to keep saying no, smaller, that is out of scope. If you are not paying attention, vibe coding can spiral into a much bigger app than you asked for, and you are the only one with the spec in your head.

The METR study, and why I am not going to extrapolate

I want to flag something honest before I make any recommendations.

There is a real study from METR that found AI coding tools sometimes make experienced open-source developers slower, even when those developers expected the opposite. The setup, the population, and the conclusions are worth reading directly rather than through my paraphrase. We covered the implications in a separate piece on the METR productivity paradox if you want the longer take.

I bring this up because my experiment came out the other direction, and I do not want to pretend a one-developer, one-app weekend is a refutation of a real study. It is not. Different developers, different codebases, different agents, different days. The honest answer is that the productivity story for AI coding is more textured than any single Twitter screenshot will tell you. Read the METR research, read my data, read your own logs, and be suspicious of anyone, including me, who tells you they have the final word.

For the agent itself, the Claude Code docs are the source of truth on what the tool can and cannot do this month. I would also point you at Anthropic's announcements for the model updates that often change what works.

What this comparison cannot tell you

This is where I have to put on the brakes. This experiment is N=1. One developer. One app. One stack. One weekend per side. Two weeks of production data with two users.

Things I did not test:

  • A larger codebase where the agent has to hold more context
  • A team setting where multiple people are committing
  • A truly unfamiliar stack where I would have struggled in Run 1
  • A long maintenance horizon where the cost of bad architecture compounds
  • Stress under deadline pressure, where I cut corners differently in each mode

If you scale this up, things change. A teammate of mine ran a similar experiment on a 50,000-line app he has owned for three years. His ratios were almost reversed. The agent was constantly proposing changes that broke conventions only he and the codebase knew about. He spent more time correcting it than he saved. That is real. Bigger and older codebases are a different sport.

So treat this post as a single data point with timestamps, not a paper. If you want a paper, go read METR and pair it with whatever you log on your own machine.

When you would pick which

Here is the framework I have started using, refined since this experiment.

Pick traditional coding when:

  • The task is small, finite, and you already know the exact shape of the answer
  • You are learning a new pattern and want it in your fingers
  • The codebase has heavy, undocumented conventions that an agent would trample
  • You are debugging a specific bug where reading every line slowly is the work
  • The cost of a wrong commit is unusually high (auth, payments, anything regulated, or anything covered by your vibe coding security checklist)

Pick vibe coding when:

  • You are bootstrapping something new and want to be in production by Sunday
  • The stack is mainstream enough that the agent has seen it 10,000 times
  • You are willing to do real code review on every diff (this is the load-bearing assumption)
  • Tests are part of the deliverable and you want them written by default
  • You are exploring a design and want three versions of the same screen by lunch

We wrote a longer guide on vibe coding best practices if you want the rules in one place. The short version: stay in the loop, read every diff, keep scope tight, and treat the agent like a strong intern, not a senior engineer.

If you are completely new to this and wondering what the term even means, our what is vibe coding explainer is the gentlest entry point. If you want the tactical playbook for the agent itself, 50 Claude Code tips is the densest thing we have written. And if you are picking tools and not sure which agent to use, the 2026 vibe coding tools comparison lays out the field.

The honest ending

I did this experiment because I was tired of arguing without data. I now have a tiny pile of data, and I am still uncertain about the big questions. That feels right.

The two runs were both real. The traditional version still feels more like mine in a way I cannot fully justify. The vibe-coded version is the one I am actually using to bill clients, because it is the one with the test for the mark-paid bug. Both feelings are true at the same time.

The thing I am most sure of, after two weekends and two production deployments, is that the question is not vibe coding vs traditional coding in the abstract. The question is which tool, for which task, on which day, with which stakes. That is not the answer that gets retweeted. It is the answer that survives contact with a real codebase.

Run your own experiment. Log the hours. Tell me where my numbers fell apart and where they held up. I would rather argue with your stopwatch than my memory.
