Vibe Coding Examples: 10 Real Apps We Built

Elena Volkov, Case Study Writer
Alex Rivera, Product Strategy Writer

It is Friday, late. Alex and I are on the office floor with a printout of every project the team took on this quarter, sorted into three piles. The wins pile has a glass of seltzer holding it down. The mixed pile rests against a backpack. The failures pile keeps trying to slide off the rug, like physics also disapproves of it. What follows are ten vibe coding examples from a real shop: real timelines, real tools, the wins, the mixed bags, and the parts that didn't work. If you want polished case studies where every story ends in a triumphant Slack screenshot, this isn't that. The seltzer is not on the failures pile for a reason.

A bit of context first. Twenty-three projects this quarter, three months of work. We have done quarterly reviews before, but this is the first one where every project on the floor was built with an AI coding agent in the loop. The mood swings between pride (look how fast we shipped that thing) and the kind of honesty you only get on a Friday after 7 PM. We picked ten of the most representative projects, from runaway hits to abandoned wreckage, and decided to write them up.

A note on framing before we start. If you are new to the term, our explainer on what vibe coding actually is makes a better starting point than this post. This one is for people who already know roughly what it means and want to see how it plays out across a portfolio of real projects.

Why we are showing the misses too

Most vibe coding case studies on the internet read like the "after" photo of a renovation. Walls are straight. Lighting is golden. The contractor is smiling. Nobody talks about the rotted joist behind the drywall, the inspection that failed twice, or the morning the plumber quit.

We are showing the misses for a selfish reason: we learn more from them, and the team that publishes them learns more than the team that publishes only its wins. There is also a less selfish reason. The narrative that vibe coding always works is doing damage. Founders are spinning up complex products in 90-minute "build with AI" sessions and then quietly hiring a senior engineer to rebuild them from scratch six weeks later. That delta, between the demo and the rebuild, is exactly what we want to map.

Five of the ten projects below were wins. Three were mixed. Two were failures. That is roughly the real distribution we see internally. Treat it as our napkin estimate of vibe coding's hit rate on a small, experienced team: about 50 percent clear wins, 30 percent shippable but messy, 20 percent rebuild or abandon. A baseball player would take that average. A surgeon would not.

The 10 projects

1. Internal expense report tool (WIN)

What it is: A small web app that ingests photos of receipts, runs OCR, categorizes the line items, and generates a CSV the finance team can drop into the accounting system. Used by 14 people internally.

Who built it: One developer, evenings only.

Tool used: Claude Code, with a side helping of an OCR API.

Time to ship: Two days, including the back-and-forth with finance about category names.

What went right: This was greenfield, single-user-flow, no legacy schema, and the developer wrote a one-paragraph spec into a CLAUDE.md file before starting. The agent did most of the boilerplate, the developer did the design decisions, and the resulting app has been in production for four months without a single ticket. The brief was the win. When the spec was a real spec, the agent acted like a fast junior who actually read the doc.
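For context, the brief really was one paragraph. A reconstructed sketch of the kind of thing that goes in the CLAUDE.md; the wording is illustrative rather than the original file, and it includes the category constraint we only learned to add later:

```markdown
# CLAUDE.md

Internal expense tool. One flow: upload a receipt photo, run it through
the OCR API, map each line item to one of finance's approved category
names (do not invent new ones), and export a CSV in the accounting
system's import format. 14 internal users. Done means finance can go
from receipt photo to clean CSV without touching the code.
```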

What we would do differently: We let the agent invent its own category taxonomy and finance had to relabel half of them on day one. Lesson: bring the domain expert into the loop before the agent starts naming things, not after.

2. Marketing site for a podcast (WIN)

What it is: A five-page marketing site for a friend's podcast. Show notes, episode list, RSS embed, a newsletter form, and a press kit page.

Who built it: Pair, one designer and one developer.

Tool used: Lovable, with a quick polish pass in the code editor afterward.

Time to ship: Four hours, almost exactly. Most of it was choosing a typeface.

What went right: Lovable is shockingly good at this exact category of project. Marketing sites are mostly layout, copy, and a few forms. The visual feedback loop closes the gap that pure-text agents struggle with. The designer drove the prompts, the developer caught two accessibility regressions in the final pass, and the site has been live since.

What we would do differently: We wrote the copy after the layout was done, which meant the H1 had to be exactly seven words to fit the design. Reverse that order. Copy first, then layout. The constraint should run the other way.

3. Customer portal for a SaaS client (MIXED)

What it is: A self-serve portal where the client's customers could manage their subscription, view invoices, and submit support tickets.

Who built it: Pair, both senior, one of them me.

Tool used: Cursor, and we picked it specifically because of the legacy codebase. (We compared notes on this elsewhere in Cursor vs Claude Code in 2026.)

Time to ship: Three weeks. The estimate was one.

What went right: Cursor handled the legacy Rails codebase better than I expected. The agent could navigate the existing models and follow the project's idiosyncratic conventions. The first version of every feature shipped quickly.

What we would do differently: Scope creep. The client kept adding "while you're in there" requests, and because the agent made each one cheap, we said yes. The total of all the cheap things turned out to be expensive. Lesson: an agent that can do anything fast is exactly the wrong defense against a stakeholder who wants everything. The brake pedal got softer as the engine got faster. We have a stricter intake form now.

4. Team standup bot (WIN)

What it is: A Slack bot that posts a standup prompt every morning, collects responses in a thread, summarizes them at 10 AM, and posts the summary to the leadership channel.

Who built it: Solo, one developer.

Tool used: Claude Code with the official Slack MCP server.

Time to ship: One day.

What went right: MCP turned what would have been a weeklong "wire up the Slack API" project into an afternoon. The agent had real, working tools to test with, instead of guessing at API shapes from documentation. The Slack plumbing has not failed once; the one outage we did have was not Slack's fault, as you are about to read.

What we would do differently: We put the LLM call for the summary on the critical path with no caching and no fallback. When the summarization API had an outage in week three, the bot just stopped working silently. Always have a fallback that posts something, even if it's the raw thread. Silence is worse than a dumb output.
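The fix is small once you see it. A minimal sketch of the shape, assuming a Node bot on @slack/web-api, with summarize standing in for whatever LLM call you use:

```typescript
import { WebClient } from "@slack/web-api";

const slack = new WebClient(process.env.SLACK_BOT_TOKEN);

// summarize is injected so the fallback path is testable without an LLM;
// in ours it wrapped the summarization API that went down in week three.
async function postStandupSummary(
  channel: string,
  messages: string[],
  summarize: (msgs: string[]) => Promise<string>,
) {
  let text: string;
  try {
    text = await summarize(messages);
  } catch {
    // The lesson, in code: a dumb output beats silence. Post the raw
    // thread with a note so leadership knows the summarizer is down,
    // not the bot.
    text = ["(Summarizer unavailable; raw standup below.)", ...messages].join("\n");
  }
  await slack.chat.postMessage({ channel, text });
}
```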

5. Habit tracker mobile app (FAILURE)

What it is: A habit tracker, a mobile app built with vibe coding techniques. Or rather, what it was supposed to be. Login, daily check-ins, streaks, weekly summary, push notifications.

Who built it: Solo side project, one of our developers.

Tool used: Bolt, then partially migrated.

Time to ship: Never shipped. Abandoned at roughly 60 percent.

What went right: The first 40 percent was thrilling. Auth, the basic data model, the daily check-in screen, all of it came together in a weekend.

What went wrong: State management. The app had three concurrent states to coordinate (local optimistic, synced with the backend, displayed in the UI), and each new feature the agent added rewired the state plumbing in a slightly different way. By the time we hit notifications, the app had four different conventions for handling sync conflicts and none of them composed. The developer spent two weekends untangling state, made it worse twice, and shelved it. The house was framed beautifully and the wiring was a snake nest behind every wall. The lesson is unromantic: vibe coding rewards small surface area. When the surface gets complex, you need an architect, and an agent without an architect just adds rooms.
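For the curious, "one convention" does not have to be fancy. Here is a minimal sketch of what an architect would have imposed on day one, with hypothetical names of our own rather than anything the agent produced: a single discriminated union for sync status that every feature goes through.

```typescript
// One shared shape for every synced entity, instead of each feature
// inventing its own booleans for "pending" and "failed".
type SyncStatus =
  | { kind: "local" }                          // optimistic, not yet sent
  | { kind: "syncing" }                        // request in flight
  | { kind: "synced"; serverVersion: number }  // server confirmed
  | { kind: "conflict"; serverVersion: number; theirs: CheckIn };

interface CheckIn {
  habitId: string;
  date: string;   // ISO date; the natural key for a daily check-in
  done: boolean;
}

interface Tracked {
  value: CheckIn;     // what the UI renders, always, regardless of status
  status: SyncStatus;
}

// State only advances through named transitions. Every feature calls
// these; no feature rewires the plumbing its own way.
function ack(item: Tracked, serverVersion: number): Tracked {
  return { ...item, status: { kind: "synced", serverVersion } };
}

function conflict(item: Tracked, serverVersion: number, theirs: CheckIn): Tracked {
  return { ...item, status: { kind: "conflict", serverVersion, theirs } };
}
```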

6. Personal finance dashboard (WIN)

What it is: A self-hosted dashboard that pulls from two bank APIs, categorizes transactions, and shows monthly cash flow on a single page.

Who built it: Solo, weekend project.

Tool used: Aider.

Time to ship: One weekend, roughly 11 hours of actual work.

What went right: Aider's terminal-first, git-commit-per-change workflow forced the developer to think in small steps. The repo has 47 commits from that weekend, each one small enough to revert. When the bank's API returned a weirdly nested JSON shape that broke the categorizer, rolling back to the previous commit took six seconds.

What we would do differently: No tests. Zero. The dashboard works, but the developer is now afraid to touch it because there's no safety net. We are going back to add tests before adding any new feature. Treat the test backfill as the price of admission for the second feature.
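The backfill will be characterization tests: pin down what the dashboard already does before anyone, human or agent, touches it again. A minimal sketch, assuming vitest and a hypothetical categorize module:

```typescript
import { describe, expect, it } from "vitest";
import { categorize } from "./categorize"; // hypothetical module name

// Characterization tests: pin down current behavior before changing it.
describe("transaction categorizer", () => {
  it("maps a known merchant to the category we expect", () => {
    expect(categorize({ merchant: "WHOLEFDS #123", amount: -54.2 }))
      .toBe("groceries");
  });

  it("survives the weirdly nested JSON that broke us once", () => {
    const tx = { merchant: "ACME", amount: -10, raw: { meta: { tags: [] } } };
    expect(() => categorize(tx)).not.toThrow();
  });
});
```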

Alex here: the projects I learned the most from

Elena is going to keep going through the list, but I want to step in for a minute because the failures and the mixed ones taught me more than the wins did, and I want to put words on the pattern before I forget.

When I was learning to play piano, my teacher had a saying: the wrong notes teach you the scale faster than the right ones do. Right notes feel like luck. Wrong notes have a shape. They are diagnostic. The same thing turned out to be true watching this team ship and not-ship apps with an AI agent in the loop.

The wins all sound similar in retrospect: small surface, clear brief, one user flow, greenfield, a senior in the loop who knew what "done" looked like. There is not much craft in describing them. They worked because the conditions were favorable.

The failures and the mixed projects, though, all failed in interesting ways. The habit tracker failed at the seam between features. The customer portal failed at the seam between the team and the stakeholder. The CMS migration coming up next failed at the seam between the new system and the strange decisions baked into the old one. Every failure was a seam. The agent was good at the inside of each room and bad at the doorways.

That is the lesson I keep coming back to. The thing an AI agent is not yet good at is the joinery. The architecture of a building is not the rooms, it is the way the rooms connect to each other and to the load-bearing walls. Vibe coding right now is room-quality work in search of a building-quality plan. When the plan is in place, the rooms come up fast. When it isn't, you get a beautiful kitchen attached to nothing.

Okay. Back to the list.

7. Internal documentation search (MIXED)

What it is: A search interface across the team's internal docs, runbooks, and meeting notes. Click a result, see a snippet, jump to the source.

Who built it: Pair, both mid-level.

Tool used: Claude Code.

Time to ship: Five days, then another four days nobody planned for.

What went right: The UI shipped on day two. Auth, basic search-by-keyword, results page, all of it. Felt like a steal.

What went wrong: The agent reached for the easy library for the search layer and it could not handle the long, structured documents we throw at it. We spent two days trying to tune it, then gave up and rewrote the search layer by hand using a vector database we know well. The hand-written replacement took a day, performs better, and is something we actually understand. We left a note in the repo that just says search/ is human-written, please do not let the agent regenerate this directory. It is, I think, the most honest comment in our codebase.
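For the record, the hand-written layer is not exotic. Stripped of the specific vector database, the core is embeddings plus cosine similarity over pre-chunked documents. A minimal sketch, with embed() as a hypothetical wrapper around whatever embedding API you use:

```typescript
// embed() is a hypothetical wrapper around whatever embedding API you use.
declare function embed(text: string): Promise<number[]>;

interface Chunk {
  docId: string;
  text: string;
  vector: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank pre-chunked documents against the query. Chunking is the part the
// easy library got wrong for us: a runbook section needs to be able to
// outrank the runbook's intro.
async function search(query: string, index: Chunk[], k = 10): Promise<Chunk[]> {
  const q = await embed(query);
  return [...index]
    .sort((a, b) => cosine(q, b.vector) - cosine(q, a.vector))
    .slice(0, k);
}
```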

8. Wedding RSVP site (WIN)

What it is: A single-page site where guests RSVP, choose a meal, and add dietary notes. The couple wanted a custom design, not a SaaS template.

Who built it: Solo, a developer doing a favor for a friend.

Tool used: v0, then deployed on Vercel.

Time to ship: 90 minutes.

What went right: v0 ate this for breakfast. The constraint set was tiny. The visual was clear in the developer's head before they typed a word. The site went live the same evening, and 113 guests RSVP'd over the next three weeks without a single bug report.

What we would do differently: Honestly, almost nothing. This is the platonic vibe coding example: small, scoped, single-purpose, no integrations beyond a database write. If your project fits on a wedding invitation, vibe coding is going to feel magical. The trick is recognizing when your project doesn't.

9. CMS migration tool for a publisher (FAILURE)

What it is: A tool that was supposed to migrate 22 years of articles from a custom CMS into a modern headless platform. About 480,000 articles, plus images, redirects, and editorial metadata.

Who built it: Pair plus me dropping in for design reviews.

Tool used: A combination of Cursor and Claude Code, plus a few one-off scripts in Aider.

Time to ship: Three weeks of vibe coding, then we threw it out and rebuilt it traditionally over four weeks.

What went right: The first 80 percent of the articles migrated cleanly. The agent wrote the parsers, the field mappers, and the image rehosting pipeline at roughly 5x the speed I'd have written them.

What went wrong: The legacy schema had two decades of editorial decisions encoded as exceptions. Articles from 2007 had a different "author" field than articles from 2014. There were six different conventions for embedded video. Three categories had been merged but their slugs hadn't been updated. The agent could not see these patterns because the patterns lived in the heads of editors who had since retired, not in the data. Each edge case the agent "fixed" introduced two new ones. Eventually we admitted the truth: this was not a vibe coding problem, it was a data archaeology problem, and you cannot vibe code an excavation. We rebuilt with a senior engineer mapping the schema by hand and using the agent only for the parser code itself. Worked great. Lesson learned at the client's expense, though, which is not how anyone wants to learn.

10. Real-time chat app for a workshop (MIXED)

What it is: A bare-bones chat app for a one-day in-person workshop, where 60 attendees needed to message into a shared channel and get reactions from facilitators.

Who built it: Solo, one developer.

Tool used: Cursor.

Time to ship: One week, including a 14-hour debugging session the night before the workshop.

What went right: Cursor wrote a working version on day one. The UI was clean, the message list scrolled, the reactions worked. We thought we were done.

What went wrong: Websockets. The agent reached for a websocket library, wired it up, and the happy path worked beautifully. The unhappy paths (reconnect after sleep, mobile background tab, flaky hotel wifi) all had subtle bugs that did not surface until we tested with real conditions. The night before the workshop, the developer spent 14 hours debugging reconnection logic. Most of those 14 hours were spent unwinding agent-generated code that looked like it handled reconnection but didn't. By 4 AM, he had rewritten the websocket layer from scratch. It would have been faster to write it from scratch on day one. Realtime systems remain a place where vibe coding writes the demo and a human writes the production version.
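If you are about to build one of these, here is roughly the shape the 4 AM rewrite took, sketched over the browser WebSocket API. The constants are illustrative, not what we shipped:

```typescript
// A reconnecting wrapper over the browser WebSocket API. A real version
// also needs a heartbeat, because a socket that died while the laptop
// slept often fires no close event at all.
function connectWithRetry(url: string, onMessage: (data: string) => void) {
  let attempt = 0;

  function open() {
    const ws = new WebSocket(url);

    ws.onopen = () => {
      attempt = 0; // healthy again, reset the backoff
    };

    ws.onmessage = (ev) => onMessage(String(ev.data));

    // Close and error funnel into one reconnect path: exponential
    // backoff, capped, with jitter so 60 phones on hotel wifi do not
    // all reconnect in the same instant.
    ws.onclose = () => {
      const delay = Math.min(30_000, 500 * 2 ** attempt) + Math.random() * 250;
      attempt++;
      setTimeout(open, delay);
    };

    ws.onerror = () => ws.close();
  }

  open();
}
```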

For the curious: we wrote a longer field manual on this exact failure mode in the debugging vibe coded apps survival guide. Save yourself the 14-hour night.

Patterns we noticed across all 10

We laid the projects out on the floor and stared at them for a while. A few patterns showed up clearly enough that we are now using them as intake filters when a new project lands on our desk.

Greenfield beats brownfield, by a lot. Seven of our eight full or partial wins were greenfield. The only two legacy projects on the floor were the CMS migration, a failure, and the customer portal, a mixed result. Agents can read documentation; they cannot read the unwritten rules in a 22-year-old database.

Surface area predicts everything. The wedding RSVP site has a surface of maybe four screens and one database table. The habit tracker had eight screens, three concurrent state systems, and a notification platform. Guess which one shipped. We now ask, before starting: how many seams does this project have? If the answer is more than five, we plan for an architect, not just an agent.

Visual tools win at visual problems. Lovable and v0 produced two of our cleanest wins, and both were category-fits: marketing sites and form-heavy single-pagers. We tried v0 once on a complex dashboard and it was the wrong tool. Match the tool to the shape of the problem. We dug into this in our 2026 vibe coding tools comparison, and the trend held in our actual usage.

MCP changes the math for integration work. The standup bot was a one-day project because the Slack MCP existed. Without it, the same project would have been four or five days of API plumbing. When you scope a vibe coding project, ask first whether the integration you need has a tool the agent can already call. If yes, the project is probably half the size you think.

Good agents and good juniors fail in the same ways. They invent taxonomies the domain expert hates. They reach for the easy library when the problem needs the right one. They do the inside of each room beautifully and forget about the doorways. The fix is the same fix you would apply to a junior: a senior in the loop, a clear brief, a code review at every seam, and a willingness to throw out the work when the data tells you to. If you want concrete techniques for keeping the senior loop tight, our 50 Claude Code tips post is the place to start.

The fastest path to a win is to start tiny. Every win in this list started with someone who had the entire project in their head before they typed a prompt. None of the wins were "let's see what the agent comes up with." If you are just starting out, our beginner walkthrough of building your first app is built around exactly this principle.

The official docs matter more than you think. When the standup bot broke, the fix was in the Claude Code docs under a section we hadn't read. When the wedding site needed a custom domain, the answer was in the Vercel docs. The agent does not always read the docs for you. Sometimes you still have to.

What we are doing differently next quarter

We are not going to stop vibe coding. The wins were too good and too cheap, though the real cost of vibe coding across tool subscriptions and API bills is worth tracking honestly. We are going to do three things differently.

First, we are going to triage projects at intake the way an ER triages patients: green for greenfield-small-surface (vibe code it), yellow for brownfield-clear-scope (vibe code with a senior in the loop), red for legacy-data-heavy or realtime-critical (architect first, agent second).

Second, we are going to write a one-paragraph "what done looks like" doc for every project before the first prompt. The wins all had one. The failures all didn't.

Third, we are going to do this quarterly review again in 90 days, with fresh projects, the same honesty, and probably the same seltzer. If you are running a team that ships with AI in the loop, do your own version. Sort the projects on the floor. Ask which seam each failure happened at. The patterns will surprise you.

What we are still figuring out is what counts as "done" for a vibe-coded app that worked but no human can fully explain. The personal finance dashboard works perfectly and nobody on the team would feel comfortable on-call for it. Is that shipped? Is that technical debt with a different shape? Honestly, we don't know yet. We will probably write that essay in a few months, when we have more data and more bruises.

Until then: tell us what your floor looks like. We'll keep showing ours.
