Last Tuesday I opened a ticket that had been rotting in our backlog for four months: migrate the payment service from Stripe Checkout to Stripe Payment Elements. Three services touch it. Two frontends. A webhook consumer that nobody fully understands. A test harness held together by hope.
I tried local /plan first. I always do. Thirty seconds in, I could feel the plan getting thin at the edges, the way a map gets thin when you zoom past the surveyed area. Then I remembered claude code ultraplan had landed the week before in v2.1.91. I closed the local plan mid-sentence and reran the prompt as /ultraplan instead.
Something genuinely different happened.
Here is the metaphor I keep coming back to. Local /plan is a napkin sketch. You pull it out at the bar, you draw the loading dock, you circle where the stairs go. It is fast, it is cheap, and it is correct at the resolution it operates at. Napkin sketches have saved more buildings than most architects want to admit.
Claude code ultraplan is the drafting table.
It is the full sheet, the T-square, the French curve, the cross-section, the detail callout for the one weird corner where the plumbing meets the electrical. You do not pull out the drafting table to decide where the coffee maker goes. You pull it out when you are committing to a load-bearing decision and you want to see every stress vector before you swing a hammer.
This post is about when to pull out the drafting table. What claude code ultraplan actually does differently. How the sidebar works, how inline comments change the loop, and when I still reach for the napkin instead.
What /ultraplan actually is
/ultraplan is a claude code planning command that offloads the entire planning session to Anthropic's cloud. Your local terminal becomes a thin client. A remote session spins up running Opus 4.6 with up to 30 minutes of continuous reasoning time, and you get a browser tab with a structured outline, inline comments, and emoji reactions on individual plan steps.
That is the whole shape of it. The details are where it gets interesting.
You need claude code v2.1.91 or later. You need to be signed in with a plan that includes cloud compute credits, which on my team means the Max tier. You need one free browser tab, because the cloud session opens there while your terminal keeps your local context alive.
The model matters. Opus 4.6 is not Sonnet. Sonnet is the model you want when you are pair-programming, iterating, generating code in tight loops. Opus is the model you want when the next decision will cost two weeks to reverse. The pricing reflects that. The latency reflects that. The depth reflects that.
Anthropic's own documentation on extended thinking makes the tradeoff explicit: more thinking tokens, better decomposition of hard problems, longer wall-clock time. /ultraplan cranks that budget up to the ceiling and gives the model a dedicated environment to use it. The claude code release notes call out 2.1.91 as the version that ships the cloud planning surface, so if your claude --version prints anything earlier, update before you try this.
The other thing that changed my mental model: the plan is a document, not a transcript. Local /plan dumps a wall of text into your terminal and hopes you read it. Cloud /ultraplan produces an outline you can navigate, comment on, react to, and edit. The artifact is first-class. That alone justifies the upgrade for anything you would have copied into Notion anyway.
Starting a session: the command and the sidebar
The invocation is dead simple. From any claude code session:
/ultraplan "Migrate our payment service from Stripe Checkout to Stripe Payment Elements. Preserve webhook contract. Ship behind a feature flag."

The terminal prints a session ID, a cloud URL, and a one-line status. Open the URL. Pour coffee. The session has already started reasoning by the time your browser finishes loading.
What you see when the tab renders: a three-pane layout that looks more like Linear than a terminal.
Left pane is the outline. This is the sidebar everyone asks about. It is a live tree view of the plan as it forms. Nodes appear as the model drafts them, nest when sub-steps get added, and highlight when they are currently being revised. Click any node to jump the main pane to that section. Collapse a branch to focus. It feels like watching someone build a mind map in real time, except you can interrupt.
Middle pane is the plan itself. Rich text, code blocks, diagrams rendered from mermaid, tables when the model decides a table is right. My migration plan came back with a sequence diagram of the Stripe webhook flow that I would have paid a contractor to draw.
Right pane is activity. Comments, reactions, a compact log of what the model is currently investigating. If it is reading your repo, you see which files. If it is checking a dependency version, you see the query. Transparency is the whole point.
The first time I opened an ultraplan session I sat and watched for four minutes without touching anything. The outline grew from three top-level nodes to eleven. Sub-nodes filled in. A callout appeared flagging that our webhook consumer was running on a stale version of the Stripe SDK. I had not told it that. It had gone and looked.
A real planning walkthrough
Here is roughly what the outline looked like eight minutes into my migration session. I am simplifying, but not by much:
Migrate payment service: Checkout -> Payment Elements
├── 1. Current state audit
│ ├── 1.1 Services touching Stripe (found: 3)
│ ├── 1.2 Webhook contract (events: 7, handlers: 4)
│ ├── 1.3 Stripe SDK versions (mismatch detected)
│ └── 1.4 Test coverage baseline (22% on payment paths)
├── 2. Contract preservation strategy
│ ├── 2.1 Event types to maintain
│ ├── 2.2 Backwards-compatible webhook adapter
│ └── 2.3 Rollback criteria
├── 3. Frontend migration
│ ├── 3.1 Payment Elements mount point
│ ├── 3.2 Client secret flow
│ └── 3.3 3D Secure handling (SCA compliance)
├── 4. Backend migration
│ ├── 4.1 PaymentIntent creation endpoint
│ ├── 4.2 Idempotency keys
│ └── 4.3 Webhook signature verification
├── 5. Feature flag rollout
│ ├── 5.1 Flag definition
│ ├── 5.2 Canary cohort (1% -> 10% -> 50% -> 100%)
│ └── 5.3 Rollback playbook
├── 6. Observability
│ ├── 6.1 Metrics to add
│ ├── 6.2 Alert thresholds
│ └── 6.3 Dashboard updates
└── 7. Risks and unknowns
├── 7.1 Regional payment methods (iDEAL, Bancontact)
├── 7.2 Apple Pay domain verification drift
└── 7.3 Subscription edge cases (not in scope?)
Node 7.3 is the one that saved me a week. I had not thought about subscriptions at all. Our subscription flow uses a different Stripe surface, but it shares a handful of utilities with checkout, and the model caught a function that both paths import. It flagged it as "probably out of scope, but verify before flag flip." That is the kind of catch that napkin planning does not make, because the napkin does not have room for the word "probably."
The plan kept growing. I went and made a sandwich. When I came back, node 4.2 had a sub-plan with three bullet points of idempotency key strategy, cross-referenced against Stripe's official idempotency documentation, which the model had fetched and summarized.
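The shape of that strategy is worth spelling out. Stripe deduplicates API requests that reuse the same idempotency key, so the trick is deriving the key from the logical operation rather than generating it fresh per attempt. A minimal sketch of that idea; the helper name and key scheme here are mine, not the plan's:

```python
import hashlib


def payment_intent_idempotency_key(order_id: str, operation: str = "create-v1") -> str:
    """Derive a stable idempotency key from the logical operation.

    Because the key is a pure function of the order and operation, a retried
    request reuses the same key, and Stripe returns the original result
    instead of creating a second PaymentIntent.
    """
    return hashlib.sha256(f"{operation}:{order_id}".encode()).hexdigest()


# Usage with the real stripe library would look roughly like:
# stripe.PaymentIntent.create(
#     amount=order.total_cents,
#     currency="usd",
#     idempotency_key=payment_intent_idempotency_key(order.id),
# )
```

The `operation` suffix matters: bump it ("create-v2") if you change what the request means, so old keys do not pin you to a stale response.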
This is the part that feels like a drafting table. You walk away. You come back. The drawing has more detail than when you left.
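Node 5.2's staged cohorts are another place where the detail pays off. A 1% -> 10% -> 50% -> 100% canary only works if bucketing is deterministic, so the users enabled at 1% stay enabled at every later stage. A hash-based sketch of how I would implement that; the flag name and helpers are hypothetical, not anything ultraplan generates:

```python
import hashlib


def rollout_bucket(user_id: str, flag: str) -> int:
    """Deterministically map a user to a 0-99 bucket for a given flag.

    Hashing flag + user_id keeps buckets stable across deploys and
    independent between flags.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def flag_enabled(user_id: str, flag: str, rollout_percent: int) -> bool:
    # Each rollout stage is just a higher threshold over the same buckets,
    # so the 1% cohort is a strict subset of the 10% cohort, and so on.
    return rollout_bucket(user_id, flag) < rollout_percent
```

The rollback playbook in node 5.3 falls out of the same property: dropping the threshold back to 0 disables everyone, in the same order they were enabled.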
Inline comments and emoji reactions
This is the feature I did not expect to care about and now cannot plan without.
Every node in the outline has two affordances: a comment thread and an emoji reaction row. You hover, you click, you type. The model sees your comment. It responds by revising that specific node, not the whole plan.
My first comment was on node 2.2, the backwards-compatible webhook adapter. I wrote: "We actually don't need full compat. Downstream consumers are all internal and can deploy in lockstep. Simplify." Fifteen seconds later the node rewrote itself. The adapter section shrank from four paragraphs to one. The dependency graph in node 5 updated to remove two blockers. The plan got better because I pushed back on it with local context it did not have.
Emoji reactions are the faster version of this. I can slap a question mark on a node I do not trust. The model treats a question mark as "explain this further or defend it." A green check means "locked, do not revise." A red x means "drop this, do not reference it again in subsequent steps." A fire emoji means "this is a critical path, add more rigor." The vocabulary is small and the leverage is huge.
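The vocabulary is small enough to write down as a table. The mapping below is my own shorthand for the behaviors described above, not official names from the product:

```python
# Reaction -> how the model treats it (my paraphrase of observed behavior).
REACTION_DIRECTIVES = {
    "question_mark": "explain this further or defend it",
    "green_check": "locked, do not revise",
    "red_x": "drop this, do not reference it again in subsequent steps",
    "fire": "this is a critical path, add more rigor",
}


def directive_for(reaction: str) -> str:
    """Unknown reactions are a no-op; only four carry meaning."""
    return REACTION_DIRECTIVES.get(reaction, "no-op")
```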
The loop is what changes. In local /plan you read the plan, copy what you like, paste the rest into a prompt, and iterate. It is serial. In ultraplan the loop is annotation-driven. You move through the outline, drop reactions, leave a comment where a reaction is not enough, and the plan reshapes around your feedback in place. It feels less like prompting and more like being on a design review with a colleague who updates the deck in real time.
Two details worth knowing. First, reactions and comments are persistent. You can close the tab and come back in three hours and everything is still there. Second, the model will push back on your comments if it thinks you are wrong. I told it to drop node 7.2 (Apple Pay domain verification) because I was sure we had that automated. It replied in the thread with a link to our CI config showing the automation had been disabled during a migration in January and never re-enabled. It was right. I was wrong. Node 7.2 stayed.
Teleport vs cloud execute
Once the plan stabilizes you have a decision. Execute in the cloud, or teleport back to your local terminal.
Cloud execute means the remote session picks up where planning stopped and starts implementing. It has full access to the plan as context. It can run for the remainder of your 30-minute cloud budget, it writes to a sandboxed branch, and it pushes a PR when it is done. This is the right choice when the work is largely self-contained: codemod, refactor, dependency bump, test backfill. Things where the model does not need your half-configured local environment to make progress.
Teleport means the plan gets shipped back to your local claude code session as the active context, and you execute there. The command is:
/teleport-plan <plan-id>

Your local terminal now has the full outline loaded as working context, with every comment and reaction preserved. You can run it step by step, pause between nodes, check in with /review, and generally treat it as a structured to-do list for your local agent.
My heuristic, roughly: teleport if the work touches your dev environment in ways the cloud cannot replicate. My Stripe migration teleported, because I needed the local stripe listen CLI forwarding webhooks to my dev machine, and I needed to run the test suite against a database snapshot that lives on a VPN the cloud session cannot reach. Cloud executed the documentation and the CI config changes. I ran the rest locally with the plan guiding me.
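Part of why the webhook work stayed local is that node 4.3's signature check is easy to get subtly wrong, and I wanted to watch real events from stripe listen pass it. In production you would call stripe.Webhook.construct_event from the official library; the hand-rolled sketch below just makes the contract visible — the Stripe-Signature header carries a timestamp and an HMAC-SHA256 of "<timestamp>.<raw body>" keyed with your endpoint secret:

```python
import hashlib
import hmac


def verify_stripe_signature(payload: bytes, sig_header: str, secret: str) -> bool:
    """Check a Stripe webhook signature by hand.

    sig_header looks like "t=<unix_ts>,v1=<hex_hmac>". The signed message is
    the raw request body prefixed with the timestamp and a dot, so replaying
    a body with a different timestamp invalidates the signature.
    """
    parts = dict(p.split("=", 1) for p in sig_header.split(","))
    signed = f"{parts['t']}.".encode() + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    # Constant-time comparison, so the check does not leak timing information.
    return hmac.compare_digest(expected, parts["v1"])
```

This sketch skips the timestamp-freshness check the real library also performs; use construct_event rather than shipping this.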
A short napkin-math reason to prefer cloud execute when you can: a 30-minute Opus cloud run costs me roughly the same as 90 minutes of my local Sonnet usage, and the cloud run finishes in 30 minutes of wall clock while I go do something else. If the task fits, the cloud is cheaper and faster. If it does not fit, forcing it fits nothing.
When local /plan still wins
I want to be honest here, because the temptation with a new toy is to use it for everything.
Local /plan still wins in at least five cases.
Small scope. If the work takes under an hour and touches one file, ultraplan is overkill. You spend more time waiting for the cloud session to spin up than you save on planning depth. The napkin is faster.
Tight iteration. When I am deep in a bug and forming and discarding hypotheses every few minutes, I do not want a 30-minute plan. I want fast local reasoning that stays in the terminal. Plan mode works. Ultraplan does not.
Offline or restricted networks. Ultraplan needs the cloud. If you are on a plane, on a client VPN that blocks Anthropic's domains, or in a compliance environment that forbids code leaving the local machine, you cannot use it. Local /plan works on a laptop in a basement.
Exploration, not commitment. If I am not sure what I want to build yet, I do not need a drafting table. I need a whiteboard. Local plan is closer to a whiteboard. I have watched juniors get intimidated by a cloud session full of sequence diagrams when what they needed was to just try something and see what breaks. The drafting table is for the moment you have committed to building. Before that, you are still figuring out what the building is.
When the team is not ready. This one took me a while to learn. An ultraplan artifact is only as useful as the team's willingness to read it. If your teammates do not open the plan, do not leave reactions, do not comment on the steps, you have just created a long document no one reads. I made this mistake on two different teams before I realized it. The tool only earns its cost when the culture uses it. On those two teams, I went back to shorter local plans and shared pull-request-sized commits until the habit of annotation built up organically. Then I reintroduced ultraplan for the bigger work and the pattern stuck.
Claude code ultraplan is a power tool. Power tools are worth their cost when the job matches their power. I keep a short list of project types that default to ultraplan: migrations, auth changes, schema redesigns, anything that touches billing. Everything else starts with the napkin, and graduates to the drafting table only if the napkin runs out of room.
If you are still building your heuristics here, the 50 Claude Code tips post has more on choosing between modes, and the beginner tutorial covers local /plan fundamentals that make ultraplan make more sense once you upgrade.
Plan more, build more
The thing I keep noticing, session after session, is that claude code ultraplan changes the ratio of my work. I plan more. I build more. I fix less.
Not because the plans are perfect. They are not. The model gets things wrong, comments get missed, reactions sometimes overshoot. But a plan I can annotate, navigate, and hand off as an artifact is a plan that survives contact with reality better than one that lives in my terminal scrollback. The drafting table does not draw the building. It just makes the drawing good enough to build from.
I am still figuring out the edges. When does cloud execute beat teleport by a larger margin than my rule of thumb suggests? What is the right session length for a mid-size refactor, where 30 minutes feels like too much and local feels like too little? Should I be running two ultraplans in parallel for independent subsystems, or does that fragment my context too much? I do not have answers yet. The tool is new, my team is new to it, and the honest posture here is that we are all learning the grain of the wood.
Pull out a drafting table sometime this week. Pick a ticket that has been rotting. Type the command, open the tab, and watch the outline grow. Leave a comment. Slap a question mark on a node. See what comes back.
Then tell me what you found. I am still collecting heuristics, and the next good one is probably yours.