Last Tuesday I spent forty minutes fighting a terminal, and it is the reason I installed Claude Cowork the next morning.
The task was stupid. Seventy-three supplier invoices sitting in my Gmail. Pull out line items. Match against a Google Sheet. Flag anything over our quarterly threshold. A human intern could do it in an afternoon. I tried to do it with Claude Code instead.
It went badly. I wrote a script to hit the Gmail API. Got throttled. Wrote OAuth glue. Got confused. Pulled the PDFs. Tried to parse them with a vision model. Sent myself in circles. By the time I had a working pipeline, I had spent more engineering time than the task was worth, and I still needed to paste the output into the sheet by hand.
Not everything lives in a git repo.
That is the thing nobody says out loud. For two years I have been evangelizing terminal-native AI to anyone who will listen. I run Claude Code for five hours a day. I love it. But there is a whole category of work, real work, income-generating work, that does not happen in a text editor. It happens in a browser tab, or a spreadsheet, or an inbox, or that weird desktop app your finance team swears by. And for that work, the terminal is the wrong shape.
Which is why, when Anthropic shipped Claude Cowork on Monday, I cleared my afternoon.
Marcus and I have been running it for four days. This is what we found.
What Anthropic actually shipped on January 12
Claude Cowork is Anthropic's new desktop agent, announced Monday morning. It is not a model. It is not a chat window. It is a program that runs on your Mac or Windows machine, sees your screen, moves your mouse, types into your apps, and reads the documents on your filesystem. Under the hood it uses the Computer Use API that Anthropic first previewed in late 2024, now wrapped in a proper product with integrations, permissions, and a memory.
The launch ships with three first-party integrations: Gmail, Google Drive, and Chrome. The Gmail connector lets Cowork read, draft, label, and send mail. The Drive connector lets it open, edit, and create documents. The Chrome connector gives it a live browser it can drive, including authenticated sessions on sites that do not expose APIs.
Here is a detail from the launch post that is easy to miss: Anthropic says almost all of Cowork's core code was written autonomously by Claude over roughly a week and a half. A small team of humans reviewed and steered. The rest was agent output. Whatever you think about that claim, it tells you how seriously Anthropic is eating its own cooking.
The pricing tiers are straightforward. Pro users get ten hours of agent time per month. Team and Enterprise users get more, plus shared integrations and admin controls. It installs in about four minutes. First run felt like opening a new laptop for someone else and letting them drive.
This is a landmark in a way I did not expect. Claude Code was a terminal companion. Cowork is a coworker.
Claude Code vs Claude Cowork: the mental model
Marcus here. I want to make this distinction clean, because I watched three engineers on my team get it wrong this week.
The metaphor that works for me is this: Claude Code is a phone call. Claude Cowork is the coworker showing up at your desk.
On a phone call, everything has to be described. You cannot point. You cannot hand someone a folder. You have to turn intent into words and text into actions. That is the terminal. It is precise, it is fast, it scales beautifully, and it is the right tool when the work is already text.
When someone shows up at your desk, the conversation changes. You point at the screen. You say "this thing here, do that." You hand them a stack of paper. You let them log into the tools you already log into. The bandwidth is higher and the overhead is lower, but you can only have so many of those conversations at once, and the coworker cannot be in two offices.
Translated back to software:
- Claude Code wins when the task lives inside a repo, a codebase, a CLI, or a filesystem that rewards grep and git. Anything where the truth is text and the output is text. Refactors, migrations, test generation, shell automation, infrastructure as code, data pipelines with well-defined inputs.
- Claude Cowork wins when the task lives inside other people's apps. Inboxes, CRMs, SaaS dashboards, browser-based portals, document workflows, anything with a real UI that was not designed for programmatic access. Also: tasks that span multiple apps in a single flow, where writing glue code would cost more than the task is worth.
- Both overlap when the work touches code and apps. Shipping a PR that also updates a Notion doc and a Linear ticket. Writing a blog post that lands in a CMS and a newsletter tool. In that seam, I reach for Cowork more often than I expected to.
I was wrong about this overlap a week ago. I thought Cowork would be a Gmail toy. It is not. It is the part of my job I could not automate before.
One more framing Kai and I argued about: Claude Code is a contractor you brief and let run. Cowork is closer to an employee you onboard. Contractors are faster on bounded tasks. Employees are better at ambient ones. Both metaphors are useful. Neither is complete.
If you are already running Claude Code on your phone via dispatch, think of Cowork as the third leg of that stool. Terminal on the laptop. Dispatch on the phone. Cowork on the desktop, for the work that is not text.
Installing it and making your first connection
Setup took me about six minutes. Nothing exotic.
- Download the installer from claude.ai/cowork and run it. Mac users get a notarized .dmg, Windows users get an .msi. Both require admin rights on first install because Cowork needs accessibility permissions.
- Sign in with your Claude account. If your org is on Team or Enterprise, admin approval gates this step.
- Grant screen recording, accessibility, and keychain access. On macOS this is four toggles in System Settings. Do not skip any or the agent runs half-blind.
- Connect Gmail, Drive, and Chrome through Anthropic's OAuth flow. Each scope is listed explicitly. Drive defaults to read-only until you flip write on.
- Run the onboarding task. Mine was "read my last ten emails and draft a one-line summary of each." It took ninety seconds and I watched it happen live.
There is a control panel I want to flag. Cowork has a guardrails tab where you set allowed apps, blocked apps, rate limits, and human-approval triggers. I set it to require approval on any send-email, any file-delete, and any Chrome navigation to a domain I have not approved. This is not optional for me. Agents with mouse access are qualitatively different from agents with text access.
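If it helps to see that policy as logic, here is a minimal sketch in Python. To be clear, this is not Cowork's settings format or API. The names `Action`, `APPROVED_DOMAINS`, and `needs_approval` are hypothetical, and the domains are placeholders. It only illustrates the rule: gate every send and delete, and gate navigation to any domain not on the list.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical allowlist; placeholder domains, not real Cowork settings.
APPROVED_DOMAINS = {"mail.google.com", "docs.google.com"}

# Action kinds that always pause for a human click.
ALWAYS_GATED = {"send_email", "delete_file"}


@dataclass(frozen=True)
class Action:
    kind: str         # e.g. "send_email", "delete_file", "navigate", "read_file"
    target: str = ""  # URL when kind is "navigate", otherwise free-form


def needs_approval(action: Action) -> bool:
    """Return True when the action should wait for human approval."""
    if action.kind in ALWAYS_GATED:
        return True
    if action.kind == "navigate":
        # An unparseable target has no hostname and is treated as unapproved.
        host = urlparse(action.target).hostname or ""
        return host not in APPROVED_DOMAINS
    return False
```

The design choice worth copying is the default: anything the rule cannot positively identify as approved falls on the "ask me" side.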
A note on privacy. Anthropic's documentation states that screen recordings are processed ephemerally and not used for training by default. I read the policy twice. It is clearer than most. But if you are in a regulated industry, check with your compliance team before you connect a real inbox.
Workflow one: inbox triage with Gmail
My first real test was that invoice problem from the opening.
I wrote a four-sentence brief. "Open my inbox. Find every email from a supplier domain with an attachment that looks like an invoice. For each one, extract vendor name, invoice number, total, and due date. Write the results to a Google Sheet called Q1 Invoice Tracker. Flag anything over $5,000 in red."
Cowork ran for twenty-two minutes. I watched about eight of those, then left for coffee. When I came back, seventy-one of seventy-three invoices were in the sheet. The two it skipped were attachments in a format it flagged as "possibly non-invoice." It had asked me about them in a pending approval. I tapped yes and it finished.
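For anyone who wants to spot-check the sheet afterwards, the flagging rule in the brief is easy to restate in code. This is a sketch under assumptions: `InvoiceRow` and `flagged_rows` are hypothetical names, the fields mirror the four the brief asked for, and "over $5,000" is read as strictly greater than.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal

THRESHOLD = Decimal("5000")  # from the brief: flag anything over $5,000


@dataclass(frozen=True)
class InvoiceRow:
    vendor: str
    invoice_number: str
    total: Decimal
    due_date: date

    @property
    def flagged(self) -> bool:
        # Strictly "over" the threshold, so exactly $5,000 is not flagged.
        return self.total > THRESHOLD


def flagged_rows(rows):
    """Return the rows that should be highlighted, in their original order."""
    return [row for row in rows if row.flagged]
```

Using `Decimal` rather than floats keeps currency comparisons exact, which matters when a total sits right on the threshold.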
Napkin math: writing that pipeline in code would have taken me three to four hours, plus Gmail API scope approval, plus OAuth, plus PDF parsing, plus error handling. Cowork did it in twenty-two minutes, unsupervised, in apps I already owned. At a rough billable rate of $80 per hour, that is $240 to $320 of work done for a fraction of the cost of my monthly Pro seat.
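The same napkin math, written out so you can swap in your own numbers. The rate, hour estimates, and run time come from the paragraph above. The ten-hour figure is the Pro allowance mentioned earlier. The function names are mine.

```python
HOURLY_RATE = 80        # dollars per hour, rough billable rate
CODE_HOURS = (3, 4)     # low and high estimate for hand-writing the pipeline
COWORK_MINUTES = 22     # observed Cowork run time
PRO_MONTHLY_HOURS = 10  # Pro agent-time allowance per month


def pipeline_cost_range(rate=HOURLY_RATE, hours=CODE_HOURS):
    """Dollar cost of the hand-written pipeline at the low and high estimate."""
    return tuple(h * rate for h in hours)


def allowance_share(minutes=COWORK_MINUTES, monthly_hours=PRO_MONTHLY_HOURS):
    """Fraction of the monthly agent-time allowance one run consumed."""
    return (minutes / 60) / monthly_hours


low, high = pipeline_cost_range()
print(f"Hand-written pipeline: ${low} to ${high}")
print(f"Cowork run: {allowance_share():.1%} of the monthly allowance")
```

On these numbers, the run used a little under four percent of a month's agent time.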
The catch: I would not trust it with outbound email yet. Drafting is fine. Sending is something I gate manually. One wrong reply-all and your week is ruined.
Workflow two: research and synthesis in Google Drive
Marcus again. My test was different. I had sixteen PDFs from vendor RFPs sitting in a shared Drive folder. Partners wanted a one-page comparison across nine dimensions by Friday.
I pointed Cowork at the folder and gave it the nine dimensions as a list. It opened each PDF, read it, took notes in a scratch doc, and after about thirty-five minutes produced a draft comparison matrix in a new Google Doc. The draft was not perfect. It misattributed a pricing detail from one vendor to another. It collapsed two similar SLA clauses into one. But the draft saved me the first four hours of skimming, and editing a draft is a different kind of tired than starting from zero.
What impressed me was the memory. When I asked "which of these vendors mentioned SOC 2 Type II without me prompting?" it went back to its notes, not the PDFs, and answered in seconds. This is the part that feels less like automation and more like a second brain.
What bothered me was the confidence. When it misattributed that pricing detail, it did so with the same tone it used for correct facts. If you are going to let Cowork do research synthesis, you need a review step. I have landed on a rule: Cowork drafts, humans verify citations. Same rule my firm applies to junior associates. The metaphor holds.
One workflow nuance worth naming. Drive integration works best when your folder structure is legible. Cowork is not a search engine. If your files are named Final_v2_USE_THIS_one.docx, the agent will struggle to decide which is canonical. Clean file names are a prerequisite, not an afterthought.
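A quick pre-flight check can catch the worst offenders before you point an agent at a folder. This is a rough heuristic sketch, not anything Cowork provides: the pattern and the `ambiguous_names` helper are hypothetical, and the marker list is just the set of habits that produce names like the one above.

```python
import re

# Hypothetical markers that tend to signal competing versions of one file:
# "final", "copy", "use this", and version suffixes such as "v2".
AMBIGUOUS = re.compile(r"final|copy|use[_ -]?this|(?<![a-z])v\d+", re.IGNORECASE)


def ambiguous_names(names):
    """Return the file names that contain a version-confusion marker."""
    return [name for name in names if AMBIGUOUS.search(name)]
```

Run it over a folder listing and rename whatever it returns. It will produce false positives (a file legitimately about "final pricing", say), so treat the output as a prompt to look, not a verdict.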
Workflow three: Chrome, forms, and the long tail of repetitive tasks
This is the one that sold me.
We have an internal compliance portal. Logging in requires SSO. Each month our ops lead fills out roughly forty short forms in that portal, one per active client, copying data from a spreadsheet. It is thirty to forty-five minutes of pure friction. No API. No export. Just a browser and a human and a drop-down menu.
I gave Cowork the spreadsheet, the portal URL, a recording of me filling out two forms as reference, and a brief. It asked clarifying questions. I answered them. It asked one more. Then it ran.
Forty forms in forty-three minutes. It paused twice for CAPTCHAs and I solved them. It flagged three rows where the spreadsheet data looked ambiguous and asked me to confirm. The total active time from me was about six minutes of watching and clicking.
The Chrome integration is the unlock here. Web automation has existed forever. Selenium, Playwright, a hundred browser extensions. All of them require scripting. Cowork is the first time I have been able to describe a browser task in English, point at a running browser, and get the result. This is what "Computer Use" was always gesturing at. It is finally good enough for bounded, repetitive web work.
A qualifier: I would still reach for Playwright for anything I need to run nightly and reliably. Cowork is a coworker, not a cron job. For one-off or weekly batch work, it is magical. For production-grade automation, you still want code.
Where Cowork still struggles
We owe you the honest map of where this breaks.
Latency. A mouse moving across a screen is slower than a function call. Cowork feels like a human assistant, which means it works at human speed. Seventy-odd invoices in twenty-two minutes, not seventy thousand. If throughput matters, write the code.
Visual edge cases. Modal dialogs that shift layout, infinite scroll, dynamically loaded content, and anything involving drag-and-drop are still shaky. It recovers better than the late-2024 previews, but it will get stuck on UIs that violate convention.
Cost at scale. Ten hours of agent time sounds like a lot until you hand it to a team. One of our engineers burned six hours in a single afternoon debugging a workflow. Budget it like you would budget a cloud compute account.
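To make the budgeting point concrete, here is a tiny sketch of tracking agent time against an allowance. It assumes the ten-hour Pro figure applies to the person doing the debugging, which may not match your plan. `remaining_hours` is a hypothetical helper, not a Cowork feature.

```python
MONTHLY_HOURS = 10.0  # assumed allowance: the Pro figure quoted earlier


def remaining_hours(session_hours, allowance=MONTHLY_HOURS):
    """Hours left this month after the given sessions, never below zero."""
    used = sum(session_hours)
    return max(allowance - used, 0.0)


# One six-hour debugging afternoon against a ten-hour month.
left = remaining_hours([6.0])
print(f"{left:.1f} hours left, {left / MONTHLY_HOURS:.0%} of the allowance")
```

One afternoon like that leaves less than half the month's budget for actual work.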
Credentials. Cowork uses your logged-in sessions. That is convenient and scary. It means the agent can act inside any account it can reach. MFA prompts are handled by asking you. 2FA codes on a second device work fine. But the blast radius of a badly written prompt is larger in Cowork than in Claude Code.
Ambiguity. Vague briefs produce vague work. Claude Code tolerates vague prompts because the compiler or the tests will catch problems. Cowork will confidently do the wrong thing across six apps before you notice. Write your briefs like you are writing a Jira ticket for a new hire, not a Slack message to a friend.
Offline. It needs a connection. Obvious but worth saying.
None of these are dealbreakers for me. All of them are things you need to internalize before you hand it a credit card.
Closing
Four days in, I am still working out where Cowork fits.
What I know: the terminal is not the whole job. Some of my most expensive hours every month are spent in apps that have no API, no CLI, no way in except a mouse and a tired human. Those hours are the ones I most want back. Cowork gives me the first credible tool for getting them back. It is imperfect, it is slower than code, it is newer than I am comfortable with, and it is still the best thing I have tried in that category.
What I do not know: whether this changes how teams are shaped. Whether "desktop agent headcount" becomes a line item. Whether the contractor-versus-employee metaphor holds up at scale, or collapses into something neither of us has a word for yet.
If you have been living inside Claude Code and wondering what comes next, come try the other half of the day. If you are already running Cowork, tell us what broke, what surprised you, what workflow you would never give back. We are still writing the map.
The terminal is not going anywhere. It just finally has a neighbor.