I lost three hours of agent work because my laptop went to sleep. Not a crash. Not a bug. My MacBook decided it was nap time, the SSH tunnel dropped, and a Claude Code session that had been refactoring a 200-file migration just vanished. Three hours of context, reasoning, and incremental progress, gone. I reopened the terminal, stared at the blinking cursor, and thought: there has to be a better way to run long-lived agent tasks than hoping my hardware stays conscious.
That is the problem Claude managed agents solve. Anthropic quietly launched this suite of composable APIs in early April 2026, and it represents a fundamental shift in how we think about AI agent infrastructure. Instead of running agents on your local machine, tethered to your terminal session and your laptop's battery life, you deploy them to cloud-hosted containers that persist independently. Your laptop can sleep. You can close the lid. The agent keeps working.
We have spent the last week building with the managed agents API, breaking it in creative ways, and running the numbers on what it actually costs. This is everything we have learned.
Why Local Agent Sessions Were Always a Hack
If you have used Claude Code for any serious project, you know the pain points. Long-running tasks require you to keep a terminal open for hours. Network interruptions kill sessions. Your local machine becomes the bottleneck for what should be a cloud-native workload.
Think of it like cooking. Running Claude Code locally is like being a single chef in a home kitchen. You can only work on one dish at a time. You have to stay in the room. If you leave to answer the door, your sauce burns. You cannot hand off a task to someone else because the entire context, every ingredient measured, every timer set, lives in your head and your kitchen.
Claude managed agents are the restaurant kitchen. Prep cooks work at their own stations. Each one has their own tools, their own workspace, their own ingredients. The head chef (you) sends orders and checks on progress, but the actual work happens independently. If a prep cook needs to step away, another can pick up from the station because the recipe and the state are documented in the ticket system, not trapped in someone's memory.
That architectural metaphor is not decorative. It maps directly to how managed agents actually work under the hood.
The Four Core Concepts
The managed agents API is built on four primitives. Understanding these is the difference between fighting the API and flowing with it.
1. Agent: The Brain
An Agent is a configuration object. It defines which model to use, what system prompt to provide, and which tools are available. Think of it as a job description. It does not do anything on its own. It describes what kind of work an agent should be capable of performing.
```json
{
  "model": "claude-opus-4-6",
  "name": "code-reviewer",
  "instructions": "You are a senior engineer reviewing pull requests...",
  "tools": [
    { "type": "computer" },
    { "type": "text_editor" },
    { "type": "bash" }
  ]
}
```

You create an Agent once and reuse it across many sessions. This is important for cost management. You are not paying for the Agent definition. You are paying for the sessions that run it.
2. Environment: The Container Template
An Environment defines the sandbox where your agent runs. It specifies the base image, installed packages, environment variables, and any files that should be pre-loaded. Anthropic's engineering blog describes the philosophy as "cattle, not pets": containers can fail and be reinitialized from session logs without losing state.
```json
{
  "type": "cloud",
  "setup_commands": [
    "npm install",
    "pip install pytest"
  ],
  "env_vars": {
    "NODE_ENV": "production"
  }
}
```

The environment is a template. Every session gets a fresh container built from this template. If the container crashes, the system spins up a new one and replays the session's event log to restore state. This is the key insight that makes the whole system resilient. The container is disposable; the session log is the source of truth.
3. Session: The Running Instance
A Session is what happens when you combine an Agent with an Environment and say "go." It is a running container with a model instance inside it, processing instructions and executing tool calls. Sessions have unique IDs, and they persist even if you disconnect from them.
This is the magic. When you create a session and send it a task, you get back a session ID. You can close your browser. Turn off your computer. Go for a walk. When you come back, you reconnect to the session ID and pick up where you left off. The agent was never interrupted because it was never running on your machine in the first place.
4. Events: The Communication Layer
Everything that happens inside a session is streamed as Server-Sent Events (SSE). Tool calls, model outputs, errors, status updates. All of it flows through a single event stream that you can subscribe to, process, and store.
The event stream is append-only. This is not just a design choice. It is the mechanism that enables container recovery. If a container dies, the new container reads the event log and reconstructs its state. Nothing is lost because nothing was stored exclusively in-memory. Every meaningful state change was written to the event log first.
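The recovery mechanism can be illustrated with a toy replay loop. The event shapes below are our own illustration, not Anthropic's actual event schema: any fresh container that folds the same log into state arrives at the same place.

```python
# Sketch: reconstructing session state by replaying an append-only event log.
# Event types and fields here are illustrative, not the real API schema.

def replay(events):
    """Rebuild in-memory session state from the event log."""
    state = {"messages": [], "files": {}, "status": "running"}
    for event in events:
        if event["type"] == "text":
            state["messages"].append(event["content"])
        elif event["type"] == "file_write":
            state["files"][event["path"]] = event["content"]
        elif event["type"] == "session_complete":
            state["status"] = "complete"
    return state

log = [
    {"type": "text", "content": "Cloning repo..."},
    {"type": "file_write", "path": "review.md", "content": "LGTM"},
    {"type": "session_complete"},
]

# A replacement container replaying the same log reaches the same state.
assert replay(log) == replay(log)
```

Because state is a pure function of the log, "recovery" is just "replay" — which is exactly why the container can be treated as disposable.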
Building Your First Managed Agent
Let us walk through a practical example. We will build a code review agent that can be triggered by a webhook on every pull request.
Step 1: Create the Agent
```bash
curl -X POST https://api.anthropic.com/v1/agents \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: managed-agents-2026-04-01" \
  -H "content-type: application/json" \
  -d '{
    "name": "pr-reviewer",
    "model": "claude-sonnet-4-6",
    "instructions": "Review the pull request for bugs, security issues, and style violations. Be specific about line numbers. Suggest fixes, do not just identify problems.",
    "tools": [
      { "type": "computer" },
      { "type": "text_editor" },
      { "type": "bash" }
    ]
  }'
```

Note the `anthropic-beta: managed-agents-2026-04-01` header. This feature is still in beta, and you need the header on every request. The API will reject calls without it.
Step 2: Define the Environment
```bash
curl -X POST https://api.anthropic.com/v1/environments \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: managed-agents-2026-04-01" \
  -H "content-type: application/json" \
  -d '{
    "name": "node-review-env",
    "type": "cloud",
    "setup_commands": [
      "apt-get update && apt-get install -y git",
      "npm install -g eslint prettier"
    ]
  }'
```

Step 3: Create a Session and Send Work
```bash
curl -X POST https://api.anthropic.com/v1/sessions \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "anthropic-beta: managed-agents-2026-04-01" \
  -H "content-type: application/json" \
  -d '{
    "agent_id": "agent_abc123",
    "environment_id": "env_xyz789",
    "messages": [
      {
        "role": "user",
        "content": "Clone the repo at https://github.com/example/app, checkout the branch feature/auth-refactor, and review the diff against main. Focus on security implications of the auth changes."
      }
    ]
  }'
```

The response includes a session ID and an event stream URL. You subscribe to the event stream to watch the agent work in real time, or you poll the session endpoint later to get the final results.
Step 4: Listen to Events
```javascript
// The browser-native EventSource API cannot send custom headers, so this
// assumes the Node.js "eventsource" package, which accepts them.
import EventSource from 'eventsource';

const eventSource = new EventSource(
  'https://api.anthropic.com/v1/sessions/sess_123/events',
  {
    headers: {
      'x-api-key': process.env.ANTHROPIC_API_KEY,
      'anthropic-beta': 'managed-agents-2026-04-01'
    }
  }
);

eventSource.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'tool_use') {
    console.log(`Agent is using: ${data.tool}`);
  }
  if (data.type === 'text') {
    console.log(`Agent says: ${data.content}`);
  }
  if (data.type === 'session_complete') {
    console.log('Review finished');
    eventSource.close();
  }
};
```

That is the complete flow. Four API calls and an event listener. The agent clones the repo, reads the diff, analyzes the changes, and produces a review. All in a cloud container that you never have to provision, monitor, or clean up.
The Architecture That Makes It Work
Anthropic published a detailed engineering blog post about the internals, and it is worth reading in full. But the three-component architecture is what matters most for understanding the system's behavior.
Brain: Claude Plus Harness
The "brain" is Claude (whichever model you choose) wrapped in a harness that manages tool execution, context windows, and error recovery. The harness is what turns a stateless language model into a stateful agent. It maintains the conversation history, manages tool call results, and handles the retry logic when tools fail.
The performance numbers here are significant. Anthropic reports roughly 60% reduction in median time-to-first-token (TTFT) and over 90% reduction at the 95th percentile. That p95 improvement matters more than it looks. In local Claude Code sessions, the occasional 30-second pause while the model thinks is jarring. In a managed agent running autonomously, those pauses compound. Cutting p95 TTFT by 90% means your agents spend dramatically less time waiting and more time working.
Hands: Sandboxes and Tools
Each session gets its own sandboxed environment. The sandbox has a filesystem, network access (configurable), and whatever tools you specified in the Agent definition. Multiple sandboxes can run simultaneously within a single session, which enables parallel tool execution.
The security model here is thoughtful. Credentials are managed through a vault-plus-proxy pattern. If your agent needs to authenticate with GitHub, for example, the OAuth token is stored in a vault and accessed through a proxy. The token never reaches the sandbox itself. This means even if the sandbox is compromised (say, by a malicious package in a cloned repository), the attacker cannot extract your credentials. They can use them through the proxy for the duration of the session, but they cannot steal them.
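In miniature, the pattern looks like this. Everything here — the vault dict, the function names, the credential reference — is our own illustration of the concept, not Anthropic's implementation:

```python
# Sketch of the vault-plus-proxy idea: the sandbox holds only an opaque
# credential *reference*; the proxy resolves it to the real secret at the
# network boundary. All names here are illustrative.

VAULT = {"github": "ghp_real_secret_token"}  # lives outside the sandbox

def proxy_request(credential_ref, url):
    """Attach the real credential only at the network boundary."""
    token = VAULT[credential_ref]
    return {"url": url, "headers": {"Authorization": f"Bearer {token}"}}

def sandbox_code():
    """What runs inside the sandbox: it never touches the token itself."""
    request = {"credential_ref": "github", "url": "https://api.github.com/user"}
    assert "ghp_" not in str(request)  # nothing secret to exfiltrate
    return request
```

A malicious package scanning the sandbox's memory or filesystem finds only the reference string `"github"`, which is useless outside the proxy.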
Session: The Append-Only Event Log
The session log is the connective tissue. Every tool call, every model response, every error is recorded in an append-only log. This log serves three purposes simultaneously.
First, it enables recovery. Container dies? Spin up a new one, replay the log, continue from where you left off.
Second, it provides observability. You can stream the log in real time to watch your agent work, or you can analyze it after the fact to understand what happened and why.
Third, it creates reproducibility. Given the same agent configuration and the same initial message, replaying the log should produce the same sequence of tool calls. (In practice, model nondeterminism means exact reproduction is not guaranteed, but the log gives you a detailed trace for debugging.)
What It Actually Costs: Napkin Math
Pricing is the first question everyone asks, and Anthropic has been relatively transparent here. There are two cost components: compute time and tokens.
Compute: $0.08 per session-hour, billed to the millisecond.
This is the cost of the container running your agent. Crucially, idle time does not count. If your agent is waiting for a model response and not executing any tools, you are not paying for that wait time. You only pay when the container is actively doing work.
Tokens: Standard API rates.
Whatever model you choose, you pay the same per-token rates as you would for direct API calls. Opus 4.6 runs $15 per million input tokens and $75 per million output tokens. Sonnet 4.6 is cheaper at $3/$15.
Web search: $10 per 1,000 searches.
If your agent needs to search the web, each search costs a penny. This adds up if you are building research agents that search aggressively, but for most use cases it is a rounding error.
Let us run the numbers on the code review agent from our example above.
At $0.08 per session-hour, what does running a code review agent on 50 pull requests per day cost? Each review session takes an average of 5 minutes (clone, read diff, analyze, generate review). That is 50 sessions times 5 minutes each, which gives us 250 minutes, or about 4.2 session-hours. The compute cost is 4.2 times $0.08, which comes to roughly $0.34 per day.
But compute is the cheap part. The tokens are where the real cost lives. A typical PR review with Sonnet 4.6 might consume 30K input tokens (the diff, the codebase context) and 5K output tokens (the review comments). At Sonnet rates, that is $0.09 for input plus $0.075 for output, roughly $0.17 per review. Multiply by 50 reviews and you get about $8.25 per day in token costs.
Total: about $8.59 per day, or roughly $258 per month for fully automated code review on 50 daily PRs. If you use Haiku instead of Sonnet for simpler reviews, you could cut the token cost by 80% and land closer to $60 per month.
For a team of five engineers who each spend 30 minutes per day on code review, that is 2.5 hours of engineering time recovered daily. At a loaded engineering cost of $100 per hour, you are saving $250 per day. The agent pays for itself on day one.
A more complex example from Anthropic's documentation: a one-hour coding session with Opus 4.6 consuming 50K input tokens and 15K output tokens. Input: 50K times $15 per million equals $0.75. Output: 15K times $75 per million equals $1.125. Plus $0.08 compute. Total: $1.955. The documentation quotes $0.705 for its version of this example, a figure that only works out if you assume a cheaper model's per-token rates. This is worth noting: the model choice dramatically affects your costs. Sonnet is roughly 5x cheaper per token than Opus.
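The arithmetic in this section can be packaged into a small helper. This is our own sketch, with the compute rate and per-token rates hard-coded from the figures above:

```python
# Napkin-math helper: $0.08 per session-hour plus per-token rates.
# Rates are in dollars per million tokens, as quoted in this section.

COMPUTE_PER_HOUR = 0.08
RATES = {"sonnet": (3, 15), "opus": (15, 75)}  # (input, output) $/Mtok

def session_cost(model, hours, input_tokens, output_tokens):
    in_rate, out_rate = RATES[model]
    token_cost = input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
    return hours * COMPUTE_PER_HOUR + token_cost

# The PR-review example: a 5-minute Sonnet session, 30K in / 5K out.
per_review = session_cost("sonnet", 5 / 60, 30_000, 5_000)
daily = per_review * 50
# per_review ≈ $0.17, daily ≈ $8.58 — in line with the napkin math above.

# The one-hour Opus session: 50K in / 15K out.
opus_hour = session_cost("opus", 1.0, 50_000, 15_000)  # ≈ $1.955
```

Swapping the model string is all it takes to see the roughly 5x gap between Sonnet and Opus for the same workload.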
Rate Limits and Beta Constraints
The beta has meaningful rate limits you need to design around.
- 60 requests per minute for session creation (write operations)
- 600 requests per minute for reads (polling session status, reading events)
The 60 writes per minute limit means you cannot burst-create hundreds of sessions simultaneously. If you are building a CI/CD integration that fires on every commit across a large monorepo, you will need to queue session creation requests.
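A simple token bucket is enough to pace session creation under that write limit. This is our own sketch; `create_session` stands in for the real API call:

```python
# Sketch: pacing session-creation writes under a 60-per-minute limit
# with a token bucket. The bucket blocks when the budget is exhausted.
import time

class TokenBucket:
    def __init__(self, rate_per_min, capacity=None):
        self.rate = rate_per_min / 60.0          # tokens refilled per second
        self.capacity = capacity or rate_per_min
        self.tokens = float(self.capacity)
        self.last = time.monotonic()

    def acquire(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < 1:
            time.sleep((1 - self.tokens) / self.rate)  # wait for a refill
            self.tokens = 1.0
        self.tokens -= 1

bucket = TokenBucket(rate_per_min=60)
for pr in ["pr-101", "pr-102", "pr-103"]:
    bucket.acquire()       # blocks if we would exceed 60 writes/min
    # create_session(pr)   # the real write call would go here
```

For a CI/CD integration on a busy monorepo, put this in front of a worker that drains a queue, so commit bursts are absorbed rather than rejected.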
There are also research preview features that are gated behind additional access: outcomes tracking, multi-agent orchestration, and persistent memory across sessions. These are not available in the general beta. You need to apply for access separately.
One important branding constraint: you cannot use "Claude Code" or "Claude Cowork" branding in products built on managed agents. Your product needs its own identity. This seems like a reasonable boundary, but it is worth knowing before you design your marketing materials.
Real-World Patterns We Have Built
After a week of building, here are the patterns that worked well and the ones that did not.
Pattern 1: The CI/CD Code Reviewer (Works Great)
This is the obvious first use case. On every pull request, a webhook triggers a managed agent session that clones the repo, checks out the branch, reads the diff, and produces a structured review. We post the review as a PR comment via the GitHub API.
What works: the agent has consistent quality. It does not get tired at 4pm. It does not skip reviews because it is busy. It catches the same class of bugs every time, which means your human reviewers can focus on architectural and design concerns rather than spotting missing null checks.
What does not work: very large diffs. If a PR touches 50+ files, the context window fills up and the review quality degrades. We solved this by chunking large PRs into logical groups (tests, migrations, application code) and running separate sessions for each chunk. More sessions means more cost, but the reviews are actually useful.
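The chunking step is simple path classification. The grouping rules below are our own heuristics, not anything the API provides:

```python
# Sketch: split a large PR's changed files into logical buckets, then run
# one review session per non-empty bucket. Rules are illustrative.

def chunk_pr(changed_files):
    chunks = {"tests": [], "migrations": [], "app": []}
    for path in changed_files:
        if "test" in path or "spec" in path:
            chunks["tests"].append(path)
        elif "migrations" in path:
            chunks["migrations"].append(path)
        else:
            chunks["app"].append(path)
    # Only non-empty buckets become sessions.
    return {name: files for name, files in chunks.items() if files}

files = [
    "src/auth/login.ts",
    "src/auth/login.test.ts",
    "db/migrations/0042_add_sessions.sql",
]
for name, group in chunk_pr(files).items():
    pass  # create one review session per group here
```

Each bucket stays comfortably inside the context window, which is what makes the extra sessions worth their cost.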
Pattern 2: The Documentation Generator (Works Great)
Point a managed agent at a codebase and ask it to generate or update documentation. The agent can read source files, understand the API surface, and produce markdown docs that reflect the actual code. We run this weekly as a cron job.
The append-only event log is especially useful here. We can diff the documentation output week-over-week to see what changed in the codebase, even if nobody updated the docs manually. The agent becomes an automated changelog.
Pattern 3: The Test Writer (Works, With Caveats)
Ask the agent to read your source code and write tests. It does a reasonable job generating unit tests, and the sandbox means it can actually run the tests to verify they pass before submitting them.
The caveat: integration tests that depend on external services (databases, APIs) require you to set up those services in the environment. This is doable but adds complexity to your environment configuration. Mock-heavy test suites work better in this pattern than tests that need real infrastructure.
Pattern 4: Multi-Agent Pipelines (Not Yet Ready)
The dream is chaining agents: one agent writes code, a second agent reviews it, a third agent writes tests, and a fourth agent runs the tests and reports results. The multi-agent orchestration feature exists in research preview, but it is not available in the general beta. We tried simulating it by having one session create another session via the API, and it works technically but the ergonomics are poor. Wait for the official multi-agent support.
The Lock-In Question
We should be honest about this. Claude managed agents only run Claude models. You cannot bring your own model. You cannot swap in GPT-5 or Gemini or Llama. If you build your infrastructure on managed agents, you are making a bet on Anthropic.
Is that bet reasonable? We think so, today. Claude is arguably the best coding model available as of April 2026, and the managed agents infrastructure is genuinely well-designed. But "best today" is a dangerous foundation for architectural decisions. The AI model landscape shifts quarterly. What happens if a competitor ships a meaningfully better model for your use case six months from now?
The mitigation is abstraction. Build your agent workflows with a thin wrapper around the managed agents API. Define your agent configurations, environment templates, and event processing logic in a way that is not tightly coupled to Anthropic's specific API shapes. If you need to migrate later, you are rewriting the API integration layer, not your entire agent infrastructure.
We do not know if this concern will age well or look paranoid in hindsight. But we have been burned before by vendor lock-in, and the memory is fresh enough to make us cautious.
There is also the SLA question. As of this writing, Anthropic has not published a formal SLA for managed agents uptime. For hobby projects and internal tools, this is fine. For production systems where agent downtime means business impact, the lack of an SLA is a real gap. Discussion on Hacker News reflects this concern, with several commenters noting that "no SLA" is a dealbreaker for enterprise adoption.
Early adopters like Notion, Rakuten, and Sentry are presumably operating under private agreements with Anthropic. If you are not at that scale, you need to design for the possibility that the service is unavailable when you need it most.
Security: The Vault-Proxy Pattern
The credential isolation model deserves its own section because it is genuinely clever and it addresses one of the biggest concerns with cloud-hosted agents.
When your agent needs to access external services (GitHub, Jira, Slack, your internal APIs), you do not pass credentials directly to the sandbox. Instead, credentials are stored in a vault, and the sandbox accesses them through a proxy. The proxy authenticates the request using the vault credentials and forwards it to the target service. The sandbox sees the response but never sees the credential.
This matters because sandboxes run arbitrary code. Your agent might clone a repository that contains a malicious package. That package might try to read environment variables or files looking for API keys. With the vault-proxy pattern, there is nothing to find. The API keys exist in the vault, outside the sandbox, and are only used by the proxy at the network boundary.
Compare this to the local Claude Code experience, where your agent runs in your terminal with access to your .env files, your SSH keys, your cloud provider credentials, and everything else on your machine. If you have been following our tips for securing Claude Code sessions, you already know the risks. Managed agents eliminate an entire category of those risks by architectural design.
Performance Characteristics
We ran some informal benchmarks comparing managed agents to local Claude Code sessions for identical tasks. The results were interesting but not what we expected.
Cold start: The first session in a new environment takes 15-30 seconds to provision. Subsequent sessions using the same environment template start in 3-5 seconds. Containers are provisioned on-demand, and Anthropic appears to cache frequently-used environment templates.
Inference speed: Noticeably faster than local sessions. The ~60% p50 TTFT improvement that Anthropic claims matches our experience. We measured an average of 1.2 seconds to first token versus 3.1 seconds locally for comparable prompts. The p95 improvement is even more dramatic. We saw local p95 TTFT of 12+ seconds versus under 2 seconds with managed agents. This makes sense because the managed agents infrastructure has a direct, low-latency connection to Anthropic's inference servers. Your local session is routing through the public internet.
Tool execution: File operations and bash commands execute at cloud VM speeds, which is generally faster than a developer laptop for I/O-heavy tasks. Git clones in particular are dramatically faster because the container has datacenter-grade network connectivity.
Cost efficiency: For tasks under 10 minutes, managed agents are more expensive than local sessions (you are paying compute on top of tokens). For tasks over 30 minutes, managed agents start winning because you are not paying for idle time and you are not tying up your local machine. The break-even point depends on your token consumption rate and how you value your laptop's availability.
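One way to make that break-even concrete is a toy model under our own simplifying assumptions: token costs are identical either way, so the comparison is the cloud compute premium against whatever you think an hour of your machine's availability is worth.

```python
# Toy break-even model (our own assumptions, not official guidance):
# managed agents add $0.08/hour of compute but free up your machine.

COMPUTE_PER_HOUR = 0.08

def managed_premium(active_hours):
    """Extra dollars paid for cloud compute; tokens cost the same either way."""
    return active_hours * COMPUTE_PER_HOUR

def local_opportunity_cost(wall_clock_hours, machine_value_per_hour):
    """What tying up your laptop for the whole task is worth to you."""
    return wall_clock_hours * machine_value_per_hour

# Even valuing laptop availability at just $1/hour, a one-hour task
# already favors the managed side on this model.
assert managed_premium(1.0) < local_opportunity_cost(1.0, 1.0)
```

The model is crude, but it captures why long tasks tip toward managed agents: the compute premium grows slowly while the cost of a blocked laptop grows with wall-clock time.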
Migration from Local Claude Code
If you are currently using Claude Code locally and want to move some workflows to managed agents, here is the practical migration path.
Step 1: Identify long-running or repetitive tasks.
Code review, test generation, documentation updates, and migration scripts are all good candidates. Interactive development (where you are pairing with the agent in real time) is still better locally because the latency of the feedback loop matters more than the durability of the session.
Step 2: Extract your CLAUDE.md patterns.
If you have a well-tuned CLAUDE.md file (and if you do not, you should, see our guide to Claude Code tips and workflows), the instructions in it translate directly to the instructions field in your Agent configuration. Copy your system prompt, your coding standards, and your project-specific rules into the Agent definition.
Step 3: Build environment templates for your tech stacks.
Create environment configurations for each of your project types. A Node.js environment with your standard toolchain. A Python environment with your data science packages. A Go environment with your linters and formatters. These templates are reusable across sessions and projects.
Step 4: Start with internal tools, not customer-facing products.
Run managed agents on your internal workflows first. CI/CD integration, automated documentation, nightly test runs. Build confidence in the system's reliability before putting it in the critical path of customer-facing features.
Step 5: Implement fallback logic.
Because there is no SLA, your integration should handle the case where the managed agents API is unavailable. The simplest fallback is to queue the task and retry later. A more sophisticated fallback is to spin up a local Claude Code session as a backup. Design for graceful degradation from the start.
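The retry-then-queue fallback can be sketched in a few lines. `submit_session` and the queue here are stand-ins for your real integration, not SDK calls:

```python
# Sketch of graceful degradation when the managed agents API is unavailable:
# retry with exponential backoff, then park the task in a queue for later.
import time

def run_with_fallback(task, submit_session, queue, retries=3, base_delay=1.0):
    delay = base_delay
    for _ in range(retries):
        try:
            return submit_session(task)   # normal path: session ID comes back
        except ConnectionError:
            time.sleep(delay)             # back off before the next attempt
            delay *= 2
    queue.append(task)                    # degrade gracefully: retry later
    return None
```

A worker can periodically drain the queue, and the more sophisticated local-Claude-Code fallback slots in where the task is parked.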
The "Framework Fatigue" Concern
One of the more thoughtful criticisms we have seen comes from a Hacker News commenter who described the current agent ecosystem as the "pre-PHP web of agents." Their point: we have dozens of agent frameworks, each with different abstractions, different APIs, different mental models. Managed agents is just another one. How do we know this is the abstraction that will win?
We do not know. That is the honest answer. But we have a hypothesis: the frameworks that will survive are the ones backed by model providers, because they can optimize the integration between the model and the execution environment in ways that third-party frameworks cannot. Anthropic can tune Claude's behavior specifically for the managed agents harness. They can optimize the token-to-tool-call pipeline. They can build security features that require cooperation between the model layer and the infrastructure layer.
Third-party frameworks like LangChain and CrewAI offer model flexibility, which is a real advantage. But they pay a tax for that flexibility in the form of abstraction layers that add latency and reduce the quality of the model-tool integration. Whether that tradeoff is worth it depends on how important model portability is to your use case.
For most teams we talk to, the answer is: not as important as they think. They are using Claude for everything anyway. The theoretical value of model portability is high. The practical value, given current usage patterns, is near zero.
What Is Coming Next
The research preview features hint at where managed agents are headed.
Outcomes: The ability to define success criteria for a session and have the system evaluate whether the agent achieved them. This is the foundation for autonomous agent loops, where the system can retry or adjust approach when the first attempt fails.
Multi-agent orchestration: First-class support for chaining agents, with built-in communication patterns and state sharing between sessions. This turns managed agents from a single-agent runtime into a workflow engine.
Persistent memory: Sessions that remember context from previous sessions. Run a code review agent 50 times on the same repository, and it builds up knowledge about your codebase, your team's patterns, and your recurring issues. This is the feature that transforms managed agents from a tool into a team member.
None of these are available in the general beta today. But they suggest that Anthropic is thinking about managed agents as a platform, not just an API. The trajectory is toward agents that are less like contractors you hire for a day and more like employees who understand your business.
When to Use Managed Agents vs. Local Claude Code
Not everything should run in the cloud. Here is our current decision framework.
Use managed agents when:
- The task takes longer than 15 minutes
- The task is repetitive and can be templated
- You need the task to run without human supervision
- You want to trigger agents from CI/CD or webhooks
- Security isolation matters (you do not want the agent on your local machine)
Use local Claude Code when:
- You are doing interactive development (pair programming)
- The feedback loop needs to be sub-second
- You need access to local files, databases, or services that cannot be replicated in a cloud environment
- The task is exploratory and you do not know the end state yet
Use both when:
- You prototype locally, then deploy the refined workflow as a managed agent
- Your managed agent triggers local Claude Code sessions for tasks that need human input
The two modes are complementary, not competing. Think of local Claude Code as your workbench and managed agents as your production line. You design and prototype at the workbench. You run at scale on the production line.
Getting Started Today
Here is the minimum viable setup to start experimenting with managed agents.
1. Get API access. You need an Anthropic API key with managed agents beta access. Add the `anthropic-beta: managed-agents-2026-04-01` header to every request.
2. Create a simple Agent. Start with Sonnet 4.6 (cheaper for experimentation) and a basic system prompt. Do not over-engineer the instructions on your first attempt.
3. Define a minimal Environment. Start with the default cloud environment and add packages as needed. You can always create more sophisticated environments later.
4. Create a Session and send a task. Something simple: "Read this repository and summarize the architecture." Watch the event stream to understand how the agent works.
5. Iterate. Adjust the system prompt based on the agent's output. Add tools. Customize the environment. Build up complexity gradually.
The official documentation is thorough and includes SDK examples in Python and TypeScript. Start there for the authoritative API reference.
Honest Assessment
We are genuinely excited about managed agents, and we think they represent a meaningful step forward for AI agent infrastructure. The architecture is sound. The pricing is reasonable. The developer experience is already better than managing local agent sessions for long-running tasks.
But we are also wary of hype. This is a beta product with no SLA, limited to a single model provider, and missing features (multi-agent, memory) that the most interesting use cases require. The cost analysis from Finout raises valid points about how token costs can spiral in agent workflows where the model calls itself recursively.
The contractor metaphor is useful here. When you hire a contractor, you get reliable work within a defined scope. You do not get the institutional knowledge that a full-time employee builds over years. Managed agents today are contractors: they show up, do the job, and leave. The persistent memory feature in research preview hints at the employee model, but it is not here yet.
We are building with managed agents for our CI/CD workflows, our documentation pipeline, and our automated code review process. We are not building customer-facing products on them until the SLA situation is resolved. That feels like the right balance between early adoption and prudent engineering.
The managed agent era is just beginning. The APIs will mature. The features will expand. The reliability will improve. And the teams that start building muscle memory with these tools now will have a significant advantage when the platform reaches general availability.
If you have been running Claude Code locally and wondering what comes next, this is what comes next. Not a replacement for your terminal. An expansion of what is possible when you untether AI agents from your hardware and let them run in the cloud, on their own schedule, at their own pace.
We are still early. Come build with us.