Agentic Engineering: The Complete Guide for 2026

Sangam
Head of AI & Digital Solutions
19 min read

It happened on a Tuesday in March, around 6:14am, and it is the moment I started calling what I do agentic engineering instead of vibe coding.

I was still in bed. My phone buzzed once. A pull request had opened on a side project I had not touched in nine days. I squinted at the notification. The PR title said "chore(deps): bump three minor versions; flake in checkout-flow.test.ts patched." The author was a routine I had set up two weeks earlier and then forgotten about. I had not written a line of code. I had not reviewed a line of code. I had not been asked anything.

I sat up. I read the diff on my phone. It was clean. The routine had run the dependency audit, found three safe bumps, applied them, run the test suite, watched the same flaky checkout test fail twice, identified the race condition, patched the test, re-run, gone green, opened the PR with a written summary, and waited politely for me to wake up.

I had set up the conditions, weeks ago, under which this work could happen, and then the work happened.

That is the moment the gear shifted. Not theoretically. Physically. I felt it in my chest the way you feel a clutch engage. I was not vibe coding. I was not even coding. I was directing the conditions under which code happens. This guide is the long version of what I wish someone had handed me eighteen months ago.

What "agentic engineering" actually means

Let me draw three lines, because the words around this are getting muddy and the muddiness is hurting people.

Vibe coding is what Andrej Karpathy named in early 2025 on X. You describe what you want in natural language, the agent writes the code, you accept or edit. The unit of work is the feature. The skill is taste, prompt phrasing, and knowing when to say no. If you want the canonical version of that idea, I wrote about it in what is vibe coding, and Karpathy himself updated his thinking in late 2025, which I unpacked in Karpathy: vibe coding is passé, agentic engineering is the new thing.

Prompt engineering is the older cousin. You optimize the words that direct a single inference. The unit of work is the prompt. The skill is linguistic precision, role-play setup, and few-shot construction. Useful, still, but it is a craft about a moment, not a system.

Agentic engineering is the new thing. You design the system of agents, tools, hooks, and feedback loops that produce code, and maintain it, and ship it. The unit of work is no longer the prompt. It is the agent plus its environment plus its constraints plus its hooks plus its memory plus the routines that wake it up. You are not writing prose for the model. You are writing a job description, a wiring diagram, and a set of operating procedures for a small team that does not sleep.

The shift is real and it is felt. When I am vibe coding, I am close to the keyboard, watching every line stream in. When I am doing agentic engineering, I am often nowhere near a keyboard. I am sketching a PR review checklist on a napkin in a coffee shop, then later transcribing it into a sub-agent's system prompt, then setting a hook to invoke that sub-agent on every pull request. The prose I write does not produce code. It produces the conditions under which code is produced. That is a different job.

I want to be careful here, because I have watched people get this wrong in both directions. Some folks hear "agentic engineering" and think it means more autonomy, fewer humans. That is not quite right. Other folks hear it and think it means building a fancy agent framework from scratch. Also not quite right. Agentic engineering is the discipline of designing the conditions under which agents do useful work safely and repeatedly. The agents already exist. The frameworks already exist. The question is whether you can compose them into something that ships software your team can stand behind.

The spectrum, in plain prose

I keep a mental ladder. It is not a maturity model in the consultant sense. It is just a way of locating yourself on a Tuesday morning. There are six rungs, and most teams I talk to are smeared across two or three of them at once.

L0 is no AI assistance. You write everything. Spell-check is the only intelligence in your loop. Some of the best engineers I know still operate here for specific code paths, on purpose, because the cognitive load of supervising a model exceeds the cognitive load of just typing.

L1 is autocomplete. Copilot suggesting tokens. The model finishes your sentence but you are still the author. The unit of work is the line. You feel productive. You are mostly correct that you are productive. The risk profile is low because your eyes are on every character.

L2 is vibe coding. You describe a feature, the agent drafts the implementation, you accept or edit. The unit of work is the feature. You are an editor now, not a typist. The risk profile shifts: you can ship more, faster, and you can also ship subtler bugs faster. Most of the developer world spent 2024 and 2025 here.

L3 is agentic coding. You direct multi-step agents to complete tasks. Plan mode. Sub-agents. Parallel runs across worktrees. The agent goes off, works for ten minutes, comes back with a plan, you approve, it goes off again. The unit of work is the task, often a multi-file change. You are a supervising engineer of one or two agents at a time. This is where the 50 Claude Code tips start to compound, because each tip saves you a step in a workflow you are now running ten times a day.

L4 is agentic engineering. You design the agent system itself. Hooks fire on commits. Routines run on schedules. Custom sub-agents have job descriptions. Skills are versioned and shared like internal libraries. The team is half-AI by the org chart and you can name each agent the way you would name a teammate. The unit of work is no longer the task. It is the throughput of the system. You are measured by how much shippable software the system produces per week, not how much code you personally wrote.

L5 is autonomous engineering. You set goals; the system handles everything between goal and shipped product. We are not here yet. We are getting closer than I expected. I have seen narrow slices of L5 working in production: a customer support tool that triages, drafts responses, escalates, learns, and improves itself with no engineer in the loop for weeks at a time. But for general-purpose software engineering, L5 is still a research target. I would bet 2027 or 2028 for credible L5 in narrow domains. I would bet later, maybe never, for general L5. I genuinely do not know. Anyone who tells you they do is selling something.

The honest answer about where you should be: most engineering teams I respect are operating somewhere between L3 and L4, with L2 still the default for solo work. The leap from L3 to L4 is the one this guide is really about, because it is the leap that requires engineering in a way the earlier rungs do not.

The toolkit

Five components. They compose. None of them is sufficient alone. All of them together is what I mean by agentic engineering.

a) Skills

Skills are reusable capability packages. The simplest framing: a skill is a Markdown file plus a tool list plus an invocation pattern. The Markdown describes when to use the skill, what inputs it expects, what tools it should reach for, and what success looks like. You drop the file in a known directory and the model knows how to use it.

I treat skills exactly the way I treat internal libraries. They get version numbers. They get reviewers. They get tests, sort of, in the form of evaluation prompts I run before promoting a skill to "approved for production." They get deprecation notices when better skills replace them. The discipline of skill authorship is unfamiliar to engineers who have only ever written prompts, because it forces you to think about interfaces rather than instructions. A skill is not "do this thing for me right now." A skill is "here is a reusable capability that any agent in our system can invoke when the situation calls for it."

The packages I keep coming back to are the official Claude plugins and a handful of community skill packs like superpowers. The plugin ecosystem is doing for skills what npm did for JavaScript: making it possible to compose a working agent system from twenty things other people built and three things you built yourself. That ratio matters. It is the same ratio as a healthy software project.

A small mental shift: stop writing prompts inline. Start writing skills. The first time you write the same prompt twice, extract it. The second time you tweak it for a different repo, version it. The third time you wish someone else on the team could use it, publish it. Skills are how a team's tacit knowledge about working with agents becomes explicit, shared, and improvable. Without them, every engineer reinvents every wheel every Monday morning. With them, you get compounding returns the way you do with any well-maintained internal library.
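To make that concrete, here is a minimal sketch of what a skill file can look like. The file name, checklist structure, and section headings are my own illustration, and the exact frontmatter fields are your runtime's contract, so check its skill documentation before copying this verbatim.

```markdown
---
name: pr-review-checklist
description: Review an open pull request against our internal checklist
  and post a structured summary. Use when a PR is ready for first review.
---

## Inputs
A PR number or branch name.

## Tools
Read, Grep, and the GitHub CLI (`gh pr view`, `gh pr diff`).

## Procedure
1. Read the diff and the linked issue.
2. Check each checklist item; cite file and line for any failure.
3. Post a summary: verdict, blocking items, nits.

## Success
Every checklist item has an explicit pass/fail with evidence.
```

Note the shape: when to use it, inputs, tools, success criteria. That is an interface, not an instruction, which is the whole point of the section above.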

b) Hooks

Hooks are the system's nervous system. Pre-tool-use, post-tool-use, session-start, session-stop, on-edit, on-commit. Each hook is a small program that runs at a known moment in the agent's lifecycle and can observe, modify, log, or block what is about to happen.

I run formatters on post-edit. I run a security scan on pre-commit. I log every tool call to a structured telemetry file so I can replay sessions weeks later. I block the agent from writing to certain paths because some files in our repo represent decisions that need a human meeting, not a generated patch. I send a notification to my phone when an agent in a long-running session has been idle for more than a minute, because that almost always means it is stuck waiting on input I forgot to provide.
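That path-blocking hook is a small program. Here is a minimal sketch in Python, assuming the hook receives a JSON payload describing the pending tool call on stdin and that a non-zero exit code blocks it; the payload field names, tool names, and protected paths are illustrative, so verify the exact contract against your runtime's hook documentation.

```python
"""Pre-tool-use hook sketch: block agent writes to protected paths.

Assumes a JSON payload with "tool_name" and "tool_input" fields and an
exit-code-based block convention; adapt to your runtime's contract.
"""
import json
import sys

# Files that represent decisions needing a human meeting, not a patch.
PROTECTED_PREFIXES = ("migrations/", "infra/prod/", ".github/workflows/")

def should_block(tool_name: str, tool_input: dict) -> bool:
    """True when a file-writing tool targets a protected path."""
    if tool_name not in ("Write", "Edit"):
        return False
    return tool_input.get("file_path", "").startswith(PROTECTED_PREFIXES)

def run_hook(payload_json: str) -> int:
    """Return the process exit code for one raw JSON payload."""
    payload = json.loads(payload_json)
    if should_block(payload.get("tool_name", ""), payload.get("tool_input", {})):
        print("blocked: protected path; raise it with a human instead",
              file=sys.stderr)
        return 2  # non-zero exit signals "block" to the agent runtime
    return 0

# A real hook script would end with: sys.exit(run_hook(sys.stdin.read()))
```

The decision logic lives in a pure function so you can test it without an agent in the loop, which matters once a hook starts gatekeeping real commits.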

The instinct most engineers miss is that hooks are cheap to add and cheap to remove. You do not need a strategy. You need a habit. Every time you find yourself wishing the agent had done something automatically, write a hook. Every time you find a hook getting in the way more than it helps, delete it. The full pattern library, with code, lives in the Claude Code hooks complete cookbook. I will not repeat it here. I will say only that the cookbook is the post on this site I personally re-read most often, because hooks are the layer where my own system is least mature and has the most room to grow.

The dangerous failure mode with hooks is over-engineering. You can build a Rube Goldberg machine of pre-flight validators, post-flight loggers, and middle-flight transformers that nobody on the team understands six months later. The cure is the same cure as for over-engineered code: keep them small, name them clearly, document them lightly, and delete the ones that have not earned their keep.

c) Sub-agents

Sub-agents are specialist agents you dispatch via the Agent tool. They are not just "another conversation." They are jobs you hand off. You give them a system prompt, a tool list, a model selection, and a task description. They go off and work and return a result. You can dispatch many of them in parallel.

Model selection is the part most people get wrong. Haiku is dirt cheap and fast and is exactly right for parallel research, file reading, and simple transformations. Opus is expensive and slow and is exactly right for architectural decisions, hard debugging, and anything where a wrong answer costs more than a few extra dollars. Sonnet is the workhorse for everyday tasks. The orchestration is yours. You decide which agent gets which job.

I have a mental rule: if a task can be parallelized into more than three independent units, use Haiku sub-agents and accept that one of them will be slightly wrong; the savings in wall-clock time and cost are too large to ignore. If a task is irreducibly sequential and the cost of a wrong answer is high, use Opus and accept that you will pay for the privilege of slow, careful thought. Almost everything else, default to Sonnet. For a practical guide to building these, the Claude Code custom agents guide walks through the full setup.
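The rule is mechanical enough to write down. A sketch, with illustrative tier names rather than exact model identifiers:

```python
# Encodes the routing rule from the text. Tier names are aliases,
# not exact API model identifiers.

def pick_model(independent_units: int, wrong_answer_cost: str) -> str:
    """Route a task to a model tier.

    wrong_answer_cost is one of "low", "medium", "high".
    """
    if independent_units > 3 and wrong_answer_cost == "low":
        return "haiku"   # cheap parallel fan-out; tolerate one miss
    if independent_units <= 1 and wrong_answer_cost == "high":
        return "opus"    # pay for slow, careful thought
    return "sonnet"      # the everyday workhorse
```

The point is not the three-line function; it is that the routing decision is yours and should be explicit, not left to whatever model happens to be selected in the session.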

The other shift: stop thinking of sub-agents as delegation and start thinking of them as manufacturing. A sub-agent is not your assistant. It is a small factory unit you have configured for a specific task. You feed it inputs, it produces outputs, you spec-check the outputs. The mental model of "delegation" makes you frustrated when sub-agents fail to use judgment. The mental model of "manufacturing" makes you write better specs.

d) Routines

Routines are scheduled and event-driven agent runs. The dependency audit at 6am. The PR triage at 9am. The weekly release-prep at Monday noon. The on-incident triage that fires when a Sentry alert crosses a threshold.

Routines are where agentic engineering stops feeling like coding and starts feeling like running a small operation. You are not asking the agent to do something. You are setting up a schedule under which the agent does something. The difference, in practice, is enormous. A scheduled routine is a teammate. An ad-hoc agent invocation is a tool.

I cover the full pattern in the Claude Code routines complete guide, but the headline insight is this: the value of a routine compounds, because every morning you wake up to yesterday's work already done. The latency between "I noticed a problem" and "the system handled it" goes from days to hours to minutes. That latency reduction is, frankly, the most underrated productivity gain I have experienced in this entire era. It is not "I write code faster." It is "I never start the day from zero."
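Mechanically, a routine is just a headless agent run on a schedule. Here is a sketch of the 6am dependency audit as a cron-invoked script; the `claude -p` print-mode invocation and the prompt text are assumptions to adapt to your own runner.

```python
# Sketch of a scheduled routine: one headless agent run, fired by cron.
# crontab entry (illustrative): 0 6 * * * python3 /opt/routines/dep_audit.py
import subprocess

def audit_command() -> list[str]:
    """Build the headless command for the 6am dependency audit."""
    prompt = (
        "Run the dependency audit, apply safe minor version bumps, "
        "run the test suite, and open a PR with a written summary."
    )
    return ["claude", "-p", prompt]

def run_routine(repo_path: str) -> str:
    """Execute the routine inside the repo and return its summary text."""
    result = subprocess.run(
        audit_command(), cwd=repo_path, capture_output=True, text=True
    )
    return result.stdout
```

Everything interesting lives in the prompt and the schedule; the plumbing stays boring on purpose, so that deleting a routine that has not earned trust costs nothing.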

The thing nobody tells you is that routines are easy to set up and hard to trust. The first month of any new routine, you check its work obsessively. The second month, you spot-check. By the third month, if the routine has earned trust, you read its summaries and approve. If it has not earned trust, you delete it and try a different design. Trust is the limiting reagent in agentic engineering. Without it, you stay at L3 forever.

e) Managed agents (Claude API)

The local terminal is wonderful until it is not. Some agents need to run for hours. Some need to outlive your laptop's battery. Some need shared state across multiple human operators. Some need to run inside your VPC with strict tool restrictions. For all of those cases, you reach for managed agents.

Managed agents run server-side, with tools, memory, and budget caps you configure. They can be triggered by webhooks, schedules, or other agents. They can persist memory across runs. They can be observed and audited via a console. They are the escape hatch when your local Claude Code session is the wrong tool for the job. I wrote a longer guide at the Claude managed agents API guide.

I use managed agents for three things: long-running research tasks that would tie up my terminal for an hour, customer-facing automations that need to be available even when I am asleep, and high-trust workflows where I want budget caps and audit trails out of the box. For every other use, local Claude Code is faster to iterate on and easier to reason about. Pick the right tool for the right scope. The temptation to put everything on the server side is real and usually wrong. Latency matters. Iteration speed matters. Local-first is still the default for most agentic engineering work.

The mental model shift

There are four shifts that, taken together, are the whole game.

You stop optimizing prompts. You start optimizing systems. The prompt is still important, but it is one input among many. The shape of the system, the wiring of the hooks, the choice of which sub-agent runs where, the schedule of the routines: those are now where the leverage lives. A 5% better prompt inside a 30% worse system is a worse outcome.

You stop reading every line. You start reading every PR. The granularity of your attention moves up one level. You skim diffs. You read summaries. You drill into specifics only when something looks off. This is uncomfortable for engineers who built their identity on careful reading. It is necessary for engineers who want to ship the throughput their system is now capable of producing.

You stop estimating tasks. You start estimating throughput. The question is no longer "how long will this feature take?" The question is "how many features per week is the system shipping, and what is the bottleneck?" The whole language of engineering planning shifts from unit of work to flow of work. This will feel weird the first time you do it in a sprint planning meeting. It is the right shift.

You stop "knowing the codebase." You start knowing the agent system that knows the codebase. Your value to your team is no longer your encyclopedic knowledge of where every function lives. Your value is your knowledge of which agent to dispatch, with which skill, under which constraints, to find and modify the right thing. The codebase becomes the agent's territory. The agent system, starting with a well-written CLAUDE.md rules file, becomes yours.

I want to be honest: this shift is not painless. I miss the feeling of fully holding a codebase in my head. I genuinely do. There was a kind of intimacy with software that I have traded away, and I notice the loss on the days when nothing is shipping and I am just reading agent summaries. But the trade has been worth it for me, because the system ships more in a week than I shipped in a month two years ago. Your trade may be different. That is fine. There is no requirement to be at L4 to be a good engineer. There is only the requirement to be honest with yourself about what you are trading and why.

A real day in agentic engineering

Last Tuesday, in actual order:

6:00am. The dependency-audit routine finished overnight. Four PRs are waiting for review. Two are obviously safe. One is interesting. One is a security patch I want to look at carefully.

8:00am. Coffee. I read the PR summaries the review agent wrote on its own pass through the queue. I approve three. I kick one back with a comment because the agent missed that we had committed to a specific minor version due to a customer contract. The agent updates the PR within ninety seconds. I approve it. Total active time: maybe seven minutes.

9:00am. I plan a new feature with the /ultraplan workflow. A twelve-minute cloud session. The plan comes back with thirteen steps and identifies four migrations that are independent and parallelizable. I read the plan. I edit two steps. I approve.

9:30am. I dispatch four sub-agents in parallel for the four independent migrations. Each one gets a worktree, following the parallel agents and worktrees pattern. Each one gets a budget. Each one gets a system prompt that includes our internal style guide and the specific files it is allowed to touch.
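The worktree setup is the part worth showing: each sub-agent gets its own checkout, so four parallel edits never collide. A sketch that generates the `git worktree` commands; the branch and path naming is my own convention, not a tool requirement.

```python
# One isolated git worktree per independent migration, so parallel
# sub-agents never step on each other's working copies.

def worktree_command(task: str) -> list[str]:
    """`git worktree add` invocation for one task's isolated checkout."""
    return ["git", "worktree", "add", f"../wt-{task}", "-b", f"agent/{task}"]

def fan_out(tasks: list[str]) -> list[list[str]]:
    """Worktree-creation commands for a batch of parallel tasks."""
    return [worktree_command(t) for t in tasks]
```

Run each command from the repo root, then point each sub-agent's working directory at its own `../wt-*` path; when the PRs merge, `git worktree remove` cleans up.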

11:00am. Spec-review the outputs. Two pass cleanly. Two need rework: one missed a test case, one used a deprecated helper. I write feedback. Re-dispatch. The two re-runs return clean by 11:40.

12:00pm. Lunch with my partner. Before I close the laptop, I dispatch a documentation agent in the background to update our architecture docs based on the morning's PRs.

2:00pm. A hook fires on a security scan. The agent flagged a CVE in one of our transitive dependencies and proposed a patch. I read the patch. The patch is wrong. The CVE is real but the patch addresses a symptom rather than the root cause. I fix it manually because the right fix requires understanding the threat model in a way I do not trust the agent to do alone yet. Forty-five minutes of human-only work. The kind of work I love.

4:00pm. I set up a new routine for next week's release prep: gather changelogs, draft release notes, build the artifacts in staging, run smoke tests, post a summary to Slack. About thirty minutes of skill-and-hook configuration. The routine will run every Monday at 10am from now until I delete it.

5:00pm. I close the laptop. The agents keep working. The docs agent will finish around 6pm. The dependency audit will run again at 2am. Tomorrow morning I will wake up to whatever the system produced overnight.

That is a day. I shipped, by traditional accounting, the equivalent of two senior engineers' worth of code. I personally typed maybe four hundred lines. The rest was the system.

What this means for engineering careers

I will not pretend to know exactly how this plays out for everyone. I will tell you what I am seeing.

Senior engineers are becoming system designers. The job is no longer "implement this feature." The job is "design the agent configuration that will reliably produce features of this shape." The skills that compound are: skill authorship, hook design, sub-agent orchestration, evaluation literacy, and the taste to spec-check outputs at speed. The skills that are deprecating fastest are: rote framework knowledge, manual code translation between languages, and the ability to type quickly. The latter were never the point. They felt like the point. They were always proxies for clearer thinking.

Junior engineers entering the field today have an uncomfortable but real opportunity. The path of "learn React, learn Node, learn Postgres, learn the stack" is not gone, but it is no longer the differentiator it was. The differentiator is learning agentic engineering as a first-class discipline. A 22-year-old who can wire up a sub-agent system that ships features reliably is more valuable to a startup than a 28-year-old with five years of React experience and no agentic literacy. I have seen this in hiring. I have made these hires.

The new "10x engineer" is the one who runs ten agent systems well. Not ten agents. Ten systems. Each system is itself a coordinated set of agents, hooks, skills, and routines. The leverage is no longer "how fast can you type" or even "how fast can you think." It is "how many independent value-producing loops can you operate simultaneously without dropping any of them." I know engineers running fifteen to twenty such systems. I know what they are paid. The pay band is stretching, fast, in ways the org charts are not yet ready for.

I will say the quiet part: the top 10% of agentic engineers will likely earn what executives earn within three to five years. Not because they deserve it more than other engineers. Because the leverage they wield is operationally indistinguishable from running a small business. If one person can ship the output of twenty engineers, the market will eventually price that. The market is slow. It is not infinitely slow.

For everyone else, the median engineering salary will probably be roughly stable in real terms for a while, while the content of the job changes underneath. That is not catastrophe. It is also not the upside narrative the AI labs prefer. It is what I think is most likely to happen, and I want you to hear it from someone who is not selling you a course.

Where this fails

I want to spend real time here, because the rest of this guide reads more confident than I actually am about most days.

The setup tax is enormous. You do not get to L4 in a week. You probably do not get to L4 in a quarter. The first time you try to design a real agent system, you will spend more time on the system than you would have spent just writing the code. This is correct and unavoidable and the source of most failed agentic-engineering projects. The investment compounds, but only if you stay long enough to receive the compounding. Most teams give up in the first month, when the system is still slower than just typing. I almost did.

The mental model is foreign. Engineers who have spent twenty years optimizing for "I understand every line in this codebase" find it actively painful to operate at the agent-system layer. There is grief here. I have watched senior engineers I deeply respect bounce off this transition because the texture of the work no longer matches what made them love the job. That is a legitimate response to a real change. Some of them have moved into pure architecture roles. Some of them have moved into research. Some of them have moved out of the industry. None of those choices are wrong.

You will build elaborate systems that do not work and have to tear them down. I have done this twice. The first time, I built a baroque routing system across seven sub-agents that turned out to be slower and worse than just calling Sonnet directly. The second time, I built a hook system so aggressive that it blocked the agent from doing anything productive and I spent a week debugging my own safety net. Tearing down a system you built is harder than tearing down code, because the system represents weeks of design thinking. Do it anyway.

Burnout looks different but is just as real. The traditional engineering burnout is "I have been writing code for sixteen hours and my brain hurts." The agentic burnout is "I have been spec-reviewing agent outputs for sixteen hours and I cannot remember what any of them did." The cognitive load shifts from production to evaluation, and evaluation is not free. Some days I miss the simplicity of just writing code. Many days, actually. The system produces more, but the act of supervising the system is its own kind of labor.

Some problems still need a person. Calmly. Alone. At a keyboard. The CVE patch I described above. The architectural decision that requires understanding three years of organizational history. The customer crisis at 11pm where the right thing is empathy, not throughput. The agent system is not a replacement for that work. It is a frame around that work that lets you do more of it because you have offloaded the rote stuff. If you find yourself never doing the human-only work anymore, something has gone wrong. The system has eaten the part of the job that was the actual job.

There is also a subtler failure mode I want to name. Skill atrophy. If you stop writing code entirely, you lose the felt sense of what is reasonable. You start approving PRs that no human should approve because you have lost the calibration. The agents are good but they are not infallible, and you are the last line of defense. I now deliberately write code by hand for one to three hours every week, on whatever feels like the most interesting problem available, just to keep the calibration alive. Treat it like a musician practicing scales. The performance is the agent system. The scales keep you able to evaluate the performance.

Closing

The shift is happening whether or not you are ready. I do not think this is hype. I think it is the actual thing that is reorganizing the labor market for software engineering, on roughly the timeline I described, with roughly the implications I described. I have been wrong about plenty of trends in this industry. I do not think I am wrong about this one, but weigh my track record as you read.

You do not have to be at L5 tomorrow. You probably should not be. L5 is unsafe for most of what most teams build. You have to be moving. You have to be moving from L1 to L2 if you are at L1. From L2 to L3 if you are at L2. From L3 to L4 if you are at L3. The specific tactics will change every six months. The direction will not.

If I could give you one practical first step, it would be this: pick one piece of toil in your week, something you do every Monday or every morning, and turn it into a routine. Just one. Not a whole agent system. One scheduled task that runs without you. Watch it for a month. Adjust. Then pick a second piece of toil. Then a third. The system grows from those. You do not design L4 from a whiteboard. You compost your way into it, one routine at a time.

I am still figuring this out. Some weeks I feel like I have unlocked a new way of working. Other weeks I feel like a confused project manager for a team of strangers who keep handing me half-correct work. Both are true. I am writing more about it as I learn. I would love to hear what is working in your system, and what is not, and what you have torn down twice and built again. The conversation continues. I will be here.
