---
title: "Agentic-First Development: Build Software Agents Can Actually Use"
date: 2026-05-29
author: Matt
url: https://www.mattwarren.co/2026/05/agentic-first-development/
---

# Agentic-First Development: Build Software Agents Can Actually Use

Most software teams are about to run into a weird problem: they are going to ask for “agent support” and nobody is going to know what that means.

Not the product manager. Not the developer. Not even the AI model doing half the implementation.

The word agent is still too fluid. Sometimes it means a chatbot. Sometimes it means a background job with an LLM call in the middle. Sometimes it means a coding assistant with tools. Sometimes it means a workflow that can make decisions, call APIs, and keep going without a human clicking every button.

That makes it hard to build software for agents.

If you tell a coding agent, “make this more agentic,” you will often get something vague. Maybe it adds an API endpoint. Maybe it adds a prompt. Maybe it writes a README. Maybe it creates a single hard-coded instruction like “analyze this data and return insights.”

That is not enough.

The better question is more practical:

How would an agent actually use this feature five minutes after it was built?

That question changes the development loop.

## The old loop is too human-centered

Most application development still assumes a human user.

You build a feature. You open the browser. You click around. You see if the form works. You check the page. You fix the obvious bugs. Eventually you may add API endpoints, documentation, admin commands, or automation hooks around the edges.

That works if the primary user is a person with a screen, patience, and enough context to infer what the software is supposed to do.

Agents are different.

An agent does not “just know” that the button in the top-right corner starts the workflow. It does not automatically understand which endpoint is safe, what order commands should run in, which fields are optional, or how to recover from a partial failure.

A human can poke around and build a mental model.

An agent needs that mental model handed to it.

That is why [building AI-operable systems](https://www.mattwarren.co/2026/01/claude-code-first-development-building-ai-operable-systems/) is not just about exposing a few commands. The software has to be shaped so an agent can discover it, operate it, verify its work, and recover when something goes wrong.

## Agentic-first development means testing with an agent immediately

The loop I have been using is simple:

1. Build the feature.
2. Build the API, CLI command, or skill that describes how an agent should use it.
3. Ask an agent to use the feature immediately.
4. Watch where it gets confused.
5. Improve the feature, the interface, and the skill together.

The important part is timing.

Do not wait until the feature is “done” to bolt agent support onto it. The best feedback happens while the implementation context is still fresh. The model knows what was just built. The developer knows what tradeoffs were made. The rough edges are still visible.

So after a feature lands, ask the agent something like:

> Use the feature you just implemented as if you were an agent trying to complete a real task. Do not explain how it should work. Actually exercise it. Tell me where the interface, API, command output, docs, or skill file made your job harder.

That prompt surfaces a different class of bug.

Not just “does the endpoint return 200?”

More like:

- Could the agent find the right command?
- Did the command output include enough context?
- Was the failure mode understandable?
- Did the skill explain the safe path?
- Was there a dry-run option before a destructive action?
- Did the agent know how to verify success?
- Did the API require hidden knowledge from the developer’s head?

That is the kind of feedback normal QA often misses.

## The skill is part of the feature

This is the part that took me a while to appreciate.

If an agent is going to use your software, the skill file is not documentation after the fact. It is part of the product surface.

A human-facing feature might include:

- UI
- copy
- error messages
- onboarding
- docs

An agent-facing feature needs its own equivalent:

- tool descriptions
- command examples
- safe operating rules
- expected outputs
- verification steps
- known pitfalls
- recovery instructions
- examples of good and bad usage

That context is not decorative. It is how the agent becomes competent.

This is the same pattern behind [bring your own agent](https://www.mattwarren.co/2026/04/bring-your-own-agent/). The useful part is rarely one magical prompt. The useful part is the accumulated operating system around the work: the tools, memory, examples, rubrics, and habits that let an agent perform reliably in a specific environment.

So when you build a feature, build the skill at the same time.

If the feature has an admin command, the skill should explain when to use it, which flags matter, what a successful result looks like, and what to do if it fails.

If the feature has an API, the skill should show real request and response examples.

If the feature triggers background work, the skill should explain how to check job status, inspect logs, and retry safely.
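
Here is a minimal Python sketch of what “check job status and retry safely” can look like once it is written down for an agent. Everything in it is hypothetical: the in-memory `JOBS` store, the status names, and the `job_status` and `retry_job` functions stand in for whatever queue or database the real application uses.

```python
"""Hypothetical job-status helpers an agent could call before retrying.

Assumes an in-memory job store; a real application would read from its
own queue or database. All names here are illustrative only.
"""
import json
from dataclasses import dataclass, field
from enum import Enum


class JobStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class Job:
    job_id: str
    status: JobStatus
    attempts: int = 1
    log: list[str] = field(default_factory=list)


JOBS: dict[str, Job] = {}


def job_status(job_id: str) -> dict:
    """Return the status plus a hint about what the agent may safely do next."""
    job = JOBS.get(job_id)
    if job is None:
        return {"ok": False, "error": "unknown_job",
                "next_step": "List jobs to find the right id."}
    next_step = {
        JobStatus.QUEUED: "Wait and poll again.",
        JobStatus.RUNNING: "Wait and poll again; do not retry a running job.",
        JobStatus.SUCCEEDED: "Nothing to do; verify the output.",
        JobStatus.FAILED: "Inspect log_tail, then call retry_job.",
    }[job.status]
    return {
        "ok": True,
        "job_id": job.job_id,
        "status": job.status.value,
        "attempts": job.attempts,
        "log_tail": job.log[-3:],  # last few log lines, enough to diagnose
        "next_step": next_step,
    }


def retry_job(job_id: str, max_attempts: int = 3) -> dict:
    """Requeue only FAILED jobs, up to max_attempts, so repeated calls are harmless."""
    job = JOBS.get(job_id)
    if job is None:
        return {"ok": False, "error": "unknown_job"}
    if job.status is not JobStatus.FAILED:
        return {"ok": False, "error": "not_retryable", "status": job.status.value}
    if job.attempts >= max_attempts:
        return {"ok": False, "error": "max_attempts_reached",
                "next_step": "Escalate to a human."}
    job.attempts += 1
    job.status = JobStatus.QUEUED
    job.log.append(f"requeued (attempt {job.attempts})")
    return {"ok": True, "job_id": job_id, "status": job.status.value,
            "attempts": job.attempts}


if __name__ == "__main__":
    JOBS["import-42"] = Job("import-42", JobStatus.FAILED, log=["timeout fetching source"])
    print(json.dumps(job_status("import-42"), indent=2))
    print(json.dumps(retry_job("import-42")))
    # A second retry is refused because the job is now queued, not failed.
    print(json.dumps(retry_job("import-42")))
```

The part worth copying is the shape of the responses: status calls carry a `next_step` hint, refusals carry a machine-readable `error` code, and `retry_job` refuses anything not in a failed state, so an agent that calls it twice does not double-queue the work.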

If the feature can modify production data, the skill should include dry-run behavior and warnings.
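
One way to make dry-run behavior concrete is to make the dry run the default and require an explicit flag before anything changes. In this Python sketch the `purge-inactive` command, the `--apply` flag, and the `ACCOUNTS` list are hypothetical; `argparse` and `json` are from the standard library.

```python
"""Sketch of an agent-friendly admin command that is a dry run by default.

The command name, the --apply flag, and the ACCOUNTS data are hypothetical.
"""
import argparse
import json
from typing import Optional

# Stand-in for production data; a real command would query the database.
ACCOUNTS = [
    {"id": 1, "email": "a@example.com", "active": False},
    {"id": 2, "email": "b@example.com", "active": True},
    {"id": 3, "email": "c@example.com", "active": False},
]


def purge_inactive(accounts: list[dict], apply: bool) -> dict:
    """Build a structured report; mutate `accounts` only when apply is True."""
    targets = [a["id"] for a in accounts if not a["active"]]
    report = {
        "command": "purge-inactive",
        "dry_run": not apply,
        "target_ids": targets,
        "count": len(targets),
    }
    if apply:
        # In-place update so callers holding the same list see the change.
        accounts[:] = [a for a in accounts if a["active"]]
        report["verify"] = "Run again without --apply; count should be 0."
    else:
        report["warning"] = "Nothing was deleted. Re-run with --apply to delete these accounts."
    return report


def main(argv: Optional[list[str]] = None) -> int:
    parser = argparse.ArgumentParser(
        prog="purge-inactive",
        description="Delete inactive accounts. Dry run unless --apply is given.",
    )
    parser.add_argument("--apply", action="store_true",
                        help="Actually delete; without this flag nothing changes.")
    args = parser.parse_args(argv)
    # Always emit JSON so an agent can parse the result instead of scraping text.
    print(json.dumps(purge_inactive(ACCOUNTS, apply=args.apply), indent=2))
    return 0


if __name__ == "__main__":
    main()
```

Defaulting to a dry run is a judgment call, but it means an agent that forgets a flag gets a report of what would happen rather than deleted data. The `warning` and `verify` fields are also exactly the kind of thing the skill file should document.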

The skill is the agent’s onboarding document.

Write it like the next user has no memory of the conversation that created the feature. Because usually, that is exactly what will happen.

## Fresh-context testing is where this gets useful

The first version of this loop can happen in the same chat where the feature was built.

That is useful, but it is not enough.

The agent that just wrote the feature has a huge amount of implicit context. It remembers the architecture discussion. It remembers the files it edited. It remembers the assumptions. It may know how to use the feature because it just created it.

A real future agent will not have that advantage.

A better test is to create an isolated subagent and give it only the skill plus the repo or application access it would normally have.

Then ask it to complete a task.

For example:

> Spawn a fresh subagent. Give it the new skill and no extra explanation. Ask it to use the feature to accomplish a realistic task. Report where it got stuck, what context was missing, and what would have made the skill easier to use.

That isolation is valuable.

It simulates the reboot problem. It shows what happens after the development context disappears and all that remains is the actual product surface you created for agents.

If the subagent can use the feature from the skill alone, you probably have something durable.

If it cannot, that is not a failure of the agent. That is product feedback.

## Build the context harness, not just the prompt

There is a second side to agentic-first development: building software that contains agents as features.

This is where the terminology gets even less helpful.

If you ask a coding agent to “add an agent that audits this data,” it may create a function with a prompt like:

> Analyze this data and return recommendations.

That is not an agent. That is a sentence.

A useful internal agent needs the same thing a new employee would need: context, instructions, examples, constraints, access to the right data, and a definition of good work.

The framing that works better is this:

> Build this as if we are delegating the task to a smart junior employee who is new to the company. Give them enough context to act like a senior employee. Do not be afraid of a large prompt. Use dynamic, templated context pulled algorithmically from the application where possible.

That produces a different implementation.

Instead of a tiny hard-coded prompt, you start building a context harness:

- current user or account context
- relevant records from the database
- recent activity
- available tools and actions
- policy or brand rules
- examples of high-quality output
- domain-specific terminology
- constraints and safety rules
- scoring rubric or acceptance criteria
- exact output schema

Some of that context should be static. Some should be templated. Some should be pulled from the application at runtime.

The important distinction is that context assembly should be mostly algorithmic, not another vague agentic step. The software should know where to retrieve the account details, prior activity, relevant documents, and configuration. Then the LLM receives a rich work packet instead of a bare instruction.
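
To make “mostly algorithmic” context assembly concrete, here is a Python sketch of a work-packet builder for a hypothetical account-audit agent. The `Account` fields, the `load_account` stub, the role text, the rubric, and the output schema are all placeholders; the point is that ordinary code decides what goes into the prompt, and the model only does the judgment step.

```python
"""Sketch of algorithmic context assembly for a hypothetical "audit account" agent.

The data source, field names, role text, rubric, and output schema are all
placeholders for whatever the real application stores.
"""
import json
from dataclasses import dataclass


@dataclass
class Account:
    id: int
    name: str
    plan: str
    recent_events: list[str]


# Static context: written once and versioned with the code.
ROLE = (
    "You are auditing a customer account. Act like a senior account manager: "
    "be specific and cite the data you used."
)
RUBRIC = [
    "Flag accounts with no login in 30+ days.",
    "Flag plan mismatches (usage well above or below the plan).",
    "Escalate instead of guessing when data is missing.",
]
OUTPUT_SCHEMA = {"risk": "low|medium|high", "findings": ["string"], "escalate": "bool"}


def load_account(account_id: int) -> Account:
    # Runtime context: a real app would query its database here (stubbed).
    return Account(account_id, "Acme Co", "pro",
                   ["last login 45 days ago", "invoice paid on time"])


def build_work_packet(account_id: int, examples: list[str]) -> str:
    """Combine static, templated, and runtime context into one prompt string."""
    account = load_account(account_id)
    sections = [
        ("Role", ROLE),
        ("Account", f"{account.name} (id {account.id}) on the {account.plan} plan"),
        ("Recent activity", "\n".join(f"- {e}" for e in account.recent_events)),
        ("Rubric", "\n".join(f"- {r}" for r in RUBRIC)),
        ("Examples of good output", "\n\n".join(examples) or "(none yet)"),
        ("Respond with JSON matching", json.dumps(OUTPUT_SCHEMA)),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)


if __name__ == "__main__":
    print(build_work_packet(42, examples=[]))
```

Each section lines up with an item in the list above: static rules live in constants, account data comes from the application at runtime, and the schema tells the model exactly what shape to return.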

Many current LLM context windows hold hundreds of thousands of tokens. That does not mean you should dump everything in blindly, but it does mean developers should stop being afraid of giving the model enough information to do the job well.
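
A simple way to avoid dumping everything in blindly is to give each context section a priority and fill a budget in that order. This Python sketch uses a character budget as a crude stand-in for a real tokenizer; the `Section` type, `pack_context`, and the example sections are hypothetical.

```python
"""Sketch: fill a context budget in priority order instead of dumping everything.

A character count stands in for a real tokenizer; names are hypothetical.
"""
from dataclasses import dataclass


@dataclass
class Section:
    title: str
    body: str
    priority: int  # lower number = more important


def pack_context(sections: list[Section], budget_chars: int) -> tuple[str, list[str]]:
    """Return the packed prompt and the titles that were dropped for space.

    Greedy: a section that does not fit is skipped, and smaller,
    lower-priority sections may still fit after it.
    """
    packed, dropped, used = [], [], 0
    for s in sorted(sections, key=lambda s: s.priority):
        block = f"## {s.title}\n{s.body}"
        cost = len(block) + 2  # count a blank-line separator for every block (errs safe)
        if used + cost <= budget_chars:
            packed.append(block)
            used += cost
        else:
            dropped.append(s.title)
    return "\n\n".join(packed), dropped


if __name__ == "__main__":
    demo = [
        Section("Rubric", "- Escalate when data is missing.", 0),
        Section("Full activity log", "x" * 5000, 2),
        Section("Account", "Acme Co, pro plan", 1),
    ]
    prompt, dropped = pack_context(demo, budget_chars=200)
    print(prompt)
    print("dropped:", dropped)
```

Returning the dropped titles matters: the harness can log what the model never saw, which makes a weak answer much easier to debug.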

A one-sentence prompt is rarely delegation.

It is more like shouting a task at somebody as they walk past your desk.

## Treat agents like employees, not magic functions

The employee analogy keeps helping.

If you hired a junior employee and said, “audit this customer account,” you would not expect great work unless you also gave them:

- what the company does
- what the customer is trying to accomplish
- what tools they can use
- what good and bad accounts look like
- where to find the data
- what risks to watch for
- what format the answer should take
- when to escalate

Agents need the same treatment.

This does not make them human. It just gives developers a better abstraction.

A function needs parameters.

An agent needs context.

A function returns a value.

An agent performs work against a goal, using judgment inside constraints.

If you build the feature as if the agent were a function, you tend to under-specify the work. If you build it as if you were delegating to a new employee, you naturally include the missing context.

That is the difference between “LLM integration” and an actually useful agentic feature.

## Two kinds of agentic-first software

There are really two related ideas here.

The first is software an agent can use.

That means APIs, CLI tools, skills, structured outputs, safe commands, dry runs, logs, and verification steps. It means the application is operable by an AI assistant in the same way it is operable by a human.

The second is software that uses agents internally.

That means LLM-powered audits, cleanup jobs, recommendations, workflows, summaries, monitors, and decision-support features. It means the application itself can delegate pieces of work to model-driven components.

Those are different design problems, but they reinforce each other.

When you make a feature easier for an external agent to operate, you often make it easier for your internal agents to call safely too. When you build richer internal context harnesses, you often expose clearer concepts that external agents can use.

This is where [agent teams and adversarial review loops](https://www.mattwarren.co/2026/02/adversarial-agents/) become more than a creative workflow. The same idea can apply inside software development: one agent builds, another exercises, another reviews, and the product improves because the agents are forced to use the thing rather than merely describe it.

## What this looks like in practice

A practical agentic-first feature might ship with a checklist like this:

1. Human UI works.
2. API endpoint exists.
3. CLI command or tool wrapper exists.
4. Command output is structured and includes enough context.
5. Errors include recovery suggestions.
6. Destructive actions support dry-run.
7. A skill explains how and when an agent should use the feature.
8. A fresh subagent can complete a realistic task using only that skill.
9. The built-in LLM prompt uses a rich context harness, not a one-line instruction.
10. Verification steps prove the work succeeded.
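
Items 5 and 10 are the easiest to skip, so here is a minimal Python sketch of both: a result envelope whose errors carry a recovery suggestion and a `retryable` flag, and a verification call that reads the state back instead of trusting the write. The field names and the `set_setting`/`verify_setting` commands are assumptions for illustration, not a standard format.

```python
"""Sketch of an error envelope plus a verification step for agent-facing commands.

Field names (code, message, recovery, retryable) and the commands are hypothetical.
"""
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class CommandResult:
    ok: bool
    data: dict = field(default_factory=dict)
    error: Optional[dict] = None


def fail(code: str, message: str, recovery: str, retryable: bool) -> CommandResult:
    """Every failure says what went wrong and what the agent should try next."""
    return CommandResult(ok=False, error={"code": code, "message": message,
                                          "recovery": recovery, "retryable": retryable})


# Hypothetical in-memory settings store.
SETTINGS: dict[str, str] = {}


def set_setting(key: str, value: str) -> CommandResult:
    if not key.isidentifier():
        return fail("invalid_key", f"{key!r} is not a valid setting name.",
                    "Use letters, digits, and underscores; list existing keys first.", False)
    SETTINGS[key] = value
    return CommandResult(ok=True, data={"key": key, "value": value})


def verify_setting(key: str, expected: str) -> CommandResult:
    """Read the value back from the store instead of trusting the write call."""
    actual = SETTINGS.get(key)
    if actual != expected:
        return fail("verification_failed", f"Expected {expected!r}, found {actual!r}.",
                    "Re-run set_setting, then verify again.", True)
    return CommandResult(ok=True, data={"key": key, "verified": True})


if __name__ == "__main__":
    for result in (set_setting("retention_days", "30"),
                   verify_setting("retention_days", "30"),
                   set_setting("retention-days", "30")):
        print(json.dumps(asdict(result)))
```

Keeping verification as its own command is deliberate: an agent, or a fresh subagent testing the skill, can prove the work succeeded with a separate read, and that step is easy to spell out in the skill.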

That checklist is not complicated.

But it changes what “done” means.

Done is no longer only “a person can click it.”

Done becomes “a person can click it, an agent can operate it, and the application’s own agents have enough context to use it intelligently.”

## The terminology will catch up later

Right now, everyone is still inventing words for this.

Agentic support. AI-native software. Agent-operable systems. LLM-powered workflows. Human-in-the-loop automation. Delegated intelligence.

The names are messy because the category is still forming.

That is why the practical loop matters more than the vocabulary.

Build the feature. Build the skill. Ask an agent to use it. Isolate the test. Watch where it fails. Improve the interface and the context harness. Repeat.

That loop creates better software even if nobody agrees on the perfect terminology.

The applications that win in the next phase will not just have AI sprinkled on top. They will be designed around the reality that some users are human, some users are agents, and some features are agents doing work on behalf of both.

That is what agentic-first development means in practice.

Not “add an agent.”

Build the application so agents can understand it, operate it, and meaningfully participate in the work.