Strategy · Architecture · Production

Should You Even Build an Agent? A Feasibility Framework

The most expensive agent is the one you should never have built. A mental model for telling agent-shaped problems from workflow-shaped ones — and the ROI math that decides whether it survives at scale.

AgentSwarms Authors
June 4, 2026 · 16 min read

Gartner expects more than 40% of agentic AI projects to be canceled by the end of 2027. When you read the post-mortems, the cause is rarely the model and rarely the framework. It's that the problem was never agent-shaped to begin with. Someone had a hammer that could reason, and everything started looking like a nail.

So before any architecture diagram, the highest-leverage skill in this entire field is triage — looking at a problem and knowing whether it wants an agent, a workflow, a single model call, or just some boring deterministic code. This post is a framework for that decision. By the end you should be able to say 'no, use a workflow' with the same confidence as 'yes, build the agent' — and to defend either with numbers.

First, resist the jump to agents

There's a spectrum of automation, and agents sit at the far, expensive end. Most problems are solved better — cheaper, faster, more reliably — somewhere to the left. The instinct to reach straight for a multi-agent system is the source of most wasted budgets.

[Interactive figure: the automation spectrum. Stop shown: Single LLM call (classify, summarize, extract; one shot, no tools). Axes: capability; cost and unpredictability.]
Each step up the ladder buys capability and pays for it in cost, latency, and unpredictability. The engineering discipline is to start as far left as the problem allows, and only move right when you've proven you have to.

Anthropic's guidance on building effective agents makes the same point bluntly: find the simplest solution possible, and only increase complexity when it demonstrably improves outcomes. A single well-prompted LLM call with retrieval solves a startling fraction of 'we need an agent' requests. The agent is the answer only when the path itself is unpredictable.

The core trade: determinism for flexibility

Here's the one sentence to anchor on: an agent trades determinism for flexibility. A workflow does the same thing every time — predictable, debuggable, cheap. An agent decides what to do at runtime — flexible enough to handle messy, open-ended tasks, but harder to predict, test, and bound. You take that trade only when the flexibility is worth more than the determinism you're giving up.

That single trade explains the whole feasibility question. If a problem has a knowable, fixed path, paying for an agent's flexibility is pure waste — you bought unpredictability you didn't need. If the path genuinely varies per input and can't be enumerated in advance, a rigid workflow will keep falling off the edges, and the agent's flexibility earns its cost. Two dimensions decide it.

[Interactive figure: the feasibility 2×2. Axes: path variability; cost of error. Quadrant shown: Great agent candidate. Variable path, forgiving of error: the sweet spot. Flexibility pays off and mistakes are cheap to catch. Example: draft a research brief; brainstorm campaign ideas.]
The feasibility 2×2: how variable is the path, and how costly is a mistake? Only one of the four quadrants is a clean 'build an agent', and the high-stakes corner needs guardrails and a human before it's anything at all.

Read the quadrant honestly

Teams love to place their use case in the top-right 'sophisticated' corner because it feels ambitious. Most real problems live bottom-left (just automate it) or top-left (workflow + validation). Be ruthless: the goal is the cheapest thing that works, not the most impressive thing you can justify.
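As a rough sketch, the 2×2 can be written down as a lookup table. The quadrant labels are taken from the text above; the function name `triage`, the boolean inputs, and the reading of the axes (path variability running left to right, cost of error running bottom to top) are assumptions made for illustration. Collapsing each axis to a yes/no is a simplification, since real problems sit somewhere on a continuum.

```python
# Hypothetical encoding of the feasibility 2x2 as a lookup table.
# Keys are (path_is_variable, error_is_costly); labels paraphrase the article.
QUADRANTS = {
    (False, False): "just automate it",
    (False, True): "workflow + validation",
    (True, False): "great agent candidate",
    (True, True): "agent only with guardrails and a human in the loop",
}


def triage(path_is_variable: bool, error_is_costly: bool) -> str:
    """Return the quadrant label for a use case."""
    return QUADRANTS[(path_is_variable, error_is_costly)]


# Drafting a research brief: the path varies, and mistakes are cheap to catch.
print(triage(path_is_variable=True, error_is_costly=False))
```

Only one of the four keys maps to a clean agent recommendation, which matches the point of the figure.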

The candidate scorecard

The 2×2 gives you intuition; this gives you a checklist. A problem is a genuine agent candidate when most of these are true at once. Toggle them and watch the verdict move — and notice how quickly 'it would be cool' collapses into 'use a workflow' when the signals aren't there.

[Interactive scorecard: six yes/no signals. State shown: 0 / 6 signals, verdict "Don't build an agent". A workflow, a single LLM call, or plain code will be cheaper, faster, and more reliable.]
Toggle the signals that are true for your use case. Five or more and the flexibility of an agent earns its keep. Two or fewer and you're about to over-engineer something a simpler tool would nail.

Two of these signals deserve special weight. 'There's a way to verify the output' is close to a hard requirement — an agent you can't check is an agent you can't trust at scale, because you'll never know when it quietly went wrong. And 'wrong answers are recoverable' is what keeps you out of the danger zone: an agent acting irreversibly with no review is a risk, not a feature, no matter how capable it is.
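For readers who prefer to see the scoring rule spelled out, here is a minimal sketch. The thresholds (five or more, two or fewer) come from the scorecard; everything else is an assumption: only two of the six signals are named in this post, so the other four names below are placeholders, the "borderline" middle band is invented for illustration, and treating verifiability as a strict gate is one reading of "close to a hard requirement".

```python
# Thresholds from the scorecard: 5+ signals -> agent, 2 or fewer -> don't.
AGENT_MIN = 5
NO_AGENT_MAX = 2


def scorecard_verdict(signals: dict[str, bool]) -> str:
    """Turn a set of yes/no signals into a verdict (illustrative only)."""
    score = sum(signals.values())
    if score <= NO_AGENT_MAX:
        return "don't build an agent"
    # Assumption: an agent whose output can't be verified never gets a clean yes.
    if score >= AGENT_MIN and signals.get("output_is_verifiable", False):
        return "agent candidate"
    return "borderline: try a workflow first"


example = {
    "output_is_verifiable": True,       # named in the article
    "errors_are_recoverable": True,     # named in the article
    "path_varies_per_input": True,      # placeholder signal (assumed)
    "value_per_task_is_high": True,     # placeholder signal (assumed)
    "needs_tools_or_many_steps": True,  # placeholder signal (assumed)
    "volume_justifies_build": False,    # placeholder signal (assumed)
}
print(scorecard_verdict(example))  # agent candidate
```

Flipping `output_is_verifiable` to `False` in the example drops the verdict to borderline even with the other signals unchanged, which is the special weight the paragraph above argues for.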

When the answer is just 'no'

Red flags that should stop a project

High-volume, perfectly deterministic transactions (use code). Tasks requiring 100% accuracy with no verification step (the model will eventually be confidently wrong). Hard real-time, sub-100ms latency budgets (agents loop and think — they're slow). Trivially simple single-step tasks (a single LLM call or a regex wins). Zero-tolerance regulatory actions with no human in the loop. If your use case is mostly these, the feasible answer is 'not an agent'.

Saying no here isn't pessimism — it's what makes the yeses credible. A team that has a clear list of things agents are bad at is a team you can trust when they say a particular problem is a fit.

The ROI math nobody runs until it's too late

A use case can be perfectly agent-shaped and still lose money at scale. Feasibility isn't just 'can an agent do this?' — it's 'does it pay?' once you multiply by volume. The economics are simple enough to put in a calculator, and sobering enough that most teams should run it before the build, not after the invoice.

[Interactive ROI calculator, shown with its default values]
Tasks per month: 20,000
Agent cost per task: 8¢
Success rate: 85%
Value per successful task: $4
Monthly value created: $68,000
Monthly agent cost: $1,600
Net / ROI: $66,400 (+4150%)
Above water, but stress-test the success rate; it's the variable that flips this.
Plug in your own numbers. Value created = volume × success rate × value per success. Cost = volume × cost per task. The success rate is the variable that quietly flips the whole thing: lower it far enough and a 'no-brainer' goes underwater.

Notice what the calculator teaches: at low volume almost anything pencils out, which is why pilots look great. At high volume, two things dominate — your cost per task (driven by model choice, steps, and retries) and your success rate (because failures are tasks you paid for and have to redo or escalate). A pilot that looked profitable at 500 tasks a month can invert at 200,000.
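The calculator's arithmetic fits in a few lines, so here is a sketch you can run against your own numbers. It uses only the two formulas stated for the calculator (value = volume × success rate × value per success; cost = volume × cost per task). The names `monthly_roi`, `RoiResult`, and `break_even_success_rate` are made up for this example, and the model deliberately leaves out the cost of redoing or escalating failed tasks, which would push the real break-even point higher.

```python
from dataclasses import dataclass


@dataclass
class RoiResult:
    value_created: float  # dollars per month
    agent_cost: float     # dollars per month
    net: float            # dollars per month
    roi_pct: float        # net as a percentage of cost


def monthly_roi(tasks: int, cost_cents: float, success_rate: float,
                value_per_success: float) -> RoiResult:
    """Apply the calculator's two formulas; cost per task is given in cents."""
    value = tasks * success_rate * value_per_success
    cost = tasks * cost_cents / 100
    net = value - cost
    return RoiResult(value, cost, net, 100 * net / cost)


def break_even_success_rate(cost_cents: float, value_per_success: float) -> float:
    """Success rate at which value created equals agent cost in this simple model."""
    return (cost_cents / 100) / value_per_success


r = monthly_roi(tasks=20_000, cost_cents=8, success_rate=0.85, value_per_success=4.0)
print(f"value ${r.value_created:,.0f}, cost ${r.agent_cost:,.0f}, "
      f"net ${r.net:,.0f} ({r.roi_pct:+.0f}%)")
# value $68,000, cost $1,600, net $66,400 (+4150%)
```

A useful habit is to rerun it with a pessimistic success rate and a cost per task that includes retries, then see whether the net is still positive.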

The accuracy trap that ambushes you at scale

Here's the one that ambushes good teams. A single step that's 95% accurate sounds excellent. But agents chain steps, and accuracy compounds — multiplies — down the chain. Ten steps at 95% each isn't 95%; it's 0.95 to the tenth power, about 60%. The model didn't get worse. The chain did.
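The compounding is easy to check for yourself. The sketch below assumes, as the paragraph above does, that every step has the same accuracy and that step errors are independent, so end-to-end success is simply the per-step accuracy raised to the number of steps. The function names are illustrative.

```python
import math


def end_to_end_success(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step in the chain succeeds (independent steps)."""
    return per_step_accuracy ** steps


def expected_failures(per_step_accuracy: float, steps: int, runs: int) -> int:
    """Expected number of failed runs out of `runs`, rounded to a whole run."""
    return round(runs * (1 - end_to_end_success(per_step_accuracy, steps)))


def max_steps_for_target(per_step_accuracy: float, target: float) -> int:
    """Longest chain whose end-to-end success stays at or above `target`."""
    return math.floor(math.log(target) / math.log(per_step_accuracy))


print(f"{end_to_end_success(0.95, 10):.1%}")  # 59.9%
print(max_steps_for_target(0.95, 0.80))       # 4
```

Read the second number as a budget: at 95% per step, a chain that must succeed 80% of the time end to end gets four unverified steps, not ten.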

[Interactive chain calculator, shown with its default values]
Steps in the chain: 6
Per-step accuracy: 92%
End-to-end success (0.92^6): 60.6%
At 10,000 runs/day that's 3,936 failures, every single day.
"95% accurate" feels great until you chain it. Long autonomous chains at scale need verification gates, not just a good model.
Set the chain length and per-step accuracy, then read the end-to-end success and what it means at 10,000 runs a day. This is why long, fully autonomous agent chains at scale are a verification problem, not a model problem.

What this implies for design

If your honest per-step accuracy is in the low 90s, you cannot run a long autonomous chain at scale without verification gates between steps — a checker, a grader, a human checkpoint — to stop errors from compounding. Either keep chains short, add verification, or pick a use case where a 60% end-to-end success is genuinely acceptable.
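One way to picture a verification gate is a chain runner that checks each step's output before passing it on, retries a bounded number of times, and stops for escalation when the check keeps failing. This is a minimal sketch of that pattern, not a prescribed design: `run_with_gates`, `ChainResult`, and the retry policy are assumptions, and in practice the checker might be a grader model, a schema validator, or a human checkpoint.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence, Tuple

Step = Callable[[Any], Any]    # does one unit of work
Check = Callable[[Any], bool]  # verifies that step's output


@dataclass
class ChainResult:
    ok: bool
    value: Any                         # last verified value
    failed_step: Optional[int] = None  # index of the step that was escalated


def run_with_gates(stages: Sequence[Tuple[Step, Check]], value: Any,
                   max_retries: int = 1) -> ChainResult:
    """Run each step, accept its output only if the paired check passes."""
    for index, (step, check) in enumerate(stages):
        for _attempt in range(max_retries + 1):
            candidate = step(value)
            if check(candidate):
                value = candidate
                break
        else:
            # Retries exhausted: stop here instead of compounding the error.
            return ChainResult(False, value, index)
    return ChainResult(True, value)


stages = [
    (str.strip, lambda s: len(s) > 0),  # toy step and check
    (str.upper, str.isupper),           # toy step and check
]
print(run_with_gates(stages, "  refund request  "))
```

The design choice that matters is the early return: a failed gate hands back the last verified value and the index of the failing step, so whoever picks it up knows exactly where the chain went wrong.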

Where agents actually win at scale

None of this is an argument against agents — it's an argument for aiming them well. When the math works, agents win in three recognizable ways: they augment expensive human experts (drafting, research, triage that a person then approves), they deliver consistency at a volume humans can't sustain, and they unlock tasks that were simply uneconomical to do by hand at all. The common thread is that the value per task is high enough, and the work variable enough, that flexibility beats a script.

And the smart way to get there is not to bet the company on full autonomy on day one. It's to climb a ladder.

[Interactive figure: the crawl, walk, run ladder. Rung shown: human in the loop on every action. The agent drafts, suggests, retrieves; a person approves. You measure quality and build trust with zero blast radius.]
Crawl, walk, run. Start assistive with a human approving every action; earn autonomy on the slice you've proven; scale only the validated path. The teams that skip straight to 'run' are the ones in Gartner's cancellation statistic.

The whole framework in one decision

Put it together and the feasibility question collapses into a short walk: Is the path fixed? Can you verify the output? Is the task worth the cost? Answer those honestly and the recommendation falls out — often it's 'use a workflow', sometimes it's 'not yet', and exactly when it should be, it's 'build the agent'.

[Interactive decision tree. First question shown: Is the path to the answer the same every time?]
Walk the tree for a real use case. Most paths don't end at 'build the agent', and that's the point. The ones that do are the projects worth your team's time.

Prototype the decision, don't argue about it

The fastest way to settle 'is this agent-shaped?' is to build the smallest version and measure it. In AgentSwarms you can stand up a swarm on the visual canvas, run it against real inputs, and watch the traces and cost — then decide with evidence instead of opinions. Cheap to try, cheaper than a canceled six-month project.

Agentic AI is genuinely transformative for the problems it fits, and a money pit for the ones it doesn't. The framework above won't pick the problem for you, but it will stop you from building the agent you'll regret. In a field where more than 40% of projects are expected to be canceled, knowing which ones to start is most of the battle.

