Artificial Intelligence

Cognition Devin Fusion: Multi-Model Coding Agent Harness

Cognition's Devin Fusion runs a frontier planner and a cheaper sidekick model in parallel, cutting agent coding cost by roughly 35% without breaking the token cache.

İlker Ulusoy 2026-07-02 10 min read

On June 29, 2026, Cognition released Devin Fusion, a hybrid-model harness for agentic coding that keeps a frontier planner in the loop and hands bounded subtasks to a cheaper "sidekick" model — cutting the cost of Fable 5-level intelligence by roughly 35% without breaking the token cache. For teams that ship mobile apps, n8n workflows, and AI agent orchestration, like the ones we build at Halmob, Fusion is the first production example of a routing pattern that stops paying the "switch models mid-session" tax.

The signal in the day's smol.ai AINews issue was not the headline benchmark. It was the shape of the harness. Fusion does not choose one model per task; it runs two at once, gives the cheap one a defined lane, and lets the smart one review or take over when the work gets harder. That is a routing decision made continuously inside a session, and it is the piece most teams have been missing.

The 30-Second Version

Devin Fusion runs a frontier model (Fable 5 or an Opus/GPT-5.5-class model) and a cheaper sidekick model in parallel inside one coding session. The frontier model plans; the sidekick executes bounded subtasks the frontier delegates. Because the sidekick lives alongside the frontier instead of replacing it turn by turn, the token cache stays warm and the cost of a Fable 5-quality result drops by roughly 35% (41% when measured against pure Fable 5). Cognition reports that 88% of its internally merged PRs were driven end-to-end by the Fusion router.

Why Conventional Model Routing Breaks

The obvious way to save money on an agent is to switch models mid-conversation: run a cheap model for boilerplate, swap in an expensive one for the hard step, swap back. Every team that has tried this in production has hit the same three problems.

  • Cache invalidation. Every time you swap the model, the prompt cache the previous model built up is worthless. The next turn re-pays the full input token cost. For long agent sessions with 100k+ tokens of context, this alone can wipe out the savings from the cheaper model.
  • Late-binding difficulty. Engineering tasks reveal their difficulty late. A refactor that looked mechanical turns out to need a design decision on step 12. If you already committed to the cheap model, you cannot recover without swapping — and you pay the cache tax again.
  • Style discontinuities. Two models rarely write code with the same conventions. A patch that starts in one voice and ends in another is the tell of a mid-session swap, and human reviewers read it as a sign of lower quality.
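To put rough numbers on the cache-invalidation point, here is a minimal sketch. Everything in it is an assumption made for illustration: the prices, the cached-token discount, the fixed context size, and the rule that any model switch means a full cache miss. None of it is a real provider's pricing or Cognition's accounting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    price_per_mtok: float   # hypothetical input price, dollars per million tokens
    cached_fraction: float  # hypothetical share of that price paid on a cache hit


def session_input_cost(schedule, context_tokens, new_tokens_per_turn):
    """Total input cost in dollars for a sequence of turns.

    Assumption: a turn only hits the cache when the previous turn used the
    same model; the first turn and every turn after a swap are cache misses.
    Context size is held constant to keep the arithmetic readable.
    """
    total = 0.0
    previous = None
    for model in schedule:
        warm = previous is not None and previous.name == model.name
        rate = model.cached_fraction if warm else 1.0
        context_cost = context_tokens * model.price_per_mtok * rate
        fresh_cost = new_tokens_per_turn * model.price_per_mtok
        total += (context_cost + fresh_cost) / 1_000_000
        previous = model
    return total


frontier = Model("frontier", price_per_mtok=10.0, cached_fraction=0.1)
cheap = Model("cheap", price_per_mtok=1.0, cached_fraction=0.1)

# Ten turns on the frontier model vs. ten turns alternating cheap/frontier.
stay = session_input_cost([frontier] * 10, 100_000, 2_000)
swap = session_input_cost([cheap, frontier] * 5, 100_000, 2_000)

print(f"stay on frontier: ${stay:.2f}")
print(f"alternate models: ${swap:.2f}")
```

With these made-up numbers the alternating schedule costs about $5.61 in input tokens against about $2.10 for staying on the frontier model, even though half of its turns run on a model that is ten times cheaper. The exact gap depends entirely on the assumed prices and discount.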

Fusion's answer is not to route better. It is to stop routing turn by turn and start co-locating models inside the same session.

The Architecture: Frontier + Sidekick, Running in Parallel

The Fusion harness runs two agents at the same time on the same session state. One agent uses the frontier model (Fable 5 or an equivalent Opus/GPT-5.5 class model). The other uses a cheaper sidekick model. The frontier agent owns the plan and the review loop. The sidekick agent gets bounded, well-specified subtasks it can complete without ambiguity.

  1. The frontier plans. Reads the repo, decides what to change, writes the outline. This turn stays with the frontier because the plan is where most of the intelligence needs to live.
  2. The frontier delegates a subtask. "Add a null check here," "rename this symbol across these files," "write the unit test for this branch." The subtask is small and closed enough that a cheaper model can finish it correctly.
  3. The sidekick runs it. In parallel. The sidekick has its own smaller context, does not touch the frontier's cache, and returns a patch or a result.
  4. The frontier reviews. If the sidekick's output passes, the frontier accepts it and moves on. If it fails, the frontier does the task itself.
  5. Context-compaction moments are the routing signal. When the session hits a compaction point, where the context has to be summarized down anyway, switching cost is close to zero. Fusion uses those moments to shift work back toward the smarter model without paying the usual penalty.
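The plan, delegate, run, review steps above can be sketched with stub functions standing in for the two models. This is not Cognition's implementation: every name here (Subtask, frontier_plan, sidekick_run, frontier_run, run_session) is hypothetical, and the "models" return canned strings. The sidekick subtasks run concurrently in a thread pool while the frontier stub simply waits; a real harness would presumably keep the frontier working in the meantime.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Subtask:
    description: str
    accept: Callable[[str], bool]  # cheap check the frontier runs on the result


def frontier_plan(goal: str) -> list[Subtask]:
    """Stub for the frontier's planning turn: returns bounded subtasks."""
    return [
        Subtask("add a null check", lambda patch: "is None" in patch),
        Subtask("write the unit test", lambda patch: patch.startswith("def test_")),
    ]


def sidekick_run(task: Subtask) -> str:
    """Stub for the cheaper model. It deliberately fumbles the unit test."""
    if task.description == "add a null check":
        return "if value is None: return default"
    return "# TODO: test"


def frontier_run(task: Subtask) -> str:
    """Stub for the frontier redoing a subtask itself (canned test patch)."""
    return "def test_branch(): assert handler(None) == default"


def run_session(goal: str, pool_depth: int = 2) -> list[tuple[str, str]]:
    """Return (subtask, author) pairs for one plan/delegate/review pass."""
    subtasks = frontier_plan(goal)
    # Sidekick subtasks run concurrently, each as an independent call.
    with ThreadPoolExecutor(max_workers=pool_depth) as pool:
        patches = list(pool.map(sidekick_run, subtasks))

    outcome = []
    for task, patch in zip(subtasks, patches):
        if task.accept(patch):
            outcome.append((task.description, "sidekick"))
        else:
            # Review failed: the frontier does the task itself.
            frontier_run(task)
            outcome.append((task.description, "frontier"))
    return outcome


result = run_session("harden the request handler")
print(result)
```

In this toy run the null check is accepted from the sidekick and the unit test bounces back to the frontier, which is the fallback path step 4 describes.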

What the Numbers Actually Say

Configuration | Quality baseline | Cost vs pure frontier
Fusion with Fable 5 as frontier | Fable 5-level FrontierCode score | 41% cheaper than pure Fable 5
Fusion with Opus / GPT-5.5-class frontier | Same class quality | 35% cheaper than pure frontier
Naive mid-chat model swap | Degraded (style + review flags) | Savings often erased by cache misses
Cognition internal usage | 88% of merged PRs driven end-to-end | Router decides without human intervention

The 88% number is the one worth sitting with. It says the Fusion router is good enough that Cognition's own engineers let it drive most of the code that ships to production, with human review happening at PR time rather than turn by turn. That is the shape of a harness that has crossed from "interesting benchmark" to "you can build on it."

The center of gravity in agent systems is moving from "pick the best model" to harness engineering. Fusion is the first harness where the routing decision lives inside the loop instead of at the top of it.

Why It Lands Now

Three shifts we have written about across the last six months set Fusion up. Anthropic shipped dynamic workflows with 1000 subagents, which normalized the idea that an agent session should spawn smaller agents rather than doing everything in one loop. Sakana shipped Fugu multi-model orchestration, which made model-choice-per-subtask a first-class API concern. And the executor-advisor pattern established the language for "smart planner, cheap doer." Fusion is what happens when those three ideas collapse into a single production harness.

The Build Pattern We Use at Halmob

For mobile development and n8n automation engagements where an AI agent writes or edits code, the shape we land on with a Fusion-style harness looks like this:

  1. One frontier model per session. Pick the smart model at session start based on repo size and task class. Do not swap it once the session opens; that is the whole point.
  2. One sidekick pool, sized to concurrency. The sidekick model runs a small worker pool. Bounded subtasks queue against the pool; the pool depth is a cost knob, not a model choice.
  3. Explicit subtask contracts. A subtask sent to the sidekick has a defined input, a defined output shape, and an acceptance test the frontier can run cheaply. If the sidekick cannot satisfy the acceptance test, the work bounces back to the frontier automatically.
  4. Compaction as the routing checkpoint. When the session hits a context-compaction event, use it as the moment to re-evaluate which model owns the next block. Compaction is going to invalidate the cache anyway, so the switching cost is already paid.
  5. Audit both models to one log. Every frontier decision and every sidekick patch lands in one session-scoped log. When a reviewer asks "who wrote this," the log answers "the sidekick, and the frontier accepted the patch at this step." That is the traceability a production harness needs.
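Items 4 and 5 are the easiest to leave vague, so here is one possible shape for them. This is a sketch under assumptions: the AuditEntry fields, the SessionLog class, the owner_after_compaction helper, and the file name are all invented for illustration, and the "is the next block hard" signal is passed in as a plain boolean rather than derived from anything.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AuditEntry:
    step: int
    actor: str   # "frontier", "sidekick", or "harness"
    action: str  # e.g. "patch", "accept", "redo", "compaction"
    target: str  # file or subtask the entry refers to


@dataclass
class SessionLog:
    """One session-scoped log shared by both models."""
    entries: list[AuditEntry] = field(default_factory=list)

    def record(self, actor: str, action: str, target: str) -> AuditEntry:
        entry = AuditEntry(len(self.entries) + 1, actor, action, target)
        self.entries.append(entry)
        return entry

    def who_wrote(self, target: str) -> str:
        """Answer the reviewer's question from the log alone."""
        history = [e for e in self.entries if e.target == target]
        # The most recent patch or redo is the authoring step.
        patch = next(
            (e for e in reversed(history) if e.action in ("patch", "redo")), None
        )
        if patch is None:
            return f"no recorded author for {target}"
        accept = next(
            (e for e in history if e.action == "accept" and e.step > patch.step),
            None,
        )
        if accept is None:
            return f"{patch.actor} wrote {target} at step {patch.step}"
        return (
            f"{patch.actor} wrote {target} at step {patch.step}; "
            f"{accept.actor} accepted it at step {accept.step}"
        )


def owner_after_compaction(log: SessionLog, next_block_is_hard: bool) -> str:
    """Re-evaluate ownership only at a compaction event and log the decision."""
    owner = "frontier" if next_block_is_hard else "sidekick"
    log.record("harness", "compaction", f"next block -> {owner}")
    return owner


log = SessionLog()
log.record("sidekick", "patch", "utils/null_check.py")
log.record("frontier", "accept", "utils/null_check.py")
owner = owner_after_compaction(log, next_block_is_hard=True)
answer = log.who_wrote("utils/null_check.py")
print(answer)
```

Keeping the compaction decision in the same log as the patches means a reviewer can see both who wrote a change and why ownership moved, without joining two data sources.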

Where Fusion Fits Halmob's Stack

Most of what we build at Halmob sits at the meeting point of mobile apps, n8n automation, and AI agent orchestration. Fusion lands squarely inside the orchestration side of that triangle. The n8n workflow is what triggers the agent session. The frontier + sidekick loop is what actually writes the code. The mobile team owns the acceptance criteria the frontier uses to review the sidekick's patches. For OpenClaw consultancy engagements, Fusion is the piece that lets us give a cost-per-PR number that a CFO will accept, rather than a "depends on how many tokens the model chews through" estimate.

The overlap with long-horizon agent swarms like Kimi K2.6 is worth naming. Swarms run many agents; Fusion runs two. Swarms are the pattern for "take on a huge task by fanning out"; Fusion is the pattern for "take on a normal task for less money." Both matter. Most teams will need Fusion first because most tasks are the normal ones.

The Limits Worth Knowing Before You Ship

  • Subtask boundedness is the failure mode. The sidekick works if the frontier can specify subtasks the sidekick can close without asking a design question. Repos with a lot of implicit convention are hard on the sidekick; the frontier ends up redoing the work.
  • Concurrency has a floor. Running two agents in parallel can roughly double peak inference demand. If your inference provider throttles per account, plan the pool depth around the throttle, not the wallet.
  • Review latency is the cheap-model quality gate. The frontier reviewing sidekick output is only worth it if the review is cheaper than doing the task. Trivial subtasks with expensive review erase the savings.
  • Cache warmth is a runtime property. A session that gets interrupted and resumed cold loses the cache advantage Fusion is engineered around. Long-running agents need a session-warmup step, not just a session-restart step.
  • Two-model observability is more work than one. Metrics need to attribute cost, latency, and outcome to the right model. If your existing observability is one-agent-per-run, expect a refactor to make Fusion legible.
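The review-latency limit above reduces to a break-even check. The sketch below assumes a simple cost model that is not taken from Cognition: a delegated subtask always pays for the sidekick run and the frontier's review, and on a failed review the frontier redoes the task at its full solo cost. The function names and the example figures are hypothetical.

```python
def expected_delegated_cost(
    sidekick_cost: float,
    review_cost: float,
    frontier_cost: float,
    pass_rate: float,
) -> float:
    """Expected cost of delegating one subtask, in the same unit as the inputs.

    Assumed model: sidekick run + frontier review are always paid, and the
    frontier's full solo cost is paid again whenever the review fails.
    """
    if not 0.0 <= pass_rate <= 1.0:
        raise ValueError("pass_rate must be between 0 and 1")
    return sidekick_cost + review_cost + (1.0 - pass_rate) * frontier_cost


def worth_delegating(
    sidekick_cost: float,
    review_cost: float,
    frontier_cost: float,
    pass_rate: float,
) -> bool:
    """True when delegating is cheaper in expectation than the frontier alone."""
    delegated = expected_delegated_cost(
        sidekick_cost, review_cost, frontier_cost, pass_rate
    )
    return delegated < frontier_cost


# A sizeable subtask: cheap to review relative to doing it.
sizeable = worth_delegating(
    sidekick_cost=0.10, review_cost=0.15, frontier_cost=1.00, pass_rate=0.9
)

# A trivial subtask: the review costs as much as the task itself.
trivial = worth_delegating(
    sidekick_cost=0.01, review_cost=0.05, frontier_cost=0.05, pass_rate=0.95
)

print(sizeable, trivial)
```

Under these illustrative numbers the sizeable subtask is worth delegating and the trivial one is not, which is the "trivial subtasks with expensive review erase the savings" case.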

Where This Replaces Older Stacks

What it used to take | What it takes now | Where the savings show up
Single frontier model per session | Frontier + sidekick in one harness | 35–41% lower cost at same quality
Manual model-swap heuristics in code | In-loop routing at compaction moments | No cache-miss tax on the swap
Human review at every turn | Frontier reviews sidekick, human reviews the PR | 88% of PRs merged without turn-by-turn oversight (Cognition data)
One-off cost dashboards per project | Per-session frontier + sidekick attribution | Cost-per-PR that a finance team can plan against
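The last row, per-session attribution rolled up to a cost-per-PR figure, might look like the following. The record layout, the PR identifiers, and the dollar amounts are all hypothetical; a real harness would read these from its own usage log.

```python
from collections import defaultdict

# Hypothetical usage records: (pr_id, model_role, cost_in_dollars)
usage = [
    ("PR-1", "frontier", 0.80),
    ("PR-1", "sidekick", 0.12),
    ("PR-1", "sidekick", 0.08),
    ("PR-2", "frontier", 0.40),
    ("PR-2", "sidekick", 0.05),
]


def cost_per_pr(records):
    """Sum cost per PR, split by which model incurred it, plus a total."""
    totals = defaultdict(lambda: defaultdict(float))
    for pr_id, role, cost in records:
        totals[pr_id][role] += cost
        totals[pr_id]["total"] += cost
    # Convert nested defaultdicts to plain dicts for reporting.
    return {pr_id: dict(by_role) for pr_id, by_role in totals.items()}


report = cost_per_pr(usage)
for pr_id, by_role in report.items():
    print(pr_id, {role: round(cost, 2) for role, cost in by_role.items()})
```

Splitting the total by role is what makes the number plannable: it shows how much of each PR's cost came from the frontier model and how much the sidekick absorbed.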

When to Reach for a Fusion-Style Harness and When Not To

A Simple Decision Rule

Reach for a Fusion-style harness when the workload is long-running agent sessions with a mix of hard and easy steps — coding agents, long research passes, multi-file refactors. Skip it when the task is short, uniform, or already runs on a cheap model at good quality. The overhead of running two agents only pays back over sessions long enough to accumulate cache value.

The Bottom Line

Devin Fusion is the first hybrid-model harness we would build on without wrapping in a proof-of-concept caveat. The 35% cost number is real; the 88% autonomous-merge number is more important. The pattern generalizes beyond Devin — frontier planner, cheap sidekick, review at compaction — and it is the harness shape most agentic coding tools will move toward in the next six months.

For the mobile and automation work we do at Halmob, this is the piece that turns "agentic coding is expensive" into "agentic coding is a line item." When we pair mobile development with n8n automation and AI agent orchestration, the frontier + sidekick shape lets us quote a per-PR cost the client can plan against instead of a token bill they cannot.

For sources, see the smol.ai AINews coverage of the June 29, 2026 issue, the Cognition Devin Fusion announcement, and the Cognition engineering blog for the follow-up posts on FrontierCode benchmark methodology.