Sakana Fugu: Multi-Agent Orchestration Behind One API

The June 22, 2026 smol.ai newsletter led with one launch: Sakana AI shipped Fugu, a multi-agent orchestration model served as a single OpenAI-compatible API. You send one request to one endpoint, and Fugu decides which frontier model on the back end picks up the task, splits it into sub-tasks, checks the work, and synthesises the answer. For teams shipping mobile apps, n8n workflows, and AI agents, like the ones we build at Halmob, Fugu turns a question we have been answering for every client — "which model do we pick?" — into a routing problem the orchestration layer handles for us.

Fugu is the productised follow-up to Sakana's earlier Conductor research, which we covered when it was still a research preview. The new piece is that it is now one endpoint, two tiers, and a price list. The agent pool is swappable, the routing is learned rather than hand-written, and the verification step is part of the model rather than an afterthought bolted onto a prompt.

The 30-Second Version

Sakana Fugu is a multi-agent orchestration model behind one OpenAI-compatible API. It reads your task, decides whether to answer directly or break it into sub-tasks, picks a specialist model for each step from a swappable pool of frontier LLMs, verifies the result, and synthesises the final answer. You wire it in the same way you wire any chat completion — the orchestration sits on the other side of the call.

What Sakana Fugu Actually Is

Fugu is a language model that has been trained to call other language models. The same way GPT and Claude have been trained to call tools, Fugu has been trained to call frontier models as if they were tools. The pool is swappable, which means Sakana can drop a new SOTA model into the back end without breaking your client. The router is the learned part — Fugu picks the model per sub-task based on what it has learned about cost, latency, and quality, not on a rules engine.

The research that backs the product is two ICLR 2026 papers: TRINITY and the original Conductor. TRINITY adds the Thinker / Worker / Verifier role split — the orchestrator decides who plans, who executes, and who checks the result before it ships. Conductor is the routing-as-RL paper that taught the orchestrator which model to pick. Fugu folds both ideas into a single OpenAI-compatible API surface.

How It Differs From the Conductor We Covered Earlier

We wrote about the Sakana Conductor research preview back in April. Conductor was a routing model. Fugu is the productised, multi-agent system you can actually point a client at. The shape of the change matters for anyone choosing an orchestration layer today.

Aspect	Sakana Conductor (April 2026)	Sakana Fugu (June 2026)
Form factor	Research preview, paper-first	Productised, OpenAI-compatible API
What it does	Routes a task to one specialist model	Routes, delegates, verifies, synthesises
Tiers	One profile	Fugu and Fugu Ultra
Underlying research	Conductor RL routing	TRINITY plus Conductor
Drop-in shape	Client SDK proof of concept	Swap the base URL on your existing OpenAI client

We walked through the Conductor research and the routing trade-offs in the Sakana Conductor write-up, and the broader orchestration-era shift in the orchestration era piece.

Why a Single OpenAI-Compatible Endpoint Matters for Mobile

Mobile apps already pay a tax for AI features that server-side apps do not. Every model you bind to the client is a model you have to keep on the secret list, rate-limit, fall back from, and update when a new SOTA lands. Adding a second provider doubles that work. Adding a third triples it. The cost of an orchestration layer is most of the reason mobile teams pick one model and stay with it for a year.

Fugu collapses that down to one endpoint. The phone holds one base URL and one key. The routing across Claude, GPT, Gemini, and the rest happens server-side, inside the model. When Sakana adds a new specialist to the pool, the phone gets the upgrade without an app-store release. That is the same property that makes a CDN useful — the client stops caring about the back end. It applies cleanly to the mobile development work we ship for clients, and it pairs with the loop discipline we wrote about in the loop engineering piece.

One endpoint, one key, a swappable agent pool on the other side.

Where Fugu Fits Inside an n8n Workflow

The n8n side is even shorter. The HTTP node already speaks OpenAI. Point the base URL at Fugu and the rest of the workflow does not change. The classifier node, the JSON validator, the human-approval branch, the persistence step — they all stay where they are. What changes is that the model node now talks to an orchestrator instead of a single model. A pricing question routes to a fast specialist, a long synthesis routes to Fugu Ultra, and the workflow does not have to know which one was chosen.

That is the shape we already use under the n8n automation work we ship. The production version of that pattern, with retries and queues, is documented in our n8n on ECS Fargate load test. Fugu lets the team that owns the workflow stop chasing the "best" model and let the orchestration model own that decision.

The Two Tiers and When to Use Each

Tier	Tuned for	Use it when
Fugu	Low latency, everyday tasks	Chat replies, single-step tool calls, mobile UX
Fugu Ultra	Maximum accuracy, multi-step reasoning	Research synthesis, code review, hard agent loops

Think of Fugu as a team of specialists with a strict deadline and Fugu Ultra as the same team with no deadline and a verifier in the room. Most mobile flows are Fugu jobs. Long-horizon agent runs, the kind we discussed in the Kimi K2.6 swarm write-up, are where Fugu Ultra earns its place — long enough that the extra verification cost is paid back by not having to re-run the loop.

Risks and Pitfalls Worth Designing Around

Routing opacity. Fugu does not tell you which model handled which sub-task on every request. For a regulated workload — finance, health, anything subject to data-residency rules — you need a route trace. Ask for it before you commit a workload.
Cost surprises. Fugu Ultra calls several models per request. The bill is not the bill of one model. Set a per-request token cap on the workflow side, not just on the prompt side.
Vendor lock at one layer up. You stop locking in to one model, but you start locking in to one orchestrator. Treat Fugu as the new abstraction boundary and keep the prompts portable.
Latency tail. A request that fans out to three specialists has a tail latency that is the slowest specialist, not the fastest. Time-out budgets need to reflect that, especially on mobile.
Verification is not a free pass. The Verifier role catches obvious failure but it does not catch domain-specific wrong answers. Keep the schema validation and the eval set in your own workflow.
Side-channel data. When Sakana swaps a specialist in the pool, your prompt is now seeing a model whose data policy you did not pick. Read the per-specialist policy as part of the orchestration contract.

How It Slots Into the 2026 Agent Stack

Fugu is not a model layer and it is not a runtime layer. It is the orchestration layer, sold as one model. That makes it the easiest thing to slot in when you already have the rest of the stack working.

Layer	Recent 2026 example	Where Fugu plugs in
Model	Claude Opus 4.8, GLM-5.2, Gemini 3.5 Flash	Sits inside Fugu's swappable pool
Application SDK	Vercel AI SDK 6, Claude Agent SDK	Point the base URL at Fugu, keep the SDK
Orchestration	Salesforce Agentforce, Hermes Workspace	Fugu is a competitor here, sold as a model
Runtime	Cloudflare Project Think, NVIDIA Project Arc	Still owns durable execution and resume
Distribution	Apple Messages for Business, iMessage	Unchanged — phone hits one endpoint

We covered the orchestration row separately in the Hermes Workspace mobile orchestration piece and the routing-versus-doing split in the executor-advisor pattern write-up.

A Practical Wiring Plan

1Pick one workflow to migrate. Take the chat-style feature that currently calls a single model. The change is one base URL and one key.
2Set a budget per call. Decide a token and a wall-clock cap before the first request. Fugu Ultra requests fan out — the budget needs to reflect that.
3Log the route metadata. Capture whatever the response headers expose about which specialist ran. Without it, you cannot debug a regression.
4Mirror the workflow. Keep the old single-model path live behind a flag for a week. Compare cost, latency, and answer quality on the same traffic.
5Decide tier by feature, not by team. Use Fugu for mobile UI, Fugu Ultra for back-of-house research. Mixing tiers inside one product is normal.
6Re-check the loop. The orchestrator changed; the surrounding loop did not. Validate that retries, timeouts, and idempotency still hold.
7Write down the exit. If Sakana raises the price or changes the pool, your migration plan is to swap the base URL back. Keep that path one config flip away.

When to Use Fugu, When to Stay With One Model

Reach for Fugu when your product has more than one model job — chat, code, planning, vision — and the team is spending time on per-model routing. Stay with a single model when latency is the only constraint and you have already tuned the prompt to one provider. The decision is whether the orchestration cost has become a line on your sprint board.

How It Fits the Halmob Stack

Most of what we ship at Halmob lives at the meeting point of mobile apps, n8n automation, and AI agent orchestration. Fugu lands inside the third column. For a client whose mobile app already hits a chat endpoint, the migration is one base URL change and a budget rule. For a client whose n8n flow already routes by intent, Fugu replaces the routing node with a model that has learned the routing for them.

The win we see is not faster answers. It is the disappearance of a recurring decision. The team stops asking "which model do we pick for this feature" every sprint, and starts asking "is the orchestration tier right for this feature." That is a smaller question with a smaller blast radius.

The Bottom Line

Sakana Fugu is the orchestration layer sold as a model. The interesting thing is not the benchmark scores Sakana published next to Claude Fable 5 and GPT 5.5. The interesting thing is the shape: one OpenAI-compatible endpoint, two tiers, a swappable pool, a learned router, a verifier in the middle. That shape erases a class of work that mobile and automation teams used to do every sprint.

The right question for the next sprint is no longer "which model do we pick for the chat feature." It is "what does our cost and latency look like when the routing is on the other side of one endpoint." At Halmob we pair mobile development with n8n automation and AI agent orchestration so teams that want that question answered cleanly get a shorter answer.

For sources, see the smol.ai AINews newsletter coverage of the June 22, 2026 Fugu launch, the official Sakana Fugu announcement, and the MarkTechPost write-up on the swappable agent pool.

Sakana Fugu: Multi-Agent Orchestration Behind One API

The 30-Second Version

What Sakana Fugu Actually Is

How It Differs From the Conductor We Covered Earlier

Why a Single OpenAI-Compatible Endpoint Matters for Mobile

Where Fugu Fits Inside an n8n Workflow

The Two Tiers and When to Use Each

Risks and Pitfalls Worth Designing Around

How It Slots Into the 2026 Agent Stack

A Practical Wiring Plan

When to Use Fugu, When to Stay With One Model

How It Fits the Halmob Stack

The Bottom Line

Related Articles

Sakana Fugu: Multi-Model AI Orchestration as One API

Minitap Mobile-Use: 100% AndroidWorld Mobile AI Agents

Loop Engineering: AI Agent Loops That Survive Failure

Cognizant Neuro AI + ServiceNow Multi-Agent Orchestration

Apple Poke: First AI Agent on iMessage for Business

Microsoft Scout: Always-On AI Agent for M365 Explained