The June 22, 2026 smol.ai newsletter led with one launch: Sakana AI shipped Fugu, a multi-agent orchestration model served as a single OpenAI-compatible API. You send one request to one endpoint, and Fugu decides which frontier model on the back end picks up the task, splits it into sub-tasks, checks the work, and synthesises the answer. For teams shipping mobile apps, n8n workflows, and AI agents, like the ones we build at Halmob, Fugu turns a question we have been answering for every client — "which model do we pick?" — into a routing problem the orchestration layer handles for us.
Fugu is the productised follow-up to Sakana's earlier Conductor research, which we covered when it was still a research preview. The new piece is that it is now one endpoint, two tiers, and a price list. The agent pool is swappable, the routing is learned rather than hand-written, and the verification step is part of the model rather than an afterthought bolted onto a prompt.
The 30-Second Version
What Sakana Fugu Actually Is
Fugu is a language model that has been trained to call other language models. The same way GPT and Claude have been trained to call tools, Fugu has been trained to call frontier models as if they were tools. The pool is swappable, which means Sakana can drop a new SOTA model into the back end without breaking your client. The router is the learned part — Fugu picks the model per sub-task based on what it has learned about cost, latency, and quality, not on a rules engine.
The research that backs the product is two ICLR 2026 papers: TRINITY and the original Conductor. TRINITY adds the Thinker / Worker / Verifier role split — the orchestrator decides who plans, who executes, and who checks the result before it ships. Conductor is the routing-as-RL paper that taught the orchestrator which model to pick. Fugu folds both ideas into a single OpenAI-compatible API surface.
How It Differs From the Conductor We Covered Earlier
We wrote about the Sakana Conductor research preview back in April. Conductor was a routing model. Fugu is the productised, multi-agent system you can actually point a client at. The shape of the change matters for anyone choosing an orchestration layer today.
| Aspect | Sakana Conductor (April 2026) | Sakana Fugu (June 2026) |
|---|---|---|
| Form factor | Research preview, paper-first | Productised, OpenAI-compatible API |
| What it does | Routes a task to one specialist model | Routes, delegates, verifies, synthesises |
| Tiers | One profile | Fugu and Fugu Ultra |
| Underlying research | Conductor RL routing | TRINITY plus Conductor |
| Drop-in shape | Client SDK proof of concept | Swap the base URL on your existing OpenAI client |
We walked through the Conductor research and the routing trade-offs in the Sakana Conductor write-up, and the broader orchestration-era shift in the orchestration era piece.
Why a Single OpenAI-Compatible Endpoint Matters for Mobile
Mobile apps already pay a tax for AI features that server-side apps do not. Every model you bind to the client is a model you have to keep on the secret list, rate-limit, fall back from, and update when a new SOTA lands. Adding a second provider doubles that work. Adding a third triples it. The cost of an orchestration layer is most of the reason mobile teams pick one model and stay with it for a year.
Fugu collapses that down to one endpoint. The phone holds one base URL and one key. The routing across Claude, GPT, Gemini, and the rest happens server-side, inside the model. When Sakana adds a new specialist to the pool, the phone gets the upgrade without an app-store release. That is the same property that makes a CDN useful — the client stops caring about the back end. It applies cleanly to the mobile development work we ship for clients, and it pairs with the loop discipline we wrote about in the loop engineering piece.
One endpoint, one key, a swappable agent pool on the other side.
Where Fugu Fits Inside an n8n Workflow
The n8n side is even shorter. The HTTP node already speaks OpenAI. Point the base URL at Fugu and the rest of the workflow does not change. The classifier node, the JSON validator, the human-approval branch, the persistence step — they all stay where they are. What changes is that the model node now talks to an orchestrator instead of a single model. A pricing question routes to a fast specialist, a long synthesis routes to Fugu Ultra, and the workflow does not have to know which one was chosen.
That is the shape we already use under the n8n automation work we ship. The production version of that pattern, with retries and queues, is documented in our n8n on ECS Fargate load test. Fugu lets the team that owns the workflow stop chasing the "best" model and let the orchestration model own that decision.
The Two Tiers and When to Use Each
| Tier | Tuned for | Use it when |
|---|---|---|
| Fugu | Low latency, everyday tasks | Chat replies, single-step tool calls, mobile UX |
| Fugu Ultra | Maximum accuracy, multi-step reasoning | Research synthesis, code review, hard agent loops |
Think of Fugu as a team of specialists with a strict deadline and Fugu Ultra as the same team with no deadline and a verifier in the room. Most mobile flows are Fugu jobs. Long-horizon agent runs, the kind we discussed in the Kimi K2.6 swarm write-up, are where Fugu Ultra earns its place — long enough that the extra verification cost is paid back by not having to re-run the loop.
Risks and Pitfalls Worth Designing Around
- Routing opacity. Fugu does not tell you which model handled which sub-task on every request. For a regulated workload — finance, health, anything subject to data-residency rules — you need a route trace. Ask for it before you commit a workload.
- Cost surprises. Fugu Ultra calls several models per request. The bill is not the bill of one model. Set a per-request token cap on the workflow side, not just on the prompt side.
- Vendor lock at one layer up. You stop locking in to one model, but you start locking in to one orchestrator. Treat Fugu as the new abstraction boundary and keep the prompts portable.
- Latency tail. A request that fans out to three specialists has a tail latency that is the slowest specialist, not the fastest. Time-out budgets need to reflect that, especially on mobile.
- Verification is not a free pass. The Verifier role catches obvious failure but it does not catch domain-specific wrong answers. Keep the schema validation and the eval set in your own workflow.
- Side-channel data. When Sakana swaps a specialist in the pool, your prompt is now seeing a model whose data policy you did not pick. Read the per-specialist policy as part of the orchestration contract.
How It Slots Into the 2026 Agent Stack
Fugu is not a model layer and it is not a runtime layer. It is the orchestration layer, sold as one model. That makes it the easiest thing to slot in when you already have the rest of the stack working.
| Layer | Recent 2026 example | Where Fugu plugs in |
|---|---|---|
| Model | Claude Opus 4.8, GLM-5.2, Gemini 3.5 Flash | Sits inside Fugu's swappable pool |
| Application SDK | Vercel AI SDK 6, Claude Agent SDK | Point the base URL at Fugu, keep the SDK |
| Orchestration | Salesforce Agentforce, Hermes Workspace | Fugu is a competitor here, sold as a model |
| Runtime | Cloudflare Project Think, NVIDIA Project Arc | Still owns durable execution and resume |
| Distribution | Apple Messages for Business, iMessage | Unchanged — phone hits one endpoint |
We covered the orchestration row separately in the Hermes Workspace mobile orchestration piece and the routing-versus-doing split in the executor-advisor pattern write-up.
A Practical Wiring Plan
- 1Pick one workflow to migrate. Take the chat-style feature that currently calls a single model. The change is one base URL and one key.
- 2Set a budget per call. Decide a token and a wall-clock cap before the first request. Fugu Ultra requests fan out — the budget needs to reflect that.
- 3Log the route metadata. Capture whatever the response headers expose about which specialist ran. Without it, you cannot debug a regression.
- 4Mirror the workflow. Keep the old single-model path live behind a flag for a week. Compare cost, latency, and answer quality on the same traffic.
- 5Decide tier by feature, not by team. Use Fugu for mobile UI, Fugu Ultra for back-of-house research. Mixing tiers inside one product is normal.
- 6Re-check the loop. The orchestrator changed; the surrounding loop did not. Validate that retries, timeouts, and idempotency still hold.
- 7Write down the exit. If Sakana raises the price or changes the pool, your migration plan is to swap the base URL back. Keep that path one config flip away.
When to Use Fugu, When to Stay With One Model
How It Fits the Halmob Stack
Most of what we ship at Halmob lives at the meeting point of mobile apps, n8n automation, and AI agent orchestration. Fugu lands inside the third column. For a client whose mobile app already hits a chat endpoint, the migration is one base URL change and a budget rule. For a client whose n8n flow already routes by intent, Fugu replaces the routing node with a model that has learned the routing for them.
The win we see is not faster answers. It is the disappearance of a recurring decision. The team stops asking "which model do we pick for this feature" every sprint, and starts asking "is the orchestration tier right for this feature." That is a smaller question with a smaller blast radius.
The Bottom Line
Sakana Fugu is the orchestration layer sold as a model. The interesting thing is not the benchmark scores Sakana published next to Claude Fable 5 and GPT 5.5. The interesting thing is the shape: one OpenAI-compatible endpoint, two tiers, a swappable pool, a learned router, a verifier in the middle. That shape erases a class of work that mobile and automation teams used to do every sprint.
The right question for the next sprint is no longer "which model do we pick for the chat feature." It is "what does our cost and latency look like when the routing is on the other side of one endpoint." At Halmob we pair mobile development with n8n automation and AI agent orchestration so teams that want that question answered cleanly get a shorter answer.
For sources, see the smol.ai AINews newsletter coverage of the June 22, 2026 Fugu launch, the official Sakana Fugu announcement, and the MarkTechPost write-up on the swappable agent pool.