On June 22, 2026 Sakana AI shipped Fugu in general availability: a single OpenAI-compatible API that hides a learned multi-agent orchestration layer behind one model call. The latest smol.ai AINews issue named it as the week's standout because Fugu is not a router built from if/else rules — the orchestration itself is a trained language model that decides which frontier agent to call, how to communicate with it, and how to merge the answers. For teams shipping mobile apps, n8n automation, and AI agent orchestration like the work we do at Halmob, Fugu is the first production-grade answer to the question every team has been asking: which model do we pick.
The release matters for a specific reason. Most production agents today still hard-code a single model in the prompt handler. Fugu inverts that: the model choice becomes runtime state owned by a learned orchestrator, the application keeps one OpenAI-shaped endpoint, and the underlying pool of frontier models becomes swappable. Vendor lock-in stops being a procurement debate and becomes a configuration line.
The 30-Second Version
What Sakana Fugu Actually Is
Fugu is the productisation of two ICLR 2026 papers from Sakana AI: TRINITY (an evolved LLM coordinator) and the Conductor (learning to orchestrate agents in natural language). The research version of the Conductor was a 7B reinforcement-learning router we covered earlier in our Sakana Conductor write-up. Fugu is the same idea taken from a benchmark paper to a production API with billing, tiers, and an OpenAI-compatible surface.
The product ships in two sizes. The base Fugu model is the balanced default — strong performance, low latency, suitable for everyday coding, chat, and review. Fugu Ultra targets the harder end of the spectrum: Kaggle-style competitions, security assessments, multi-hour research runs, anything that benefits from a wider pool of expert models running in parallel under one coordinator.
Why It Is Not Just Another Router
A normal router is a rules engine. You write if the task is code, send it to model X; if it is reasoning, send it to model Y. That works until the rules stop holding, which is usually within a quarter. Fugu replaces the rules with a trained language model whose entire job is to read the request, decide who should answer, draft the prompt for each agent, and combine the outputs into one coherent reply.
The practical consequence: when a new frontier model lands — Claude Opus 4.8, GLM-5.2, MiniMax M3 — you do not rewrite the routing layer. You add the model to the pool and the orchestrator learns to use it. That is the same idea behind the harness profiles pattern we wrote up in the LangChain Deep Agents piece, taken one step further: the harness itself is the model.
You call one endpoint. The right frontier model runs underneath. The orchestrator learns the rest.
How Fugu, the Conductor, and a Hand-Written Router Compare
| Approach | Where it lives | What changes when a new model lands |
|---|---|---|
| Hand-written router (if/else) | Inside your application code | Engineer rewrites the rules, ships a new release |
| Sakana Conductor (research) | A 7B RL-trained delegator you self-host | Add the model to the pool; retrain or fine-tune the router |
| Sakana Fugu (production) | A single OpenAI-compatible API call | Sakana adds the model to the pool; your app code stays the same |
| Multi-agent framework (LangGraph, Hermes) | An orchestration graph you author and host | You decide which node uses which model, manually |
The right row depends on how much control you need. A regulated workload with strict data-residency requirements still wants the self-hosted Conductor or a Hermes-style graph — covered in the Hermes Workspace mobile orchestration piece. A team that wants frontier performance behind one endpoint, with pricing they can model, picks Fugu.
Pricing and Availability
Fugu launched immediately in most regions, with the EU and EEA listed as a temporary exception at GA. Three subscription tiers cover most workloads, and pay-as-you-go pricing covers the rest.
| Tier | Price | Best for |
|---|---|---|
| Standard | $20 / month | Lightweight workflows, prototyping, a single developer |
| Pro | $100 / month | 10x Standard allowance — small teams, daily agent use |
| Max | $200 / month | 20x Standard allowance — continuous, long-running agents |
| Fugu Ultra (PAYG) | $5 / 1M input · $30 / 1M output | High-stakes runs that justify Ultra latency and cost |
The economics matter for orchestration teams in a specific way. If you build the routing layer yourself, you absorb the cost of every wrong model choice — a Claude Opus call for a task GLM could have served. Fugu prices that cost into the API and gives the orchestrator the incentive to pick the cheapest model that meets the bar. For long-running automation that calls the agent thousands of times a day, that swing is real money.
Why This Matters for Mobile and Automation Teams
Two patterns Halmob ships every week look different the day after Fugu lands. The first is a mobile app whose feature depends on an LLM call — a summariser, a categoriser, a smart compose surface. The second is an n8n workflow whose AI node has been pinned to one model for six months because changing it is a release-day risk. Both can move to a Fugu-style endpoint without rewriting downstream code, because the contract is the OpenAI Chat Completions shape they were already calling.
For mobile in particular, the win is that the model choice can shift without an app update. The endpoint is the contract. The app calls a server-side proxy, the proxy calls Fugu, and the model running under the hood is whichever Fugu picks today. That pairs naturally with the server-owned loop pattern we walked through in the loop engineering piece: the phone never knows which model answered, and the server never needs an app release to upgrade.
For n8n automation, Fugu drops in as a standard OpenAI-compatible credential. Existing AI nodes — Agent, Chat Model, Tools — accept it without code changes. The same workflow that used to call one model now gets the right model per step, decided by a trained coordinator rather than a hard-coded selection.
Where Fugu Fits in the 2026 Agent Stack
Fugu does not replace the rest of the stack. It collapses one layer of it — model selection — into the API call. The other layers remain exactly where they were.
| Layer | What it owns | Where Fugu sits |
|---|---|---|
| Application loop | Retries, schema validation, durable state | Above Fugu — you still engineer this loop |
| Orchestration / routing | Picking the model, drafting the sub-prompt, merging outputs | Replaced by Fugu |
| Model pool | The frontier models that actually run | Hidden inside Fugu — swappable by Sakana |
| Tools and MCP | External actions the agent can call | Stays in your harness, surfaced through the API |
| Distribution | Mobile app, webhook, chat surface, voice | Untouched — Fugu is just another endpoint |
That layout pairs cleanly with the orchestration pattern we covered in the Salesforce Agentforce write-up and the executor/advisor split in the executor-advisor pattern piece. Fugu is the model-selection node inside that bigger graph, not the graph itself.
The Failure Modes Worth Designing Around
- Opaque cost attribution. When the orchestrator picks the model, you do not get to choose. Treat the per-tier monthly cap as a budget, instrument calls, and alert when one workflow eats a disproportionate share.
- Latency variance across the pool. A request routed to a heavyweight model is slower than one routed to a light one. Set p95 latency targets in your loop, not just averages, and surface them in your dashboards.
- EU and EEA data residency. Fugu is not yet available there at GA. Regulated workloads in those regions need a self-hosted fallback — Conductor, a Hermes graph, or a per-region model.
- The orchestrator's training data lag. A brand-new frontier model will not be in the pool on day one. If your workload depends on a specific model's capability, pin to that model directly until Sakana adds it.
- Cross-step reasoning split across models. When step 1 runs on model A and step 2 on model B, context can leak or get summarised. Validate the schema on every tool call, the same discipline we covered in loop engineering.
- Lock-in shifted, not removed. You no longer pick the model, but you now depend on Sakana picking well. Treat Fugu the way you would treat any single vendor in the critical path: have a kill switch.
A Practical Migration Plan for Existing Workloads
- 1Pick the one workflow whose model choice is most debated. The agent your team has rewritten three times because the model keeps surprising you is the right starting point.
- 2Run Fugu in shadow mode. Send the request to both the existing model and Fugu, log both answers, do not show Fugu to the user yet. A week of shadow data shows you where it agrees and where it diverges.
- 3Promote one tier at a time. Move low-stakes flows first — internal summaries, draft generation, classification. Keep the user-facing money path on the pinned model until you have telemetry.
- 4Wire cost dashboards before you scale. Per-workflow, per-day, per-tier. A Fugu workflow that quietly drifts to Fugu Ultra during a regression is a budget surprise waiting to happen.
- 5Keep a single-model fallback in the harness. If Fugu degrades, your loop should swap back to a pinned model without a release. That is the same kill switch any vendor in the path deserves.
- 6Document the contract you depend on. "We call Fugu through the OpenAI-compatible Chat Completions API with these system messages." That sentence is the spec; if it ever breaks, you know what to rebuild.
When to Pick Fugu, When to Stay on a Pinned Model
How It Fits the Halmob Stack
Most of what we ship at Halmob is the layer above the model: a mobile app talking to an n8n workflow that drives an AI agent. Fugu makes that stack simpler on the day it is added and harder on the day it is removed, which is the right shape for a vendor decision. The argument for using it is that the application code stops caring about the model and starts caring about the loop — the work we covered in the loop engineering piece.
For iOS and Android teams the immediate move is to add a server-side proxy in front of the agent endpoint and put Fugu behind it. The phone keeps calling the proxy. The proxy decides whether the traffic flows to Fugu, a pinned model, or a Hermes-style graph. That single seam is what makes the next model swap a configuration change instead of an app release.
The Bottom Line
Sakana Fugu is the first production-grade attempt to make "which model do we pick" a runtime decision rather than a release decision. The orchestrator is learned, the API is OpenAI-compatible, and the pool of models is swappable behind the curtain. For mobile, n8n, and agent orchestration teams the practical effect is that the model choice stops being a source of release-day risk and becomes a line in a dashboard.
The right question for the next sprint is not "is Fugu better than Claude or GPT." It is "which of our workflows would benefit from a model that can change without an app update." At Halmob we pair mobile development with n8n automation and AI agent orchestration for teams that want the answer to that question to be most of them.
For sources, see the smol.ai AINews newsletter coverage of the June 22 release, the official Sakana Fugu release page, and the VentureBeat write-up on Fugu's frontier-performance claims.