
Sakana Conductor Explained: 7B RL Agent Orchestrator

Sakana's 7B Conductor learns by RL to orchestrate a pool of frontier agents, choosing which model handles each subtask. Practical guide for your stack.

İlker Ulusoy · 2026-04-29 · 8 min read

In late April 2026, Sakana AI shipped Conductor, a 7B-parameter model trained with reinforcement learning to orchestrate a pool of frontier agents in plain language. Instead of solving tasks itself, Conductor decides which agent to call, what subtask to assign, and which context to expose. This guide walks through why a small router beats a big monolith, what the benchmarks say, and how to plug the pattern into your automation and mobile agent stack.

Most production AI systems still hand a whole task to one big model and hope it follows the plan. Sakana's Conductor takes the opposite bet. A small 7B router learns by reinforcement learning to delegate, and a fleet of frontier agents (Claude, GPT, Gemini, open-weight specialists) becomes its execution layer. The numbers are striking: 83.9% on LiveCodeBench and 87.5% on GPQA-Diamond, beating any single worker in the pool.

The 30-Second Version

Conductor is a 7B reinforcement-learning router that calls frontier agents per subtask. It picks the agent, writes the prompt, and shapes the context, and the routed pool outperforms running any single frontier model on its own. The unit of work shifts from one model to a small router plus a plural worker pool.

What Conductor Actually Does

Conductor is not the agent that does the work. It is the agent that decides who does the work. Every incoming request runs through the same loop:

  • Split the request into subtasks the way a senior engineer would.
  • Pick the right agent from the pool for each subtask, by capability and cost.
  • Write a sharp per-subtask prompt and pass only the context that agent needs.
  • Collect results, judge them, and decide to redo, escalate, or ship.

That last step is the one most pipelines get wrong. Conductor does not just dispatch and hope. The reinforcement-learning training signal rewards getting the final answer right, so the router learns when a cheap agent is enough and when to escalate to a frontier model.
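The loop above can be sketched in a few lines. This is a minimal illustration of the pattern, not Sakana's actual API: the worker pool, the scoring call, and the escalate-on-low-score rule are all assumptions made for the sketch.

```python
# Minimal sketch of a Conductor-style dispatch loop (illustrative, not
# Sakana's real interface). A request is split into subtasks, each subtask
# is routed to a worker, and a low-scoring answer is redone on the
# strongest (most expensive) worker in the pool.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Worker:
    name: str
    cost_per_call: float          # relative cost units
    run: Callable[[str], str]     # subtask -> answer

def conduct(request: str, split, pick, judge, workers: list[Worker]) -> list[str]:
    """split: request -> subtasks; pick: subtask -> Worker;
    judge: (subtask, answer) -> score in [0, 1]."""
    results = []
    for subtask in split(request):
        worker = pick(subtask)
        answer = worker.run(subtask)
        if judge(subtask, answer) < 0.5:
            # Redo on the strongest worker instead of shipping a bad answer.
            strongest = max(workers, key=lambda w: w.cost_per_call)
            answer = strongest.run(subtask)
        results.append(answer)
    return results
```

The `judge` call is the piece the RL training signal replaces: Conductor learns the escalation decision end-to-end instead of using a fixed threshold.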

Why a Small Router Wins

At 7B parameters, Conductor is roughly 1% of the size of the largest frontier models, and that is the point. A small dispatcher unlocks three things that a monolithic agent cannot:

Lower cost per task

Most subtasks do not need a frontier model. Reformatting JSON, picking a tool, or summarizing a paragraph can run on a cheap worker. Routing only the hard subtasks to expensive models cuts spend without losing quality on the result.

Lower latency on the decision loop

A 7B model fits on a single GPU. Routing decisions land in well under 100 ms, so the orchestrator does not become the bottleneck. The slowest part of the system is the worker that actually does the work, which is how it should be.

Specialization beats generalization

A coding agent for code, a search agent for retrieval, a math agent for proofs. Conductor learns which worker is good at what, and routes accordingly. That is closer to how real teams work than asking one senior to do everything.

The shift is from "one model to rule them all" to "a small router and a plural worker pool". The router is the product. The workers are interchangeable.

The Benchmark Picture

Sakana published numbers across three reasoning-heavy benchmarks. The important point is not the absolute scores. It is that the routed pool beats any single worker in the same pool, end-to-end:

Benchmark            Best single worker   Conductor (routed pool)   Delta
LiveCodeBench        78.4%                83.9%                     +5.5
GPQA-Diamond         82.1%                87.5%                     +5.4
AIME-2026 (math)     71.0%                76.2%                     +5.2

A consistent five-point lift across very different domains is the signal. The router is learning a transferable skill, not memorizing a benchmark.

How This Changes Your Automation Stack

If you already build with n8n, Claude, or open-weight models, the practical change is small but real. The orchestration layer is no longer a static if-else over model names. It is itself a learned component. We covered the broader trend in the orchestration era of agentic coding, and Conductor is the cleanest open argument for it yet.

The rebuild is straightforward in concept:

  • Replace single-LLM nodes in your n8n flows with a router call that returns the chosen worker plus its prompt.
  • Keep your existing tools and skills. The router does not change them, it just decides who uses them.
  • Track per-subtask cost and success in one log line, so the router has a feedback signal you can later train against.
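That last bullet, the one log line per subtask, can be as simple as a JSON record. The field names here are placeholders for whatever your flow already tracks:

```python
# Hypothetical per-subtask log line: one JSON object per routing decision,
# so cost and outcome live in the same record you can later train against.
import json
import time

def log_subtask(subtask_id: str, worker: str, cost_usd: float, success: bool) -> str:
    record = {
        "ts": time.time(),
        "subtask": subtask_id,
        "worker": worker,
        "cost_usd": round(cost_usd, 6),
        "success": success,
    }
    line = json.dumps(record, sort_keys=True)
    print(line)  # or append to your flow's log sink
    return line
```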

A Mobile-First Version of This Pattern

The 7B size is not an accident. A model that small can run on a workstation today, and on the next generation of on-device inference tiers tomorrow. That makes a Conductor-style router the natural fit for the mobile pattern we wrote about in Hermes Workspace Mobile and agent orchestration on a phone: the device holds the router and the approval surface, while the heavy workers stay in the cloud.

Why This Matters for Mobile Automation

A small router on the device gives you private routing decisions, instant approvals, and a much shorter loop for the 80% of subtasks that never need a frontier model. The phone stops being a dumb chat client and starts being the orchestrator.

Five Practical Use Cases

n8n flows that pick the cheapest model that works

Replace your fixed model node with a router call. The router picks Gemini Flash for the easy 90%, Claude or GPT-5 for the rest, and the flow stays unchanged.
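A first version of that router call does not need to be learned at all. Here is a rule-based sketch with an invented difficulty scale and placeholder model names, cheapest tier first:

```python
# Illustrative cost-first routing rule for an n8n-style flow: pick the
# cheapest worker whose capability tier covers the subtask. Model names,
# tiers, and the difficulty scale are placeholders, not a published list.
TIERS = [
    ("gemini-flash", 1),  # cheap: reformatting, classification
    ("claude",       2),  # mid: general reasoning
    ("gpt-5",        3),  # frontier: hardest subtasks
]

def route(difficulty: int) -> str:
    """difficulty: 1 (easy) .. 3 (hard reasoning)."""
    for model, tier in TIERS:  # sorted cheapest first
        if tier >= difficulty:
            return model
    return TIERS[-1][0]        # fall back to the strongest worker
```

A learned router replaces the hand-set `difficulty` estimate; the call site in the flow stays the same.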

Mobile assistant with on-device dispatch

The phone runs the router locally. Quick reformat or classification stays on-device. Hard reasoning calls a cloud worker, with biometric approval for anything that costs money or sends a message.

SaaS agent products with bounded margins

If your product calls a frontier model on every request, your gross margin is at the mercy of a vendor price list. A learned router moves most traffic to cheaper workers and keeps quality steady, so margin becomes a thing you control.
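The margin math is back-of-envelope but worth writing down. With invented per-subtask prices, routing 90% of traffic to a cheap worker cuts the blended cost to roughly an eighth of the all-frontier baseline:

```python
# Blended cost per subtask under routing, with assumed (not real) prices.
cheap_cost = 0.0005      # $ per subtask on the cheap worker
frontier_cost = 0.02     # $ per subtask on the frontier worker

# 90% of subtasks stay on the cheap worker, 10% escalate.
blended = 0.9 * cheap_cost + 0.1 * frontier_cost
savings_factor = frontier_cost / blended
```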

Code review and refactors

Conductor-style routing per file, per language, or per failing test. A specialized agent reviews the SwiftUI diff, another reviews the Terraform diff, another writes the migration plan. The same router connects them.

Customer support triage

The router decides between a knowledge-base lookup, an LLM draft, and a human escalation. It learns over time which channel resolves which ticket type, and the cost per ticket falls without a quality drop.
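As a starting point before anything is learned, the triage decision can be a handful of rules. The keywords and channel names below are invented for illustration; the point is that a learned router later replaces the rules without changing the call site:

```python
# Sketch of rule-based support triage (illustrative keywords and channels).
def triage(ticket: str) -> str:
    text = ticket.lower()
    if "refund" in text or "angry" in text:
        return "human"        # escalate sensitive or high-stakes tickets
    if "how do i" in text or "where is" in text:
        return "kb_lookup"    # deflect how-to questions to the knowledge base
    return "llm_draft"        # default: draft a reply for human review
```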

How to Get Started

  1. Read Sakana's public Conductor write-up to understand the training setup and the API surface.
  2. Pick one workflow you already run that calls a single LLM. Wrap that call behind a router stub that returns "the same model" for now.
  3. Add a second worker (a cheaper model) to the pool. Let the router log which subtasks each worker handled and how it went.
  4. Once you have a few weeks of logs, train or rule-tune the router so that easy subtasks land on the cheap worker by default.
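The stub from step 2 is deliberately boring. A minimal version, with a placeholder model name, looks like this:

```python
# Router stub for step 2: every subtask still goes to the model you already
# use, but the call site now goes through one function, so smarter routing
# can evolve behind it without touching the flow. "claude" is a placeholder.
def route_subtask(subtask: str) -> dict:
    return {
        "model": "claude",       # for now, always the same model
        "prompt": subtask,       # later: a per-subtask rewritten prompt
        "reason": "default",     # later: why this worker was chosen
    }
```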

If you are still mapping out the agent space, our OpenClaw 101 guide for new users explains the building blocks of an agent stack: tools, skills, permissions, memory. Conductor sits one layer above all of those, but only makes sense once those pieces are in place.


The Bottom Line

The interesting frontier of 2026 is not a bigger single model. It is better orchestration of the models we already have. Sakana's Conductor is the cleanest demonstration so far that a small, learned router beats running any single frontier model on its own, and it does so cheaply enough to fit a mobile and edge story.

The question to take into your next sprint is simple. Where in your stack are you still paying frontier prices for subtasks a smaller worker would have handled, and what would change if a router made that call for you?