On April 27, 2026 OpenAI open-sourced Symphony, a spec that turns a project-management board like Linear into a control plane for Codex coding agents. Every open ticket gets its own agent, the agent runs until a pull request is ready, and the team only steps in to review. This guide unpacks what the spec actually defines, how the Elixir reference implementation works, and where it pays off in your own stack.
Most agent frameworks today still feel like fancy chat. Symphony argues something stronger: the unit of work is not a prompt, it is a ticket. If you can describe a ticket clearly, an agent can drive the whole loop from Todo to merged PR, and the team can spend its day reviewing instead of nudging a chat window. OpenAI says this shift produced a 500% increase in landed PRs on some internal teams in the first three weeks.
What Symphony Actually Is
Symphony is not a SaaS product and OpenAI is not planning to maintain it as one. It is two things in a public repo:
- SPEC.md, a single Markdown file describing how a service that orchestrates coding agents should behave: how to read from an issue tracker, how to drive an agent through retries and back-off, how to take a task all the way to a pull request.
- A reference implementation in Elixir, running on the BEAM VM, that demonstrates the spec end-to-end against Linear, GitHub and Codex.
The authors (Alex Kotliarskyi, Victor Zhu and Zach Brock) explicitly treat Symphony as a reference, not a platform. The expected workflow is to read the spec, fork it, and rebuild it inside your own stack.
The headline number bears repeating: among some teams at OpenAI, landed PRs increased by 500% in the first three weeks. When engineers stop supervising Codex sessions, the perceived cost of each change drops, and the economics of code changes shift completely.
The Linear Board as a State Machine
The single most important design choice in Symphony is treating the Linear board as a finite state machine. Every ticket has a status, and the orchestrator cares only about that status:
| Status | What Symphony does | Who acts |
|---|---|---|
| Todo | Spawns a fresh agent workspace for the ticket | Symphony |
| In Progress | Agent investigates, edits, runs tests, opens a draft PR | Codex agent |
| Review | Marks the PR ready for review and pings reviewers | Human reviewers |
| Merging | Lands the PR, closes the ticket, tears down the workspace | Symphony + GitHub |
Because the source of truth is the board, the agent is stateless between runs. If the orchestrator dies, it can read the board again on restart and reconstruct exactly what should be happening. That is the same property that makes a job queue robust, applied to coding agents.
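That reconciliation property is easy to sketch. The snippet below is an illustrative Python sketch, not code from the spec: the names `Status`, `reconcile`, and the ticket IDs are all hypothetical, but the shape shows why a board-as-state-machine design survives restarts, since the orchestrator's entire state can be re-derived from a fresh read of the board.

```python
from enum import Enum

class Status(Enum):
    TODO = "Todo"
    IN_PROGRESS = "In Progress"
    REVIEW = "Review"
    MERGING = "Merging"

def reconcile(tickets, actions):
    """Re-derive what should be happening from the board alone.

    `tickets` is a list of (ticket_id, Status) pairs read from the
    tracker; `actions` maps each Status to a handler. Because the
    board is the only source of truth, this can run from a cold start
    after a crash and reconstruct the orchestrator's state.
    """
    for ticket_id, status in tickets:
        actions[status](ticket_id)

log = []
actions = {
    Status.TODO:        lambda t: log.append((t, "spawn workspace")),
    Status.IN_PROGRESS: lambda t: log.append((t, "agent working")),
    Status.REVIEW:      lambda t: log.append((t, "ping reviewers")),
    Status.MERGING:     lambda t: log.append((t, "land PR, tear down")),
}

reconcile([("ENG-1", Status.TODO), ("ENG-2", Status.REVIEW)], actions)
```

Running `reconcile` twice against the same board is harmless, which is exactly the idempotence a restartable orchestrator needs.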
Why Elixir and the BEAM
The reference implementation is in Elixir, which is unusual for an OpenAI release. It is also the right choice for this problem.
Supervisor trees fit the agent lifecycle
BEAM/OTP gives you a supervisor pattern out of the box: every coding agent runs as a child process under a supervisor, and a crash (out of memory, API timeout, an unexpected error mid-run) is caught and the agent is restarted with a clean slate. This is exactly the property a long-running orchestrator needs and exactly the place where most Python frameworks fall over.
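To make the restart-with-a-clean-slate idea concrete, here is a minimal Python stand-in for what an OTP supervisor provides for free; `supervise` and `flaky_agent` are hypothetical names for illustration, not part of the spec or the Elixir implementation.

```python
def supervise(run_once, max_restarts=3):
    """Rerun a crashed agent with a clean slate, a rough stand-in for
    an OTP one-for-one supervisor. `run_once` is the agent body; any
    exception counts as a crash."""
    for attempt in range(1, max_restarts + 1):
        try:
            return run_once(attempt)
        except Exception:
            continue  # clean slate: no state survives the crash
    raise RuntimeError("agent exceeded its restart budget")

def flaky_agent(attempt):
    # Simulate an API timeout on the first two runs, success on the third.
    if attempt < 3:
        raise TimeoutError("API timeout mid-run")
    return "draft PR opened"
```

Here `supervise(flaky_agent)` succeeds on the third attempt after two restarts. The real OTP version also gets process isolation, so one agent running out of memory cannot take its siblings down with it.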
Concurrency is cheap
BEAM processes are cheap. Each ticket can have its own process, with its own mailbox, without your machine flinching. This makes the "every open ticket gets an agent" promise affordable rather than aspirational.
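The one-worker-per-ticket shape looks roughly like this sketch, which uses OS threads only to show the structure; the function names and ticket IDs are invented for illustration, and OS threads are far heavier than BEAM processes, so treat this as a shape sketch, not a scaling claim.

```python
import threading

def run_agent(ticket_id, results):
    # Stand-in for the full agent loop; just record that this ticket ran.
    results[ticket_id] = "done"

def spawn_all(ticket_ids):
    """One worker per open ticket, each with its own call stack,
    mirroring the one-BEAM-process-per-ticket design."""
    results = {}
    workers = [threading.Thread(target=run_agent, args=(t, results))
               for t in ticket_ids]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

On the BEAM, the equivalent spawn costs a few hundred words of memory per process, which is what makes "every open ticket gets an agent" cheap rather than aspirational.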
The spec was hardened by porting it
OpenAI had Codex itself port the spec to TypeScript, Go, Rust, Java and Python. Every ambiguity that surfaced during those ports was folded back into SPEC.md. The result is a tighter, less language-specific document than a single-implementation spec usually produces.
The Agent Loop, in One Page
Inside an agent process, Symphony runs an opinionated loop. It is simple enough to write down in five steps:
1. Pick up the ticket and clone an isolated workspace for it.
2. Have Codex propose a plan, run it, run the tests, and iterate until they pass or a budget is exhausted.
3. Open a draft PR, push commits, and post the diff back to the Linear ticket.
4. Move the ticket to Review and wait for human feedback.
5. Apply review comments via Codex, re-run tests, and either ship or escalate.
The loop is intentionally boring. Most of the engineering effort in Symphony is in the edges: retries, back-off, timeouts, isolation, and the supervisor that respawns agents when things go wrong. That is where chat-style agents typically fail in production.
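One of those edges, the retry-until-green loop with exponential back-off, can be sketched in a few lines. This is a hedged illustration, not the spec's actual algorithm: `run_with_budget` and its parameters are invented names, and the injectable `sleep` exists only so the back-off schedule can be inspected.

```python
import time

def run_with_budget(attempt_fn, max_attempts=4, base_delay=1.0, sleep=None):
    """Drive agent attempts until the tests pass or the budget runs
    out, backing off exponentially between failures. `attempt_fn`
    returns True when the test suite is green; `sleep` is injectable
    so the schedule can be verified without waiting."""
    sleep = sleep or time.sleep
    for attempt in range(max_attempts):
        if attempt_fn():
            return True
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return False  # budget exhausted: escalate to a human
```

A run that fails twice and then goes green returns `True` after sleeping for 1s and 2s; a run that never goes green returns `False`, which is the orchestrator's cue to escalate rather than loop forever.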
How This Differs from a Chat-First Agent
It is worth being explicit about the contrast, because it explains why Symphony works at all.
- Chat-first agents assume a human is watching. The user provides context, redirects when stuck, and decides when work is done.
- Symphony agents assume the human is busy elsewhere. The ticket carries the context, the test suite carries the done condition, and the reviewer is invoked only when there is something to review.
This is the same shift we wrote about in the orchestration era of agentic coding. Symphony is the cleanest open spec that argues the case end-to-end. It also pairs naturally with router-style designs like the one in the Sakana Conductor multi-agent orchestration guide: Symphony decides which ticket to work on, a router decides which worker handles each subtask inside that ticket.
Where It Pays Off (and Where It Does Not)
It pays off when
- Your team already writes Linear tickets at a level of detail an agent could act on. Vague tickets stay vague.
- You have a reasonable test suite. Without one, the agent has no signal that the work is finished.
- You ship many small, similar PRs (renames, dependency bumps, copy edits, well-scoped bug fixes).
It does not pay off when
- Most of your engineering value lives in design conversations and architectural decisions, not in PRs.
- Your codebase has weak test coverage and merging without review is risky regardless of who wrote the diff.
- You cannot afford the review load: an autonomous agent can produce PRs faster than a small team can read them.
The hidden cost: review bandwidth
Symphony does not remove humans from the loop; it concentrates them at review time. Before scaling up ticket types, make sure reviewer hours can grow with PR volume, or the board fills with drafts nobody reads.
How to Try It Without Betting the Farm
1. Read SPEC.md end-to-end. It is short on purpose.
2. Stand up the Elixir reference implementation against a sandbox Linear project and a single non-critical repo.
3. Pick one ticket type that is high volume and well-tested (dependency bumps are a good first target). Let Symphony drive it.
4. Measure two things: time-to-PR and reviewer hours. If both move in your favor, expand the ticket types one at a time.
5. Only then think about porting the spec to your stack's language. Forking the Elixir version is fine for many teams.
If you are still mapping the wider landscape of coding agents, our OpenClaw 101 guide for new users covers the building blocks (tools, skills, permissions, memory) that sit one layer below Symphony.
The Bottom Line
Symphony is small in code and large in implications. By treating Linear as the state machine, Codex as the worker, and a BEAM supervisor as the safety net, OpenAI has shown that orchestration, not prompts, is the part of agent engineering that actually scales. Whether or not you adopt this exact spec, the design is worth copying: a board you already trust, an agent you can restart, and a review step that stays in human hands. That is the path from a clever demo to a coding agent your team can actually rely on.