
6 min read

How Multi-Agent Orchestration Works

Inside Via's orchestrator: task decomposition, 10 AI personas, parallel execution, and dynamic context injection. The actual Go code behind the boxes and arrows.

ai · orchestration · golang · architecture

TL;DR

Via's orchestrator is a 7.0MB Go binary that decomposes natural language tasks into a DAG of phases, assigns specialized AI personas, spawns Claude Code agents with dynamically generated context, and captures learnings from every run. The key insight: decomposition and execution use different models, and each agent gets a custom CLAUDE.md tailored to its specific phase.


The Box-and-Arrow Problem

Every article about multi-agent AI shows the same diagram: boxes connected by arrows, "Agent A talks to Agent B," a handwave about coordination. The actual engineering — how tasks get decomposed, how agents get assigned, how failures cascade — rarely gets discussed.

I built an orchestrator that handles all of this. It's a Go binary called orchestrator that lives at ~/skills/orchestrator/, compiles to 7.0MB, and has processed hundreds of missions. This is how it actually works.

The Execution Flow

When you run orchestrator run "research authentication patterns and implement OAuth", here's what happens:

The full pipeline runs in five stages: task decomposition, persona selection, agent spawning, parallel execution, and quality gates with learning capture. Let me walk through each one.

Stage 1: Task Decomposition

The decomposer takes a natural language task description and breaks it into phases with dependencies. This is where the DAG (directed acyclic graph) gets built.

The decomposer itself is an LLM call. Claude analyzes the task and returns structured JSON that maps to this Go type:

internal/orchestrate/decompose.go
type Phase struct {
    ID           string   `json:"id"`
    Name         string   `json:"name"`
    Description  string   `json:"description"`
    Dependencies []string `json:"dependencies"`
    Status       string   `json:"status"`
    Result       string   `json:"result,omitempty"`
}
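
To make the mapping concrete, here's a minimal sketch of how a decomposer reply could be parsed and sanity-checked. It reuses the Phase type above and assumes encoding/json and fmt are imported; the validation pass is a sketch for illustration, not code from the repo.

// Sketch: parse the decomposer's JSON reply and reject plans that
// reference phases that don't exist.
func parsePhases(raw []byte) ([]Phase, error) {
    var phases []Phase
    if err := json.Unmarshal(raw, &phases); err != nil {
        return nil, fmt.Errorf("decomposer returned malformed JSON: %w", err)
    }

    // Every dependency must point at a phase that actually exists in the plan.
    ids := make(map[string]bool, len(phases))
    for _, p := range phases {
        ids[p.ID] = true
    }
    for _, p := range phases {
        for _, dep := range p.Dependencies {
            if !ids[dep] {
                return nil, fmt.Errorf("phase %q depends on unknown phase %q", p.ID, dep)
            }
        }
    }
    return phases, nil
}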

The key design decision: decomposition is a separate LLM call from execution. This means the decomposer can use a different (cheaper) model than the executor. In practice, decomposition goes to Claude Sonnet — good enough to reason about task structure — while execution phases route to whichever model fits the task type. Research goes to Gemini. Implementation goes to Sonnet. Architecture decisions go to Opus.
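
In sketch form, the routing is just a lookup by task type. The function name and the model identifiers below are placeholders, not the orchestrator's real routing table:

// Sketch: route a phase to a model by task type. Identifiers are illustrative.
func modelFor(phaseName string) string {
    name := strings.ToLower(phaseName)
    switch {
    case strings.Contains(name, "research"):
        return "gemini"        // large context, generous free tier
    case strings.Contains(name, "design"), strings.Contains(name, "architect"):
        return "claude-opus"   // architecture decisions get the deepest model
    default:
        return "claude-sonnet" // implementation and everything else
    }
}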

For the "research auth and implement OAuth" example, the decomposer might produce:

Phase 1: Research auth patterns     (no dependencies)
Phase 2: Design OAuth architecture  (depends on Phase 1)
Phase 3: Implement OAuth provider   (depends on Phase 2)
Phase 4: Write integration tests    (depends on Phase 3)

Stage 2: The Persona System

Via defines 10 specialized personas. Each persona is a bundle of expertise, behavioral traits, and learning capture markers.

When the selector runs, it matches the phase description against these personas using keyword classification. A "security review" phase matches the security-auditor persona. A "research" phase matches the researcher. This matching injects domain-specific traits directly into the agent's system prompt — things like "Think like an attacker" for security reviews or "Cite everything, cross-reference sources" for research.
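
A stripped-down version of that selection logic might look like the following. Only the persona names come from the real set; the keyword lists and the fallback are illustrative:

// Sketch of keyword-based persona selection. Ordered rules, first match wins.
type personaRule struct {
    persona  string
    keywords []string
}

var personaRules = []personaRule{
    {"researcher", []string{"research", "investigate", "compare"}},
    {"security-auditor", []string{"security", "audit", "vulnerability"}},
    {"performance-engineer", []string{"performance", "latency", "bottleneck"}},
}

func selectPersona(phaseDescription string) string {
    desc := strings.ToLower(phaseDescription)
    for _, rule := range personaRules {
        for _, kw := range rule.keywords {
            if strings.Contains(desc, kw) {
                return rule.persona
            }
        }
    }
    return "implementer" // assumed default when nothing matches
}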

The persona also determines which learning markers the agent uses. The security-auditor emits VULNERABILITY: tags. The performance-engineer emits BOTTLENECK: tags. These markers are how the learnings system captures knowledge from each run.

It's simple — keyword matching rather than semantic similarity — but it works surprisingly well for the current set of 10 personas. Semantic matching is on the roadmap.

Stage 3: Agent Spawning and Dynamic Context

This is the most critical part of the system. The orchestrator doesn't just "talk" to an agent. It generates a completely custom environment for each one.

For every phase, spawn.go generates a phase-specific CLAUDE.md file:

internal/agent/spawn.go
func (s *Spawner) GenerateContext(
    phase Phase, persona Persona, learnings []Learning,
) string {
    var sb strings.Builder

    // Inject persona traits
    sb.WriteString(fmt.Sprintf("## Role: %s\n", persona.Name))
    sb.WriteString(persona.Description + "\n\n")

    // Inject relevant learnings from the DB
    sb.WriteString("## Prior Learnings\n")
    for _, l := range learnings {
        sb.WriteString(fmt.Sprintf("- %s: %s\n", l.Type, l.Content))
    }

    return sb.String()
}

This dynamic context generation is what makes the orchestrator more than a simple task runner. Each agent receives:

  1. Persona traits — domain expertise and behavioral guidelines
  2. Relevant learnings — past insights filtered by hybrid search (FTS5 + semantic embeddings)
  3. Phase-specific instructions — what this particular phase needs to accomplish
  4. Output format — where to write results, what markers to use

The agent doesn't need to know about the entire history of the project. It gets exactly the context it needs — and only the context it needs. This keeps token costs low and signal-to-noise ratio high.

The spawner then launches a Claude Code subprocess pointed at a temporary workspace directory containing this generated CLAUDE.md. The agent runs autonomously, writes its output, and exits.
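
A minimal sketch of that spawn step, assuming the usual os, os/exec, and path/filepath imports. The Run method and the claude invocation here are assumptions for illustration, not the orchestrator's exact command line:

// Sketch: write the generated context into a throwaway workspace and
// launch the agent as a subprocess there.
func (s *Spawner) Run(ctx context.Context, phase Phase, contextMD string) error {
    workspace, err := os.MkdirTemp("", "phase-"+phase.ID+"-")
    if err != nil {
        return err
    }
    if err := os.WriteFile(filepath.Join(workspace, "CLAUDE.md"), []byte(contextMD), 0o644); err != nil {
        return err
    }

    // The exact CLI and flags are placeholders.
    cmd := exec.CommandContext(ctx, "claude", "-p", phase.Description)
    cmd.Dir = workspace // agent picks up CLAUDE.md from its working directory
    out, err := cmd.CombinedOutput()
    if err != nil {
        return fmt.Errorf("phase %s failed: %w\n%s", phase.ID, err, out)
    }
    return nil
}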

Stage 4: Parallel Execution

The scheduler uses BFS (breadth-first search) dependency resolution to identify which phases can run in parallel.

Using the OAuth example from earlier: Phase 1 (Research) has no dependencies, so it starts immediately. Phase 2 (Design) depends on Phase 1, so it waits. But if the decomposer had produced two independent research phases — say "Research OAuth providers" and "Audit current auth code" — they would run simultaneously.
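
In sketch form, the ready-set loop looks something like this. It's a simplified stand-in for the real scheduler, not the actual code:

// Sketch: run every phase whose dependencies are done, wait, repeat.
func runAll(phases []Phase, run func(Phase) error) error {
    done := make(map[string]bool, len(phases))
    for len(done) < len(phases) {
        // Collect every not-yet-run phase whose dependencies are all satisfied.
        var ready []Phase
        for _, p := range phases {
            if done[p.ID] {
                continue
            }
            blocked := false
            for _, dep := range p.Dependencies {
                if !done[dep] {
                    blocked = true
                    break
                }
            }
            if !blocked {
                ready = append(ready, p)
            }
        }
        if len(ready) == 0 {
            return fmt.Errorf("no runnable phases left (dependency cycle)")
        }

        // Execute the whole ready set in parallel and wait for all of it.
        var wg sync.WaitGroup
        errs := make(chan error, len(ready))
        for _, p := range ready {
            wg.Add(1)
            go func(p Phase) {
                defer wg.Done()
                if err := run(p); err != nil {
                    errs <- err
                }
            }(p)
        }
        wg.Wait()
        close(errs)
        if err := <-errs; err != nil {
            return err // the real orchestrator marks the phase failed and stops
        }
        for _, p := range ready {
            done[p.ID] = true
        }
    }
    return nil
}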

This matters most when routing to different models. Research phases go to Gemini (free tier, 1M context), so running three research phases in parallel has zero impact on Claude's rate limits and finishes in the time it takes the slowest one to complete.

Stage 5: Quality Gates and Learning Capture

After each phase completes, two things happen:

Quality gates validate the output. Did the agent produce the expected deliverable? Did it write to the correct output path? A simple existence-check gate can prevent cascading failures — if a research phase produces no output, the dependent implementation phase shouldn't start.
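
In its simplest form, that gate is just a file check. A sketch, assuming each phase declares an expected output path:

// The simplest possible quality gate: the expected deliverable must exist
// and be non-empty before dependent phases are allowed to start.
func gatePassed(expectedOutput string) bool {
    info, err := os.Stat(expectedOutput)
    return err == nil && info.Size() > 0
}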

Learning capture parses the agent's output for structured markers (LEARNING:, GOTCHA:, DECISION:, FINDING:), deduplicates them against the existing learnings database, and stores novel insights. This is the feedback loop that makes each mission smarter than the last. I wrote an entire article about how this works.
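
A stripped-down version of that marker scan might look like this. The prefixes come from the article; the parsing itself is a sketch:

// Sketch: pull structured markers out of an agent's output, line by line.
var markerPrefixes = []string{"LEARNING:", "GOTCHA:", "DECISION:", "FINDING:"}

func extractMarkers(agentOutput string) map[string][]string {
    found := make(map[string][]string)
    for _, line := range strings.Split(agentOutput, "\n") {
        line = strings.TrimSpace(line)
        for _, prefix := range markerPrefixes {
            if strings.HasPrefix(line, prefix) {
                kind := strings.TrimSuffix(prefix, ":")
                body := strings.TrimSpace(strings.TrimPrefix(line, prefix))
                found[kind] = append(found[kind], body)
            }
        }
    }
    return found
}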

The Honest Limitations

The orchestrator works, but it's not without rough edges:

Keyword-based persona selection misses nuance. A phase described as "investigate the performance of the auth system" might get assigned the researcher persona when performance-engineer would be better. Semantic matching would fix this.

Context handoff between phases is file-based. Phase 1 writes a markdown file, Phase 2 reads it. This works but loses nuance — there's no way for Phase 2 to ask Phase 1 a clarifying question. It's batch processing, not conversation.

No retry logic for flaky failures. If an API rate limit hits mid-phase, the phase fails. The orchestrator doesn't retry — it marks the phase as failed and stops. Manual intervention is required.

These are known limitations, and several are captured as meta_gap entries in the learnings database. The system is literally documenting its own shortcomings, which is a feature I didn't design but one that emerged naturally from the learning capture system.

Next in series: Teaching AI to Learn From Its Mistakes

