Orchestration Patterns
12 proven patterns for coordinating multiple AI agents
Quick Reference
| Pattern | Complexity | Agents |
|---|---|---|
| Swarm | Low | 3-50+ |
| Pipeline | Low | 2-6 |
| Supervisor | Medium | 3-8 |
| Map-Reduce | Medium | 3-20+ |
| Debate | Medium | 3-5 |
| Hierarchical | High | 7-50+ |
| Router | Low | 3-10 |
| Reflection | Low | 1-2 |
| Consensus | Medium | 3-7 |
| Blackboard | High | 3-10 |
| Guardrail | Low | 2 |
| Tool Augmented | Low | 1 |
Swarm
Complexity: Low · Agents: 3-50+
Fan out identical tasks to N worker agents running in parallel, then merge all results into a single output.
How It Works
A coordinator splits a task into independent subtasks and dispatches them to worker agents simultaneously. Each worker processes its subtask independently. When all workers complete, results are collected and merged into a unified response.
Diagram
┌─ Worker 1 ─┐
Task ──► │ Worker 2 │ ──► Merge
└─ Worker N ─┘
Best For
Parallel independent tasks like batch classification, multi-file analysis, bulk content generation, or processing large datasets.
Tradeoffs
High throughput but no inter-agent communication during execution. Workers cannot share context or build on each other's results. Resource usage scales linearly with worker count.
Example
Analyze 20 GitHub repos in parallel — each worker clones and reviews one repo, then results merge into a comparison report.
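A minimal sketch of the fan-out/merge flow, with a stubbed `worker` function standing in for a real LLM call (the function names and task strings here are illustrative, not from any framework):

```python
from concurrent.futures import ThreadPoolExecutor

def worker(task: str) -> str:
    # Hypothetical stand-in for a real agent/LLM call.
    return f"analysis of {task}"

def swarm(tasks: list[str], max_workers: int = 8) -> str:
    # Fan out: each worker processes one subtask in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(worker, tasks))
    # Merge: combine all partial results into a single report.
    return "\n".join(results)

report = swarm(["repo-a", "repo-b", "repo-c"])
```

`pool.map` preserves input order, so the merged report lists results in the same order the tasks were dispatched.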
Pipeline
Complexity: Low · Agents: 2-6
Sequential handoff between specialized agents where each stage transforms or enriches the output before passing it forward.
How It Works
Agents are arranged in a chain. Agent 1 produces output that becomes Agent 2's input, and so on. Each agent specializes in one transformation step. The final agent produces the end result.
Diagram
Task ──► Agent 1 ──► Agent 2 ──► Agent 3 ──► Result
           research     draft       edit
Best For
Multi-stage processing like research → draft → edit → publish, or extract → transform → load data pipelines.
Tradeoffs
Clear separation of concerns and predictable flow, but total latency is the sum of all stages. A failure at any stage blocks downstream agents.
Example
Content pipeline: Researcher agent gathers sources, Writer agent drafts the article, Editor agent polishes prose and checks facts.
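The chain can be sketched as a simple fold over stage functions; the three stage stubs below are placeholders for real agent calls:

```python
def researcher(topic: str) -> str:
    # Hypothetical research agent.
    return f"notes on {topic}"

def writer(notes: str) -> str:
    # Hypothetical drafting agent.
    return f"draft from {notes}"

def editor(draft: str) -> str:
    # Hypothetical editing agent.
    return f"polished {draft}"

def pipeline(task: str, stages) -> str:
    # Each stage's output becomes the next stage's input.
    out = task
    for stage in stages:
        out = stage(out)
    return out

article = pipeline("agent patterns", [researcher, writer, editor])
```

Because each stage only sees its predecessor's output, a failed stage can be retried in isolation without rerunning the whole chain.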
Supervisor
Complexity: Medium · Agents: 3-8
One orchestrator agent delegates tasks to specialist agents, reviews their work, and synthesizes a final result.
How It Works
The supervisor receives a task, breaks it down, and routes subtasks to the most appropriate specialist agents. It reviews each specialist's output, may request revisions, and compiles the final deliverable.
Diagram
┌─ Coder ──────┐
Task ──► Supervisor ─┼─ Researcher ─┼──► Result
└─ Tester ─────┘
Best For
Complex projects requiring multiple skills — like building a feature that needs research, coding, testing, and documentation.
Tradeoffs
Flexible and adaptive, but the supervisor is a single point of failure. Quality depends heavily on the supervisor's ability to decompose tasks and evaluate results.
Example
Build a feature: Supervisor assigns research to Researcher, implementation to Coder, and validation to Tester, then reviews and merges all outputs.
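A toy version of the delegate-review-synthesize loop, with lambdas standing in for specialist agents (all names here are illustrative):

```python
# Hypothetical specialists; in practice each is an agent with its
# own prompt, tools, and model.
SPECIALISTS = {
    "research": lambda t: f"findings: {t}",
    "code": lambda t: f"patch: {t}",
    "test": lambda t: f"report: {t}",
}

def supervisor(task: str) -> str:
    outputs = {}
    for role, agent in SPECIALISTS.items():
        result = agent(task)
        # Review step: a real supervisor would critique the result
        # and request revisions; this sketch accepts the first draft.
        outputs[role] = result
    # Synthesize all specialist outputs into one deliverable.
    return " | ".join(f"{role}: {out}" for role, out in outputs.items())

deliverable = supervisor("add login feature")
```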
Map-Reduce
Complexity: Medium · Agents: 3-20+
Parallel map step processes chunks independently, then a reduce step aggregates and synthesizes all partial results.
How It Works
Input is split into chunks. The map phase sends each chunk to a worker agent for parallel processing. The reduce phase takes all mapped outputs and synthesizes them into a single coherent result, resolving conflicts and extracting patterns.
Diagram
┌─ Map 1 ─┐
Input ──►│ Map 2 │──► Reduce ──► Result
└─ Map N ─┘
Best For
Large-scale data processing, document summarization, multi-source analysis, and any task where divide-and-conquer applies.
Tradeoffs
Excellent for large inputs but the reduce step can become a bottleneck. Requires that the task be decomposable into independent chunks.
Example
Summarize a 200-page document: Map agents summarize 10-page chunks in parallel, then a Reduce agent synthesizes chunk summaries into a coherent executive summary.
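The split/map/reduce steps can be sketched as below, with stub agents in place of real LLM summarizers (chunking by character count is a simplification; real systems chunk by pages or tokens):

```python
def map_agent(chunk: str) -> str:
    # Hypothetical per-chunk summarizer.
    return f"summary({chunk})"

def reduce_agent(partials: list[str]) -> str:
    # Hypothetical synthesizer that merges partial summaries.
    return "synthesis: " + "; ".join(partials)

def map_reduce(document: str, chunk_size: int) -> str:
    # Split the input into independent chunks.
    chunks = [document[i:i + chunk_size]
              for i in range(0, len(document), chunk_size)]
    # Map phase: each chunk is processed independently (and could
    # run in parallel, as in the Swarm pattern).
    partials = [map_agent(c) for c in chunks]
    # Reduce phase: aggregate all partial results.
    return reduce_agent(partials)

result = map_reduce("abcdefghijklmnop", chunk_size=8)
```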
Debate
Complexity: Medium · Agents: 3-5
Multiple agents propose competing solutions, then a critic agent evaluates arguments and selects or synthesizes the best approach.
How It Works
Proposer agents independently generate solutions to the same problem. A critic agent reviews all proposals, identifies strengths and weaknesses, and either selects the best one or synthesizes a superior solution from the best elements of each.
Diagram
┌─ Proposer A ─┐
│  Proposer B  │──► Critic ──► Best Solution
└─ Proposer C ─┘
Best For
Decision-making, strategy selection, code architecture choices, and any task where exploring multiple approaches improves quality.
Tradeoffs
Produces higher-quality outputs through adversarial evaluation, but uses 3-5x more tokens than a single agent. Not suitable for straightforward tasks.
Example
Architecture decision: Three agents propose different database designs (SQL, NoSQL, graph), then a Critic evaluates each against requirements and picks the winner.
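A minimal sketch of propose-then-critique; the proposers are stub lambdas and the critic scores by a toy heuristic (a real critic is an LLM judging each proposal against the requirements):

```python
# Hypothetical proposer agents, each prompted with the same problem.
PROPOSERS = {
    "sql": lambda p: f"SQL design for {p}",
    "nosql": lambda p: f"NoSQL design for {p}",
    "graph": lambda p: f"graph DB design for {p}",
}

def critic(proposals: dict[str, str]) -> str:
    # Toy scoring by length, purely for illustration; a real critic
    # evaluates strengths and weaknesses against the requirements.
    return max(proposals.values(), key=len)

def debate(problem: str) -> str:
    # Proposers generate competing solutions independently.
    proposals = {name: agent(problem) for name, agent in PROPOSERS.items()}
    return critic(proposals)

winner = debate("user store")
```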
Hierarchical
Complexity: High · Agents: 7-50+
Tree of supervisors and workers where top-level managers delegate to mid-level coordinators who manage specialized worker teams.
How It Works
A top-level supervisor breaks a large objective into domains. Each domain is assigned to a mid-level supervisor who further decomposes it into tasks for worker agents. Results flow back up the tree, with each level aggregating and quality-checking.
Diagram
Director
┌───┴───┐
Manager A Manager B
┌──┴──┐ ┌──┴──┐
W1    W2   W3    W4
Best For
Large-scale orchestration with many agents, enterprise workflows, and projects that naturally decompose into teams and sub-teams.
Tradeoffs
Scales to many agents and complex projects, but introduces coordination overhead and latency from multiple management layers. Harder to debug when things go wrong.
Example
Build a full-stack app: Director delegates frontend to Manager A (with UI and CSS workers) and backend to Manager B (with API and DB workers).
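The two-level tree can be sketched with plain functions; the domains and subtasks below mirror the example and are purely illustrative:

```python
def worker(task: str) -> str:
    # Hypothetical leaf agent doing one concrete task.
    return f"done:{task}"

def manager(domain: str, subtasks: list[str]) -> str:
    # Mid level: delegate to workers, then aggregate and
    # quality-check their results before reporting up.
    results = [worker(t) for t in subtasks]
    return f"{domain}[{', '.join(results)}]"

def director(objective: str) -> str:
    # Top level: split the objective into domains, one per manager.
    frontend = manager("frontend", ["ui", "css"])
    backend = manager("backend", ["api", "db"])
    return f"{objective}: {frontend} + {backend}"

app = director("todo app")
```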
Router
Complexity: Low · Agents: 3-10
A classifier agent analyzes incoming tasks and routes each one to the most appropriate specialist agent for handling.
How It Works
The router agent receives a task, classifies its type or domain, and forwards it to the matching specialist. Each specialist handles only the task types it excels at. This ensures optimal agent selection without wasting tokens on mismatched capabilities.
Diagram
┌─ Code Agent
Task ──► Router ─┼─ Writing Agent
└─ Research Agent
Best For
Heterogeneous workloads where different tasks require different models, tools, or expertise — like a support system handling billing, technical, and account queries.
Tradeoffs
Efficient routing reduces cost and improves quality, but accuracy depends on the classifier. Misrouted tasks can produce poor results. Adding new task types requires updating the router.
Example
Customer support: Router classifies tickets as billing, technical, or account issues, then routes each to the specialist agent with the right tools and knowledge.
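A sketch of the classify-then-dispatch flow; the keyword classifier below is a stand-in for an LLM-based classifier, and all specialist names are hypothetical:

```python
# Hypothetical specialists keyed by ticket category.
SPECIALISTS = {
    "billing": lambda t: f"billing agent handled: {t}",
    "technical": lambda t: f"technical agent handled: {t}",
    "account": lambda t: f"account agent handled: {t}",
}

def classify(ticket: str) -> str:
    # Toy keyword classifier; in practice this is an LLM call that
    # returns one of the known categories.
    if "refund" in ticket or "invoice" in ticket:
        return "billing"
    if "crash" in ticket or "error" in ticket:
        return "technical"
    return "account"

def route(ticket: str) -> str:
    # Dispatch the ticket to the matching specialist.
    return SPECIALISTS[classify(ticket)](ticket)

reply = route("the app shows an error on login")
```

Note the router's failure mode: any ticket the classifier cannot place falls through to the default category, which is where misrouting shows up in practice.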
Reflection
Complexity: Low · Agents: 1-2
An agent generates output, then a critic (or the same agent) reviews and improves it iteratively until quality criteria are met.
How It Works
The generator agent produces an initial output. A critic agent (or the generator in self-critique mode) evaluates the output against quality criteria and provides specific feedback. The generator revises based on feedback. This loop repeats until the output meets the quality bar or a max iteration count is reached.
Diagram
Task ──► Generator ──► Critic ──┐
▲ │
└── Revise ◄───────┘
Best For
Quality-sensitive outputs like production code, published writing, complex analysis, and any task where iterative refinement significantly improves results.
Tradeoffs
Produces noticeably higher quality output, but each iteration adds latency and token cost. Requires well-defined quality criteria to avoid infinite loops.
Example
Code generation: Generator writes a function, Critic reviews for bugs and edge cases, Generator revises until all tests pass and code review criteria are met.
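The generate-critique-revise loop, with a max-iteration cap to prevent infinite loops; generator and critic are toy stand-ins for real agents:

```python
def generator(task: str, feedback: str = "") -> str:
    # Hypothetical generator: appends the critic's feedback to
    # simulate a revision.
    return task + feedback

def critic(output: str):
    # Hypothetical critic: returns feedback until the output meets
    # the quality bar (here, two revisions), then None.
    return " +fix" if output.count("+fix") < 2 else None

def reflect(task: str, max_iters: int = 5) -> str:
    output = generator(task)
    for _ in range(max_iters):
        feedback = critic(output)
        if feedback is None:  # quality bar met
            break
        output = generator(output, feedback)
    return output

result = reflect("draft")
```

The `max_iters` cap is the guard the Tradeoffs section calls for: without it, a critic that never approves would loop forever.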
Consensus
Complexity: Medium · Agents: 3-7
Multiple agents independently solve the same problem, then a mediator picks the best solution or synthesizes a combined answer. Like best-of-N sampling at the agent level.
How It Works
N agents receive the same task and work on it independently with no communication. Each produces a complete solution. A mediator agent then reviews all solutions, scores them against criteria, and either selects the strongest one or synthesizes a superior answer by combining the best elements from each.
Diagram
┌─ Agent 1 ─┐
Task ──► │ Agent 2 │ ──► Mediator ──► Best Solution
└─ Agent N ─┘
Best For
High-stakes decisions, code review, content quality gates, and any task where independent perspectives reduce bias and catch errors.
Tradeoffs
Dramatically improves output quality through redundancy, but multiplies token cost by the number of agents. The mediator must be capable enough to judge solution quality accurately.
Example
Code review: Three agents independently review a pull request for bugs, security issues, and performance. A mediator synthesizes findings into a single review with deduplicated, prioritized feedback.
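A best-of-N sketch: N stub agents answer independently and a toy mediator scores them (a real mediator is an LLM applying a rubric; length-based scoring here is only for illustration):

```python
# Hypothetical independent solvers for the same task.
AGENTS = [
    lambda t: f"fix {t}",
    lambda t: f"fix {t} and add a test",
    lambda t: f"fix {t} and add a test and docs",
]

def mediator(solutions: list[str]) -> str:
    # Toy judge: pick the most thorough-looking answer by length.
    return max(solutions, key=len)

def consensus(task: str) -> str:
    # Each agent works with no communication with the others.
    solutions = [agent(task) for agent in AGENTS]
    return mediator(solutions)

best = consensus("the bug")
```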
Blackboard
Complexity: High · Agents: 3-10
Agents share a common knowledge base (the blackboard) and contribute partial solutions. Each agent watches the blackboard and acts when it can contribute.
How It Works
A shared data store (the blackboard) holds the current problem state and partial solutions. Specialist agents monitor the blackboard and activate when they detect something they can contribute to. Each agent reads the current state, adds its partial solution, and writes back. A controller checks completion criteria after each update.
Diagram
Agent A ──►┌─────────────┐◄── Agent C
│ Blackboard │
Agent B ──►│ (shared │◄── Agent D
│ state) │
└──────┬──────┘
▼
Result
Best For
Complex problem solving, collaborative analysis, research synthesis, and tasks where the solution emerges from incremental contributions by different specialists.
Tradeoffs
Highly flexible and supports emergent problem solving, but requires careful state management. Agents may conflict or overwrite each other's contributions. Harder to predict execution order and debug.
Example
Research synthesis: A blackboard holds a research question. An evidence-gathering agent adds sources, an analysis agent identifies patterns, a critique agent flags contradictions, and a writing agent drafts conclusions - all reading and writing to the shared board.
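A minimal blackboard, using a dict as the shared store. Each stub agent returns True only when it can contribute, and the controller loops until no agent fires; all keys and agent names are illustrative:

```python
def evidence(board: dict) -> bool:
    # Activates once: adds sources to an empty board.
    if "sources" not in board:
        board["sources"] = ["paper A", "paper B"]
        return True
    return False

def analysis(board: dict) -> bool:
    # Activates only after evidence is on the board.
    if "sources" in board and "patterns" not in board:
        board["patterns"] = f"patterns from {len(board['sources'])} sources"
        return True
    return False

def writer(board: dict) -> bool:
    # Activates only after the analysis exists.
    if "patterns" in board and "conclusion" not in board:
        board["conclusion"] = "conclusion: " + board["patterns"]
        return True
    return False

def run_blackboard(question: str) -> dict:
    board = {"question": question}
    agents = [evidence, analysis, writer]
    # Controller: keep polling until no agent can contribute.
    while any(agent(board) for agent in agents):
        pass
    return board

board = run_blackboard("do agents help?")
```

Because activation depends only on board state, execution order emerges from the data rather than a fixed schedule, which is exactly what makes this pattern flexible and hard to debug.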
Guardrail
Complexity: Low · Agents: 2
A primary agent does the work while a secondary agent monitors and validates output in real time. The guardrail agent can block, modify, or flag unsafe or incorrect outputs.
How It Works
The primary agent generates output as normal. Before the output is delivered, a guardrail agent intercepts and evaluates it against safety rules, quality standards, or business constraints. If the output passes, it goes through. If it fails, the guardrail either blocks it, requests a revision, or modifies it directly.
Diagram
Task ──► Primary Agent ──► Guardrail ──► Output
│
[block/revise]
│
▼
Rejected
Best For
Safety-critical applications, content moderation, code validation, regulatory compliance, and any task where unchecked output poses risk.
Tradeoffs
Adds a reliable safety layer with minimal complexity, but introduces latency for every output. The guardrail agent must be fast and accurate - false positives block good output, false negatives defeat the purpose.
Example
Code generation: A coding agent writes functions while a guardrail agent checks every output for SQL injection, hardcoded secrets, and unsafe dependencies before the code reaches the user.
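An intercept-and-check sketch: the guardrail scans the primary agent's output against banned patterns before it is delivered. The pattern list and both agents are toy stand-ins (a real guardrail might be a fast classifier model or static analyzer):

```python
# Hypothetical banned patterns; a real guardrail would use proper
# static analysis or a safety-tuned model.
BANNED = ("password =", "DROP TABLE")

def primary(task: str) -> str:
    # Hypothetical code-writing agent.
    return f'print("{task}")'

def guardrail(output: str) -> tuple[bool, str]:
    # Intercept: block anything matching a banned pattern,
    # pass everything else through unchanged.
    for pat in BANNED:
        if pat in output:
            return False, f"blocked: contains {pat!r}"
    return True, output

def run(task: str) -> str:
    ok, result = guardrail(primary(task))
    # A fuller version would request a revision on failure instead
    # of just returning the rejection message.
    return result

safe = run("hello")
blocked = guardrail('password = "hunter2"')
```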
Tool Augmented
Complexity: Low · Agents: 1
A single agent with access to a rich tool ecosystem - MCP servers, APIs, file system, databases. Not multi-agent per se, but the tools act as specialized capabilities that extend the agent's reach.
How It Works
One agent receives a task and has access to a registry of tools (via MCP, function calling, or plugin systems). It plans a sequence of tool calls, executes them, observes results, and iterates until the task is complete. The agent decides which tools to use, in what order, and how to combine their outputs.
Diagram
┌─ MCP Server
├─ API
Agent ──────┼─ File System
├─ Database
└─ Browser
▼
Result
Best For
General-purpose automation, developer workflows, Claude Code-style usage, and tasks where a single smart agent with the right tools outperforms a team of agents.
Tradeoffs
Simple architecture with no coordination overhead, but limited by the agent's ability to plan and use tools correctly. Performance ceiling is bounded by a single model's reasoning capacity.
Example
Developer workflow: Claude Code reads files, runs tests, searches codebases, executes shell commands, and calls APIs - all through tool use from a single agent managing the entire task.
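The plan-execute-observe loop can be sketched with a toy tool registry; the tool names, the fixed plan, and the stub implementations are all hypothetical (a real agent would choose tools dynamically from model output):

```python
# Hypothetical tool registry; in practice these are MCP servers,
# function-calling tools, or plugins.
TOOLS = {
    "read_file": lambda path: f"contents of {path}",
    "run_tests": lambda target: f"tests passed for {target}",
}

def agent(task: str) -> list[str]:
    # The agent plans a sequence of tool calls. This sketch uses a
    # fixed plan; a real agent derives it from the task and revises
    # it as observations come in.
    plan = [("read_file", "app.py"), ("run_tests", "app.py")]
    observations = []
    for tool, arg in plan:
        # Execute each call and record the observation for the
        # next planning step.
        observations.append(TOOLS[tool](arg))
    return observations

trace = agent("check app.py")
```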