
AI Orchestration Frameworks: The Definitive Guide


Building a single LLM call is easy. Building a reliable, multi-step AI system that chains calls, manages state, handles errors, integrates tools, and runs in production? That's orchestration - and it's where frameworks earn their keep.

This guide is a comprehensive deep-dive into every major AI orchestration framework in 2026. We cover LangChain, LangGraph, CrewAI, AutoGen, Semantic Kernel, and Haystack - with real, working code examples, honest comparisons, and production-ready patterns. Whether you're building a simple RAG pipeline or a multi-agent system, this is your reference.

1. What is AI Orchestration?

AI orchestration is the practice of coordinating multiple AI components - LLM calls, tool invocations, data retrievals, and decision logic - into a coherent workflow that accomplishes a goal. Think of it as the conductor of an AI orchestra: each instrument (model, tool, database) plays its part, but someone needs to manage the timing, sequencing, and error recovery.

Why You Need It

A single openai.chat.completions.create() call gets you surprisingly far. But real applications quickly need more:

  • Chaining LLM calls: One model's output feeds into another's input. Summarize β†’ analyze β†’ recommend.
  • State management: Tracking conversation history, intermediate results, and workflow progress across multiple steps.
  • Error handling: Retries, fallback models, graceful degradation when an API is down or a model hallucinates.
  • Tool integration: Letting the LLM call APIs, query databases, execute code, and search the web.
  • Structured output: Parsing LLM responses into typed objects your application can actually use.
  • Observability: Tracing every step so you can debug, evaluate, and optimize.
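The error-handling bullet above is where most hand-rolled orchestration breaks down first. Here's a minimal sketch of retry-with-fallback logic — the `with_retries` helper and the stub models are illustrative, not any framework's API:

```python
import time

def with_retries(primary, fallback, prompt, max_attempts=3, base_delay=1.0):
    """Try the primary model with exponential backoff between attempts,
    then degrade to the fallback model before giving up entirely."""
    last_error = None
    for model in (primary, fallback):
        for attempt in range(max_attempts):
            try:
                return model(prompt)
            except Exception as e:
                last_error = e
                time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError(f"All models failed: {last_error}")

# Stub "models" standing in for real API clients
def primary_model(prompt):
    raise TimeoutError("provider is down")

def fallback_model(prompt):
    return f"fallback answer to: {prompt}"

answer = with_retries(primary_model, fallback_model, "hi", base_delay=0)
```

Frameworks ship this pattern (plus jitter, rate-limit awareness, and per-provider error types) so you don't have to maintain it yourself.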

Simple vs. Complex Orchestration

Not every project needs a framework. Here's the spectrum:

Orchestration Complexity Spectrum:

Simple                                              Complex
  β”‚                                                      β”‚
  β–Ό                                                      β–Ό
Single LLM call β†’ Chain of calls β†’ RAG pipeline β†’ Agent loop β†’ Multi-agent system
  β”‚                    β”‚                β”‚              β”‚               β”‚
  No framework     Maybe LCEL      LangChain      LangGraph      CrewAI/AutoGen
  needed           is enough       or Haystack    or custom       or custom

When to Use a Framework vs. Roll Your Own

Use a framework when: You need RAG, agents, multi-step workflows, or multi-agent coordination. The framework handles state management, tool calling protocols, streaming, and the dozens of edge cases you'd otherwise discover in production.

Roll your own when: You have a simple chain of 2-3 LLM calls, you need maximum control over every detail, or you're building something the frameworks don't support well. A few functions with httpx and pydantic can be cleaner than importing a framework for a simple use case.

Here's what "rolling your own" looks like for a simple chain:

# Simple orchestration without a framework
import openai
from pydantic import BaseModel

client = openai.OpenAI()

class Analysis(BaseModel):
    summary: str
    sentiment: str
    key_topics: list[str]

def analyze_text(text: str) -> Analysis:
    # Step 1: Summarize
    summary_resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Summarize this text in 2 sentences."},
            {"role": "user", "content": text}
        ]
    )
    summary = summary_resp.choices[0].message.content

    # Step 2: Analyze the summary (structured output)
    analysis_resp = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Analyze this summary."},
            {"role": "user", "content": summary}
        ],
        response_format=Analysis
    )
    return analysis_resp.choices[0].message.parsed

This works fine for simple cases. But once you need streaming, retries, tool calling, memory, or tracing - you'll be rebuilding what frameworks already provide.

2. LangChain

LangChain is the most widely adopted AI orchestration framework. It provides a modular toolkit for building LLM-powered applications: prompts, models, output parsers, document loaders, retrievers, and chains - all composable via the LangChain Expression Language (LCEL).

Core Concepts

  • Chains: Sequences of operations. Prompt β†’ Model β†’ Parser.
  • Prompts: Templated messages with variables. Support system/user/assistant roles.
  • Output Parsers: Convert raw LLM text into structured data (strings, JSON, Pydantic models).
  • Document Loaders: Ingest data from PDFs, web pages, databases, APIs, and 100+ sources.
  • Text Splitters: Break documents into chunks for embedding and retrieval.
  • Retrievers: Search vector stores, keyword indexes, or hybrid systems to find relevant context.

LCEL: The Pipe Operator

LCEL lets you compose components with the | (pipe) operator. Each component's output becomes the next component's input - like Unix pipes for AI:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{input}")
])

chain = prompt | ChatOpenAI(model="gpt-4o") | StrOutputParser()
result = chain.invoke({"input": "Explain microservices"})
print(result)

The beauty of LCEL is that every chain automatically supports .invoke(), .stream(), .batch(), and .ainvoke() - sync, streaming, batch, and async - with zero extra code.

# Streaming - works automatically with LCEL
for chunk in chain.stream({"input": "Explain microservices"}):
    print(chunk, end="", flush=True)

# Batch - process multiple inputs in parallel
results = chain.batch([
    {"input": "Explain microservices"},
    {"input": "Explain serverless"},
    {"input": "Explain containers"},
])

Document Loading + Splitting

LangChain has 100+ document loaders. Here's a common pattern - load a PDF, split it into chunks, and embed it:

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Load
loader = PyPDFLoader("architecture-guide.pdf")
docs = loader.load()

# Split
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""]
)
chunks = splitter.split_documents(docs)

# Embed and store
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
    persist_directory="./chroma_db"
)
print(f"Indexed {len(chunks)} chunks")
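To see what `chunk_size` and `chunk_overlap` actually do, here is a toy character-level splitter. This is a simplification — `RecursiveCharacterTextSplitter` also walks the separator hierarchy to avoid cutting mid-sentence — but the sliding-window idea is the same:

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Slide a window of chunk_size characters, stepping forward by
    chunk_size - chunk_overlap so adjacent chunks share trailing context."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
# Each chunk repeats the last 2 characters of the previous one,
# so a sentence cut at a boundary still appears whole in one chunk
```

The overlap is what keeps retrieval robust: a fact that straddles a chunk boundary is still fully contained in at least one chunk.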

Retriever Chain (RAG)

Once you have a vector store, build a RAG chain that retrieves relevant context before answering:

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

retriever = vectorstore.as_retriever(search_kwargs={"k": 4})

rag_prompt = ChatPromptTemplate.from_messages([
    ("system", """Answer the question based only on the following context.
If the context doesn't contain the answer, say "I don't have that information."

Context:
{context}"""),
    ("user", "{question}")
])

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
)

answer = rag_chain.invoke("What are the key architectural principles?")
print(answer)

Structured Output with Pydantic

Force the LLM to return typed, validated data using Pydantic models:

from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

class CodeReview(BaseModel):
    """Structured code review output."""
    issues: list[str] = Field(description="List of issues found")
    severity: str = Field(description="Overall severity: low, medium, high")
    suggestions: list[str] = Field(description="Improvement suggestions")
    score: int = Field(description="Code quality score 1-10", ge=1, le=10)

llm = ChatOpenAI(model="gpt-4o")
structured_llm = llm.with_structured_output(CodeReview)

review = structured_llm.invoke(
    "Review this code: def add(a,b): return a+b"
)
print(f"Score: {review.score}/10")
print(f"Severity: {review.severity}")
for issue in review.issues:
    print(f"  - {issue}")

This is one of LangChain's killer features - .with_structured_output() works across providers (OpenAI, Anthropic, Google) and handles the JSON schema generation, function calling, and validation automatically.
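Under the hood, the provider receives a JSON schema derived from the Pydantic model. You can inspect that schema yourself with plain Pydantic — exactly how it's transmitted (function calling vs. a structured-output mode) varies by provider:

```python
import json
from pydantic import BaseModel, Field

class CodeReview(BaseModel):
    """Structured code review output."""
    issues: list[str] = Field(description="List of issues found")
    severity: str = Field(description="Overall severity: low, medium, high")
    suggestions: list[str] = Field(description="Improvement suggestions")
    score: int = Field(description="Code quality score 1-10", ge=1, le=10)

schema = CodeReview.model_json_schema()
print(json.dumps(schema, indent=2))
# The ge/le constraints become "minimum": 1 and "maximum": 10 in the
# schema, and the Field descriptions become per-property descriptions -
# which is why descriptive Fields measurably improve output quality
```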

3. LangGraph

LangGraph is LangChain's framework for building stateful, multi-step agent workflows as graphs. While LCEL handles linear chains, LangGraph handles cycles, conditional branching, and persistent state - everything you need for real agents.

Core Concepts

  • State: A TypedDict that flows through the graph. Every node reads and writes to it.
  • Nodes: Functions that take state, do work, and return state updates.
  • Edges: Connections between nodes. Can be unconditional or conditional.
  • Conditional Edges: Route to different nodes based on state (the "if/else" of graphs).
  • Checkpointing: Persist state between runs for long-running workflows and human-in-the-loop.
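The node/edge/conditional-edge model reduces to surprisingly little plain Python. This toy executor is an illustration of the concepts, not LangGraph's API:

```python
def run_graph(state, nodes, edges, entry, end="END"):
    """Walk the graph: run the current node, merge its updates into
    state, then follow its edge. An edge is either a node name
    (unconditional) or a function of state (conditional)."""
    current = entry
    while current != end:
        state.update(nodes[current](state))
        edge = edges[current]
        current = edge(state) if callable(edge) else edge
    return state

# Two nodes with a conditional edge, mimicking an agent/tools loop
nodes = {
    "agent": lambda s: {"count": s["count"] + 1},
    "tools": lambda s: {"count": s["count"] + 1},
}
edges = {
    "agent": lambda s: "tools" if s["count"] < 3 else "END",  # conditional
    "tools": "agent",  # unconditional: always loop back to the agent
}
final = run_graph({"count": 0}, nodes, edges, entry="agent")
```

What LangGraph adds on top of this loop is the valuable part: typed state with merge semantics, persistence, streaming, and interrupts.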

Graph Flow Visualization

Multi-Step Agent Graph:

                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚  START  β”‚
                    β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”˜
                         β”‚
                         β–Ό
                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
            β”Œβ”€β”€β”€β”€β–Άβ”‚   Agent LLM  │◀────────────────┐
            β”‚     β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜                  β”‚
            β”‚            β”‚                          β”‚
            β”‚     β”Œβ”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”                  β”‚
            β”‚     β”‚   Router     β”‚                  β”‚
            β”‚     β”‚ (conditional)β”‚                  β”‚
            β”‚     β””β”€β”€β”¬β”€β”€β”€β”¬β”€β”€β”€β”¬β”€β”€β”˜                  β”‚
            β”‚        β”‚   β”‚   β”‚                      β”‚
            β”‚   tool β”‚   β”‚   β”‚ end                  β”‚
            β”‚   call β”‚   β”‚   β”‚                      β”‚
            β”‚        β–Ό   β”‚   β–Ό                      β”‚
            β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”β”‚ β”Œβ”€β”€β”€β”€β”€β”                 β”‚
            β”‚  β”‚  Tools β”‚β”‚ β”‚ END β”‚                 β”‚
            β”‚  β””β”€β”€β”€β”¬β”€β”€β”€β”€β”˜β”‚ β””β”€β”€β”€β”€β”€β”˜                 β”‚
            β”‚      β”‚     β”‚                          β”‚
            β””β”€β”€β”€β”€β”€β”€β”˜     β”‚ human                    β”‚
                         β”‚ review                   β”‚
                         β–Ό                          β”‚
                  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                  β”‚
                  β”‚    Human     │──── approved β”€β”€β”€β”€β”˜
                  β”‚   Review     β”‚
                  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Building a Complete Agent

Here's a full LangGraph agent with tool calling, conditional routing, and state management:

from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
import operator
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, AIMessage, ToolMessage

class AgentState(TypedDict):
    messages: Annotated[list, operator.add]

# Define tools
from langchain_core.tools import tool

@tool
def search_web(query: str) -> str:
    """Search the web for current information."""
    # In production, use a real search API
    return f"Search results for: {query}"

@tool
def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    try:
        result = eval(expression)  # Use a safe evaluator in production
        return str(result)
    except Exception as e:
        return f"Error: {e}"

tools = [search_web, calculate]
llm = ChatOpenAI(model="gpt-4o").bind_tools(tools)

# Node: call the LLM
def agent_node(state: AgentState) -> dict:
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

# Node: execute tool calls
def tool_node(state: AgentState) -> dict:
    last_message = state["messages"][-1]
    results = []
    for call in last_message.tool_calls:
        tool_fn = {t.name: t for t in tools}[call["name"]]
        result = tool_fn.invoke(call["args"])
        results.append(
            ToolMessage(content=result, tool_call_id=call["id"])
        )
    return {"messages": results}

# Conditional edge: should we call tools or finish?
def should_continue(state: AgentState) -> str:
    last_message = state["messages"][-1]
    if hasattr(last_message, "tool_calls") and last_message.tool_calls:
        return "tools"
    return "end"

# Build the graph
graph = StateGraph(AgentState)
graph.add_node("agent", agent_node)
graph.add_node("tools", tool_node)
graph.set_entry_point("agent")
graph.add_conditional_edges("agent", should_continue, {
    "tools": "tools",
    "end": END
})
graph.add_edge("tools", "agent")  # After tools, go back to agent

app = graph.compile()

# Run it
result = app.invoke({
    "messages": [HumanMessage(content="What is 42 * 17, and search for the latest Python release")]
})
for msg in result["messages"]:
    print(f"{msg.type}: {msg.content[:100]}")

Human-in-the-Loop with interrupt_before

LangGraph's killer feature: pause execution before a node, let a human review, then resume:

from langgraph.checkpoint.memory import MemorySaver

checkpointer = MemorySaver()

# Compile with interrupt - pauses BEFORE the tools node
app_with_hitl = graph.compile(
    checkpointer=checkpointer,
    interrupt_before=["tools"]
)

# Start the conversation
config = {"configurable": {"thread_id": "review-123"}}
result = app_with_hitl.invoke(
    {"messages": [HumanMessage(content="Search for competitor pricing")]},
    config=config
)

# Execution pauses here - the agent wants to call tools
# A human reviews the pending tool calls:
pending_calls = result["messages"][-1].tool_calls
print("Agent wants to call:")
for call in pending_calls:
    print(f"  {call['name']}({call['args']})")

# Human approves - resume execution
# (pass None to continue from where we left off)
final_result = app_with_hitl.invoke(None, config=config)
print(final_result["messages"][-1].content)

Checkpointing with MemorySaver

Checkpointing persists the full graph state, enabling long-running workflows, crash recovery, and time-travel debugging:

from langgraph.checkpoint.memory import MemorySaver
# Requires the extra package `langgraph-checkpoint-postgres`:
# from langgraph.checkpoint.postgres import PostgresSaver

# In-memory (development)
memory_saver = MemorySaver()

# PostgreSQL (production)
# postgres_saver = PostgresSaver.from_conn_string(
#     "postgresql://user:pass@localhost/langgraph"
# )

app = graph.compile(checkpointer=memory_saver)

# Each thread_id maintains separate state
config_a = {"configurable": {"thread_id": "user-alice"}}
config_b = {"configurable": {"thread_id": "user-bob"}}

# Alice's conversation
app.invoke({"messages": [HumanMessage(content="Hello, I'm Alice")]}, config_a)

# Bob's conversation (completely separate state)
app.invoke({"messages": [HumanMessage(content="Hello, I'm Bob")]}, config_b)

# Continue Alice's conversation - state is preserved
app.invoke({"messages": [HumanMessage(content="What's my name?")]}, config_a)
# Agent correctly responds "Alice" because state is checkpointed
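Conceptually, a checkpointer is just a store of state snapshots keyed by thread ID: each invocation loads the thread's saved state, applies the graph's updates, and saves it back. A toy version — not the real `MemorySaver` interface — makes the isolation between threads obvious:

```python
class ToyCheckpointer:
    """Per-thread state persistence, in the spirit of MemorySaver."""
    def __init__(self):
        self._store = {}

    def load(self, thread_id: str) -> dict:
        return self._store.get(thread_id, {"messages": []})

    def save(self, thread_id: str, state: dict) -> None:
        self._store[thread_id] = state

def invoke(checkpointer, thread_id, new_message):
    state = checkpointer.load(thread_id)
    state["messages"].append(new_message)  # the "graph" would run here
    checkpointer.save(thread_id, state)
    return state

cp = ToyCheckpointer()
invoke(cp, "user-alice", "Hello, I'm Alice")
invoke(cp, "user-bob", "Hello, I'm Bob")
state = invoke(cp, "user-alice", "What's my name?")
# Alice's thread now holds 2 messages; Bob's thread is untouched
```

Swap the dict for Postgres and add snapshot history, and you have the essence of production checkpointing: crash recovery and time travel fall out of keeping every snapshot instead of only the latest.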

LangGraph is the right choice when you need cycles (agent loops), conditional branching, persistent state, or human-in-the-loop. For linear chains, stick with LCEL.

4. CrewAI

CrewAI takes a fundamentally different approach: instead of graphs and chains, you define agents with roles, assign them tasks, and organize them into crews. It's the most intuitive framework for multi-agent systems - think of it as assembling a team of specialists.

Core Concepts

  • Agent: A persona with a role, goal, backstory, and tools. Each agent is a specialist.
  • Task: A specific job assigned to an agent, with a description and expected output.
  • Crew: A team of agents working together on tasks, with a defined process.
  • Process: How tasks are executed - sequential (one after another) or hierarchical (manager delegates).
  • Tools: Functions agents can call - search, scrape, file I/O, APIs, custom tools.
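In a sequential process, each task's output becomes part of the context the next agent sees. Stripped of the LLM calls, the flow looks roughly like this — a sketch of the idea, not CrewAI internals:

```python
def run_sequential(tasks):
    """tasks: list of (agent_fn, description) pairs. Each agent receives
    the accumulated outputs of all previous tasks as context."""
    context = []
    for agent_fn, description in tasks:
        output = agent_fn(description, context)
        context.append(output)
    return context[-1]  # the final task's output is the crew's result

# Stub agents standing in for LLM-backed specialists
research = lambda desc, ctx: "research brief"
write = lambda desc, ctx: f"article from {ctx[-1]}"
edit = lambda desc, ctx: f"polished {ctx[-1]}"

result = run_sequential([
    (research, "Research the topic"),
    (write, "Write the article"),
    (edit, "Edit the article"),
])
```

The hierarchical process replaces this fixed ordering with a manager LLM that decides which agent gets which task, and in what order.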

Full Crew with 3 Agents

from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool, WebsiteSearchTool

# Agent 1: Researcher
researcher = Agent(
    role="Senior Research Analyst",
    goal="Find comprehensive information about {topic}",
    backstory="Expert researcher with 10 years experience in technology "
              "analysis. Known for thorough, accurate research.",
    tools=[SerperDevTool(), WebsiteSearchTool()],
    llm="gpt-4o",
    verbose=True
)

# Agent 2: Writer
writer = Agent(
    role="Technical Content Writer",
    goal="Write an engaging, accurate article about {topic}",
    backstory="Award-winning technical writer who makes complex topics "
              "accessible. Focuses on practical, actionable content.",
    llm="gpt-4o",
    verbose=True
)

# Agent 3: Editor
editor = Agent(
    role="Senior Editor",
    goal="Review and polish the article for publication",
    backstory="Meticulous editor with a keen eye for accuracy, clarity, "
              "and engagement. Ensures content meets publication standards.",
    llm="gpt-4o",
    verbose=True
)

# Define tasks
research_task = Task(
    description="Research {topic} thoroughly. Find key trends, statistics, "
                "expert opinions, and practical examples. Cover at least "
                "5 different aspects of the topic.",
    expected_output="A detailed research brief with sources, key findings, "
                    "statistics, and expert quotes.",
    agent=researcher
)

writing_task = Task(
    description="Using the research brief, write a 1500-word article about "
                "{topic}. Include an introduction, 5 main sections, code "
                "examples where relevant, and a conclusion.",
    expected_output="A complete, well-structured article in markdown format.",
    agent=writer
)

editing_task = Task(
    description="Review the article for accuracy, clarity, grammar, and "
                "engagement. Fix any issues and add suggestions for "
                "improvement. Ensure all claims are supported by the research.",
    expected_output="The final, polished article ready for publication.",
    agent=editor
)

# Assemble the crew - sequential process
crew = Crew(
    agents=[researcher, writer, editor],
    tasks=[research_task, writing_task, editing_task],
    process=Process.sequential,
    verbose=True
)

# Run it
result = crew.kickoff(inputs={"topic": "AI orchestration frameworks in 2026"})
print(result)

Hierarchical Process

In hierarchical mode, a manager agent automatically delegates tasks to the best-suited agent:

# Hierarchical - a manager coordinates the agents
hierarchical_crew = Crew(
    agents=[researcher, writer, editor],
    tasks=[research_task, writing_task, editing_task],
    process=Process.hierarchical,
    manager_llm="gpt-4o",
    verbose=True
)

result = hierarchical_crew.kickoff(
    inputs={"topic": "Kubernetes security best practices"}
)

Custom Tools

Build your own tools for agents to use:

from crewai.tools import BaseTool
from pydantic import BaseModel, Field
import requests

class GitHubSearchInput(BaseModel):
    query: str = Field(description="Search query for GitHub repositories")

class GitHubSearchTool(BaseTool):
    name: str = "GitHub Repository Search"
    description: str = "Search GitHub for repositories matching a query"
    args_schema: type[BaseModel] = GitHubSearchInput

    def _run(self, query: str) -> str:
        resp = requests.get(
            "https://api.github.com/search/repositories",
            params={"q": query, "sort": "stars", "per_page": 5},
            headers={"Accept": "application/vnd.github.v3+json"}
        )
        repos = resp.json().get("items", [])
        results = []
        for repo in repos:
            results.append(
                f"- {repo['full_name']} ⭐ {repo['stargazers_count']}: "
                f"{repo['description']}"
            )
        return "\n".join(results) or "No repositories found."

# Use it
researcher_with_github = Agent(
    role="Open Source Analyst",
    goal="Find and analyze relevant open source projects",
    backstory="Expert in evaluating open source software quality and adoption.",
    tools=[GitHubSearchTool(), SerperDevTool()],
    llm="gpt-4o"
)

CrewAI shines when you can naturally decompose a problem into roles. If you find yourself thinking "I need a researcher, a writer, and an editor," CrewAI is your framework.

5. AutoGen

AutoGen is Microsoft's framework for building multi-agent conversations. The core idea: agents talk to each other to solve problems. One agent proposes a solution, another critiques it, a third executes code to verify - all through natural conversation.

Core Concepts

  • AssistantAgent: An LLM-powered agent that generates responses and code.
  • UserProxyAgent: Represents the human user. Can auto-execute code or ask for human input.
  • GroupChat: A conversation with 3+ agents, managed by a GroupChatManager that decides who speaks next.
  • Code Execution: Agents can write and execute code in sandboxed environments (local or Docker).
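The auto-execution loop works by scanning assistant messages for fenced ```python blocks, running them, and feeding the output back into the conversation. The extraction step is easy to sketch with the standard library — a simplification of AutoGen's actual parser:

```python
import re

# The triple backtick is built programmatically only so this example
# can itself live inside a fenced code block
FENCE = "`" * 3
CODE_BLOCK = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)

def extract_python_blocks(message: str) -> list[str]:
    """Return the code segments an assistant message wants executed."""
    return [block.strip() for block in CODE_BLOCK.findall(message)]

reply = (
    "Here is the script:\n"
    f"{FENCE}python\nprint('hello')\n{FENCE}\n"
    "Reply TERMINATE when done."
)
blocks = extract_python_blocks(reply)
```

This is also why the system messages below insist on ```python blocks and a TERMINATE sentinel: the executor needs machine-detectable markers for "run this" and "we're done".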

Two-Agent Conversation

from autogen import AssistantAgent, UserProxyAgent

# Configuration for the LLM
llm_config = {
    "model": "gpt-4o",
    "api_key": "your-api-key",  # or use OAI_CONFIG_LIST
}

# The assistant - generates solutions
assistant = AssistantAgent(
    name="coding_assistant",
    system_message="You are a helpful AI assistant. Solve tasks using Python code. "
                   "When you write code, put it in ```python blocks. "
                   "Reply TERMINATE when the task is done.",
    llm_config=llm_config
)

# The user proxy - executes code and provides feedback
user_proxy = UserProxyAgent(
    name="user",
    human_input_mode="NEVER",  # Auto-execute without asking
    max_consecutive_auto_reply=5,
    code_execution_config={
        "work_dir": "coding_output",
        "use_docker": False  # Set True for sandboxed execution
    }
)

# Start the conversation
user_proxy.initiate_chat(
    assistant,
    message="Create a Python script that fetches the top 10 trending "
            "GitHub repositories and saves them to a CSV file."
)

Group Chat with 3+ Agents

Multiple agents collaborate through conversation - each with a different expertise:

from autogen import AssistantAgent, UserProxyAgent, GroupChat, GroupChatManager

llm_config = {"model": "gpt-4o"}

# Planner - breaks down tasks
planner = AssistantAgent(
    name="planner",
    system_message="You are a project planner. Break down complex tasks into "
                   "clear, actionable steps. Do not write code.",
    llm_config=llm_config
)

# Coder - writes the code
coder = AssistantAgent(
    name="coder",
    system_message="You are an expert Python developer. Write clean, "
                   "well-documented code based on the plan. Put code in "
                   "```python blocks.",
    llm_config=llm_config
)

# Reviewer - reviews code quality
reviewer = AssistantAgent(
    name="reviewer",
    system_message="You are a senior code reviewer. Review code for bugs, "
                   "security issues, and best practices. Be specific and "
                   "constructive. Say APPROVE if the code is ready.",
    llm_config=llm_config
)

# Executor - runs the code
executor = UserProxyAgent(
    name="executor",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "group_output", "use_docker": False}
)

# Create the group chat
group_chat = GroupChat(
    agents=[planner, coder, reviewer, executor],
    messages=[],
    max_round=15,
    speaker_selection_method="auto"  # LLM decides who speaks next
)

manager = GroupChatManager(
    groupchat=group_chat,
    llm_config=llm_config
)

# Kick off the conversation
executor.initiate_chat(
    manager,
    message="Build a REST API with FastAPI that has CRUD endpoints for a "
            "todo list with SQLite storage. Include input validation."
)

Code Execution in Docker

For production safety, run agent-generated code in Docker containers:

from autogen.coding import DockerCommandLineCodeExecutor

# Create a Docker-based executor
docker_executor = DockerCommandLineCodeExecutor(
    image="python:3.12-slim",
    timeout=60,
    work_dir="./docker_output"
)

user_proxy = UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
    code_execution_config={"executor": docker_executor}
)

# Now any code the assistant writes runs in an isolated container
# - no access to your host filesystem or network (unless configured)

AutoGen excels at open-ended problem solving where agents need to iterate through conversation. The conversation-first approach makes it natural for tasks like "build this feature" where planning, coding, reviewing, and testing happen in a back-and-forth flow.

6. Semantic Kernel

Semantic Kernel is Microsoft's SDK for integrating AI into applications. It's the enterprise choice - first-class C#/.NET support, strong typing, plugin architecture, and deep Azure integration. It also supports Python and Java.

Core Concepts

  • Kernel: The central orchestrator that manages services, plugins, and memory.
  • Plugins: Collections of functions (native code or LLM prompts) that the kernel can invoke.
  • Planners: Automatically compose plugins into multi-step plans to achieve a goal.
  • Memory: Built-in semantic memory for storing and retrieving information by meaning.
  • Connectors: Integrations with OpenAI, Azure OpenAI, Hugging Face, and other providers.

Python Example

import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
from semantic_kernel.functions import kernel_function

# Initialize the kernel
kernel = sk.Kernel()
kernel.add_service(
    OpenAIChatCompletion(
        ai_model_id="gpt-4o",
        service_id="chat"
    )
)

# Define a plugin with native functions
class ContentPlugin:
    @kernel_function(
        name="summarize",
        description="Summarize text to a specified length"
    )
    def summarize(self, text: str, max_words: int = 100) -> str:
        # This would normally call the LLM via the kernel
        return f"Summary of ({len(text)} chars) in {max_words} words"

    @kernel_function(
        name="translate",
        description="Translate text to a target language"
    )
    def translate(self, text: str, language: str = "Spanish") -> str:
        return f"Translated to {language}: {text}"

# Add the plugin
kernel.add_plugin(ContentPlugin(), plugin_name="content")

# Invoke a function directly (kernel.invoke is a coroutine - run this
# inside an async function, or wrap it with asyncio.run)
result = await kernel.invoke(
    kernel.get_function("content", "summarize"),
    text="Long article text here...",
    max_words=50
)
print(result)

C# Example

using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using System.ComponentModel;

// Build the kernel
var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion(
    deploymentName: "gpt-4o",
    endpoint: "https://your-resource.openai.azure.com/",
    apiKey: "your-api-key"
);
var kernel = builder.Build();

// Define a plugin
public class WeatherPlugin
{
    [KernelFunction, Description("Get the current weather for a city")]
    public string GetWeather(
        [Description("The city name")] string city)
    {
        // In production, call a real weather API
        return $"The weather in {city} is 72Β°F and sunny.";
    }

    [KernelFunction, Description("Get a 5-day forecast")]
    public string GetForecast(
        [Description("The city name")] string city)
    {
        return $"5-day forecast for {city}: Sunny, Cloudy, Rain, Sunny, Sunny";
    }
}

// Register and use
kernel.Plugins.AddFromType<WeatherPlugin>();

// Auto function calling - the LLM decides which plugins to use
var settings = new OpenAIPromptExecutionSettings
{
    ToolCallBehavior = ToolCallBehavior.AutoInvokeKernelFunctions
};

var chatService = kernel.GetRequiredService<IChatCompletionService>();
var history = new ChatHistory();
history.AddUserMessage("What's the weather in Seattle and should I bring an umbrella this week?");

var response = await chatService.GetChatMessageContentAsync(
    history, settings, kernel
);
Console.WriteLine(response.Content);

Semantic Kernel is the right choice for enterprise .NET shops, teams already on Azure, or projects that need strong typing and plugin architecture. The C# SDK is significantly more polished than the Python one.

7. Haystack

Haystack by deepset is a framework purpose-built for document processing and RAG pipelines. While other frameworks bolt on RAG as a feature, Haystack makes it the core abstraction. If your primary use case is search, question answering, or document intelligence, Haystack deserves serious consideration.

Core Concepts

  • Components: Modular building blocks - converters, splitters, embedders, retrievers, generators, rankers.
  • Pipelines: DAGs (directed acyclic graphs) of components. Data flows through connected components.
  • Document Stores: Storage backends for documents and embeddings - Elasticsearch, Weaviate, Pinecone, Chroma, pgvector.
  • Converters: Turn files (PDF, DOCX, HTML, Markdown) into Haystack Document objects.

RAG Pipeline Example

from haystack import Pipeline
from haystack.components.converters import PyPDFToDocument
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.embedders import (
    OpenAIDocumentEmbedder,
    OpenAITextEmbedder
)
from haystack.components.writers import DocumentWriter
from haystack_integrations.components.retrievers.chroma import (
    ChromaEmbeddingRetriever
)
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator
from haystack_integrations.document_stores.chroma import ChromaDocumentStore

# --- Indexing Pipeline ---
document_store = ChromaDocumentStore()

indexing = Pipeline()
indexing.add_component("converter", PyPDFToDocument())
indexing.add_component("splitter", DocumentSplitter(
    split_by="sentence",
    split_length=3,
    split_overlap=1
))
indexing.add_component("embedder", OpenAIDocumentEmbedder(
    model="text-embedding-3-small"
))
indexing.add_component("writer", DocumentWriter(
    document_store=document_store
))

# Connect the components
indexing.connect("converter", "splitter")
indexing.connect("splitter", "embedder")
indexing.connect("embedder", "writer")

# Run indexing
indexing.run({"converter": {"sources": ["report.pdf"]}})

# --- Query Pipeline ---
template = """Answer the question based on the context below.
Context:
{% for doc in documents %}
  {{ doc.content }}
{% endfor %}

Question: {{ question }}
Answer:"""

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", OpenAITextEmbedder(
    model="text-embedding-3-small"
))
query_pipeline.add_component("retriever", ChromaEmbeddingRetriever(
    document_store=document_store,
    top_k=5
))
query_pipeline.add_component("prompt", PromptBuilder(template=template))
query_pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o"))

# Connect query components
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
query_pipeline.connect("retriever.documents", "prompt.documents")
query_pipeline.connect("prompt", "llm")

# Run a query
result = query_pipeline.run({
    "text_embedder": {"text": "What were the key findings?"},
    "prompt": {"question": "What were the key findings?"}
})
print(result["llm"]["replies"][0])

Haystack's explicit pipeline wiring is more verbose than LangChain's LCEL, but it's also more transparent - you can see exactly how data flows between components. The framework excels at document-heavy workloads where you need fine-grained control over ingestion, splitting, embedding, and retrieval.

8. Framework Comparison

Here's an honest, side-by-side comparison of every framework we've covered:

| Framework | Best For | Learning Curve | Agent Support | Multi-Agent | Streaming | Production Ready | Languages | Community |
|---|---|---|---|---|---|---|---|---|
| LangChain | RAG, chains, general orchestration | Medium | ✅ Good (via LangGraph) | ⚠️ Basic | ✅ Native | ✅ Yes | Python, JS/TS | 🔥 Largest (95k+ ⭐) |
| LangGraph | Complex agents, stateful workflows | High | ✅ Excellent | ✅ Yes | ✅ Native | ✅ Yes | Python, JS/TS | Growing (10k+ ⭐) |
| CrewAI | Multi-agent teams, role-based tasks | Low | ✅ Excellent | ✅ Core feature | ⚠️ Limited | ⚠️ Maturing | Python | Fast-growing (25k+ ⭐) |
| AutoGen | Conversational agents, code generation | Medium | ✅ Excellent | ✅ Core feature | ✅ Yes | ⚠️ Maturing | Python, .NET | Large (35k+ ⭐) |
| Semantic Kernel | Enterprise, .NET, Azure integration | Medium-High | ✅ Good | ⚠️ Basic | ✅ Yes | ✅ Yes | C#, Python, Java | Solid (22k+ ⭐) |
| Haystack | Document processing, RAG, search | Medium | ⚠️ Basic | ❌ No | ✅ Yes | ✅ Yes | Python | Established (18k+ ⭐) |

9. Choosing the Right Framework

The "best" framework depends entirely on your use case. Here's a decision guide:

Decision Flowchart

Which Framework Should You Use?

What are you building?
β”‚
β”œβ”€β”€ Simple chain (prompt β†’ LLM β†’ parse)?
β”‚   └── βœ… LangChain LCEL - minimal overhead, great DX
β”‚
β”œβ”€β”€ RAG / document search pipeline?
β”‚   β”œβ”€β”€ Need fine-grained control over ingestion?
β”‚   β”‚   └── βœ… Haystack - purpose-built for document pipelines
β”‚   └── Need RAG + agents + other features?
β”‚       └── βœ… LangChain - broadest ecosystem
β”‚
β”œβ”€β”€ Complex agent with loops and branching?
β”‚   β”œβ”€β”€ Need human-in-the-loop?
β”‚   β”‚   └── βœ… LangGraph - checkpointing + interrupt_before
β”‚   └── Need persistent state across sessions?
β”‚       └── βœ… LangGraph - built-in checkpointing
β”‚
β”œβ”€β”€ Multi-agent team collaboration?
β”‚   β”œβ”€β”€ Role-based (researcher, writer, editor)?
β”‚   β”‚   └── βœ… CrewAI - most intuitive role-based API
β”‚   └── Conversation-based (agents discuss and iterate)?
β”‚       └── βœ… AutoGen - conversation-first design
β”‚
β”œβ”€β”€ Enterprise / .NET shop?
β”‚   └── βœ… Semantic Kernel - best C# support, Azure integration
β”‚
└── Not sure / prototyping?
    └── βœ… LangChain - largest ecosystem, most examples, easiest to start

Practical Recommendations

  • Starting out? Begin with LangChain LCEL. It has the most tutorials, examples, and community support. You can always add LangGraph when you need agents.
  • Building agents? LangGraph gives you the most control. CrewAI is faster to prototype with but harder to customize.
  • Multi-agent systems? CrewAI for role-based teams, AutoGen for conversation-based collaboration. Try both - the right choice depends on your mental model.
  • Enterprise .NET? Semantic Kernel is the only serious option. The C# SDK is excellent.
  • Document-heavy workloads? Haystack's pipeline model gives you the most control over ingestion and retrieval.
  • Mixing frameworks? Totally valid. Use LangChain for RAG, LangGraph for agent orchestration, and LangSmith for observability. They're designed to work together.

10. Production Patterns

Getting a demo working is easy. Getting it to run reliably in production is where the real engineering happens. Here are the patterns that matter.

Error Handling & Retries

LLM APIs fail. Rate limits hit. Models hallucinate. Build resilience from day one:

from langchain_openai import ChatOpenAI
from tenacity import retry, stop_after_attempt, wait_exponential
import logging
import logging

logger = logging.getLogger(__name__)

# Pattern 1: LangChain's built-in retry
llm_with_retry = ChatOpenAI(
    model="gpt-4o",
    max_retries=3,
    request_timeout=30
)

# Pattern 2: Custom retry with tenacity for complex logic
@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=30),
    reraise=True
)
def robust_chain_invoke(chain, input_data):
    try:
        return chain.invoke(input_data)
    except Exception as e:
        logger.warning(f"Chain failed, retrying: {e}")
        raise

Fallback Models

If your primary model is down or rate-limited, fall back to an alternative:

from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Primary: GPT-4o, Fallback: Claude 3.5 Sonnet
primary = ChatOpenAI(model="gpt-4o", max_retries=2)
fallback = ChatAnthropic(model="claude-3-5-sonnet-20241022", max_retries=2)

# .with_fallbacks() tries the next model if the first fails
llm = primary.with_fallbacks([fallback])

# Works transparently - your chain doesn't know which model responded
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{input}")
])
chain = prompt | llm | StrOutputParser()
result = chain.invoke({"input": "Explain orchestration"})
# If GPT-4o is down, automatically uses Claude

Streaming Responses

Don't make users wait for the full response. Stream tokens as they're generated:

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{input}")
])
chain = prompt | ChatOpenAI(model="gpt-4o") | StrOutputParser()

# Sync streaming
for chunk in chain.stream({"input": "Write a haiku about Python"}):
    print(chunk, end="", flush=True)

# Async streaming (for web frameworks like FastAPI)
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/chat")
async def chat(question: str):
    async def generate():
        async for chunk in chain.astream({"input": question}):
            yield chunk
    return StreamingResponse(generate(), media_type="text/plain")

Caching

Identical prompts should return cached results - caching saves money and cuts latency:

from langchain_core.globals import set_llm_cache
from langchain_community.cache import SQLiteCache
from langchain_openai import ChatOpenAI

# SQLite cache (development)
set_llm_cache(SQLiteCache(database_path=".langchain_cache.db"))

# Redis cache (production)
# from redis import Redis
# from langchain_community.cache import RedisCache
# set_llm_cache(RedisCache(redis_=Redis(host="localhost", port=6379)))

# Now identical calls are cached automatically
llm = ChatOpenAI(model="gpt-4o")
# First call: hits the API (~1-2s)
result1 = llm.invoke("What is 2+2?")
# Second call: returns from cache (~1ms)
result2 = llm.invoke("What is 2+2?")

Observability with LangSmith

You can't debug what you can't see. LangSmith traces every step of your chains and agents:

# Set environment variables to enable LangSmith tracing
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-api-key"
os.environ["LANGCHAIN_PROJECT"] = "my-orchestration-app"

# That's it - all LangChain/LangGraph calls are now traced
# Every chain.invoke() logs:
#   - Input/output at each step
#   - Latency per component
#   - Token usage and cost
#   - Error traces with full context
#   - Tool call arguments and results

# Custom tracing for non-LangChain code
from langsmith import traceable

@traceable(name="custom_processing")
def process_results(raw_data: str) -> dict:
    # Your custom logic here
    processed = {"summary": raw_data[:100], "length": len(raw_data)}
    return processed

# This function now appears in LangSmith traces alongside LangChain calls

Rate Limiting

Protect yourself from runaway costs and API rate limits:

from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_openai import ChatOpenAI

# Limit to 10 requests per second
rate_limiter = InMemoryRateLimiter(
    requests_per_second=10,
    check_every_n_seconds=0.1,
    max_bucket_size=20
)

llm = ChatOpenAI(
    model="gpt-4o",
    rate_limiter=rate_limiter
)

# Now even batch operations respect the rate limit
results = llm.batch([
    f"Question {i}" for i in range(100)
])  # Automatically throttled to 10 req/s

Putting It All Together

A production-ready chain combines all these patterns:

import os
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.globals import set_llm_cache
from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_community.cache import SQLiteCache

# Observability (LANGCHAIN_API_KEY must also be set in the environment)
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "production-app"

# Caching
set_llm_cache(SQLiteCache(database_path=".cache.db"))

# Rate limiting
limiter = InMemoryRateLimiter(requests_per_second=10)

# Model with fallback + retry + rate limiting
primary = ChatOpenAI(model="gpt-4o", max_retries=3, rate_limiter=limiter)
fallback = ChatAnthropic(model="claude-3-5-sonnet-20241022", max_retries=3)
llm = primary.with_fallbacks([fallback])

# The chain
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Be concise."),
    ("user", "{input}")
])
chain = prompt | llm | StrOutputParser()

# This chain now has: retries, fallback, caching, rate limiting, and tracing
result = chain.invoke({"input": "Explain AI orchestration"})

What's Next

AI orchestration is evolving fast. The frameworks are converging on common patterns - state graphs, tool calling, structured output - while differentiating on developer experience and ecosystem. Here's where to go from here:

  • Start with one framework. LangChain + LangGraph covers 90% of use cases. Master it before exploring alternatives.
  • Build incrementally. Start with a simple LCEL chain. Add RAG when you need context. Add agents when you need autonomy. Add multi-agent when a single agent can't handle the complexity.
  • Invest in observability early. Set up LangSmith (or an alternative like Langfuse) from day one. You'll thank yourself when debugging production issues.
  • Test with evals, not vibes. Build evaluation datasets for your specific use case. Automated evals catch regressions that manual testing misses.
  • Production patterns matter more than framework choice. Retries, fallbacks, caching, rate limiting, and observability are what separate demos from production systems.

Want More?

Check out our Agentic AI Workflows guide for a deep-dive into agent architectures, or explore our AI tools comparison to pick the right foundation models for your orchestration layer.