
Building Agentic AI Workflows: A Complete Deep-Dive


Large language models are impressive, but they're fundamentally reactive - you prompt, they respond. AI agents change the game entirely. They observe their environment, reason about goals, take actions using tools, and iterate until the job is done. They don't just answer questions; they do work.

This guide is a complete deep-dive into building agentic AI workflows. We'll cover agent architectures, tool use, multi-agent orchestration, memory systems, planning strategies, and production deployment - all with real, working code examples you can adapt to your own projects.

1. What Are AI Agents?

An AI agent is a system that uses a language model as its core reasoning engine to autonomously decide what actions to take, execute those actions via tools, observe the results, and repeat until a goal is achieved. Unlike a chatbot that produces a single response per prompt, an agent operates in a loop.

Agents vs. Chatbots

The distinction matters:

  • Chatbot: User sends message β†’ LLM generates response β†’ done. Stateless, single-turn (or multi-turn with simple context). No ability to take real-world actions.
  • Agent: User defines goal β†’ LLM reasons about steps β†’ executes tools (APIs, code, databases) β†’ observes results β†’ reasons again β†’ repeats until goal is met. Stateful, multi-step, action-oriented.

A chatbot can tell you how to query a database. An agent queries the database, analyzes the results, generates a report, and emails it to your team.

The ReAct Pattern (Reason + Act)

The foundational pattern for modern agents is ReAct (Reasoning and Acting), introduced by Yao et al. in 2022. The idea is simple but powerful: interleave reasoning traces with action execution.

# The ReAct loop in pseudocode
while not task_complete:
    # REASON: The LLM thinks about what to do next
    thought = llm.generate(f"""
        Task: {goal}
        Previous observations: {observations}
        What should I do next and why?
    """)

    # ACT: Execute the chosen action
    action = parse_action(thought)
    result = execute_tool(action)

    # OBSERVE: Record the result
    observations.append(result)

    # CHECK: Has the goal been achieved?
    task_complete = check_completion(goal, observations)

Each iteration produces a Thought (reasoning about the current state), an Action (a tool call or decision), and an Observation (the result of that action). This creates an auditable trace of the agent's decision-making process.

The Agent Loop: Observe β†’ Think β†’ Act β†’ Observe

Every agent, regardless of framework, follows this fundamental loop:

The Agent Loop (Conceptual Diagram):

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   OBSERVE   β”‚ ← Receive input / tool results
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚    THINK    β”‚ ← LLM reasons about state + goal
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚     ACT     β”‚ ← Call tool / generate output
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   OBSERVE   β”‚ ← Collect tool result
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β–Ό
  Goal met? ──No──→ Loop back to THINK
       β”‚
      Yes
       β”‚
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   RESPOND   β”‚ ← Return final answer
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The key insight is that the LLM isn't just generating text - it's driving a control loop. The model decides which tool to call, with what parameters, and how to interpret the results. This is what makes agents fundamentally different from prompt chains or simple RAG pipelines.

2. Core Agent Capabilities

Agents derive their power from four core capabilities: tool use, planning, memory, and self-reflection. Each is essential, and the best agents combine all four.

Tool Use / Function Calling

Tools give agents the ability to interact with the real world. Without tools, an LLM can only generate text. With tools, it can search the web, query databases, write files, call APIs, execute code, and more.

Here's a minimal example using OpenAI's function calling API:

import openai
import json

# Define tools the agent can use
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name, e.g. 'San Francisco'"
                    },
                    "units": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature units"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

# The agent loop with tool calling
client = openai.OpenAI()
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="auto"
)

# The model decides to call get_weather
tool_call = response.choices[0].message.tool_calls[0]
print(f"Agent wants to call: {tool_call.function.name}")
print(f"With arguments: {tool_call.function.arguments}")
# Output: Agent wants to call: get_weather
# Output: With arguments: {"city": "Tokyo", "units": "celsius"}

# Execute the tool and feed results back
weather_result = {"temp": 22, "condition": "partly cloudy", "humidity": 65}
messages.append(response.choices[0].message)
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps(weather_result)
})

# Agent generates final response using tool results
final = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)
print(final.choices[0].message.content)
# "The current weather in Tokyo is 22Β°C and partly cloudy with 65% humidity."

The model doesn't execute the function itself - it decides which function to call and with what arguments. Your code executes the function and feeds the result back. This separation is critical for security and control.

Planning (Task Decomposition)

Complex tasks require breaking a goal into sub-tasks. Agents use several planning strategies:

  • Sequential planning: Break the task into ordered steps and execute them one by one. Simple but brittle - if step 3 fails, the whole plan may need revision.
  • Tree-of-Thought (ToT): Explore multiple reasoning paths simultaneously, evaluate each, and select the most promising. Like a chess player thinking several moves ahead.
  • Hierarchical planning: Create a high-level plan, then decompose each step into sub-plans. A manager agent creates the plan; worker agents execute sub-tasks.

# Tree-of-Thought planning prompt
PLANNING_PROMPT = """
You are a planning agent. Given a complex task, generate 3 different
approaches to solve it. For each approach, evaluate:
1. Likelihood of success (1-10)
2. Number of steps required
3. Potential failure points

Task: {task}

Approach 1:
...
Approach 2:
...
Approach 3:
...

Selected approach: [choose the best one and explain why]
"""

Memory

Agents need memory to maintain context across interactions and learn from past experience. There are three types:

  • Short-term memory (context window): The conversation history within the current LLM context. Limited by token count (128K for GPT-4o, 200K for Claude). Fast but ephemeral.
  • Long-term memory (vector store): Persistent storage using embeddings. Past conversations, documents, and learned facts are embedded and retrieved via semantic search. Survives across sessions.
  • Episodic memory: Records of specific past interactions - what worked, what failed, what the user preferred. Enables the agent to improve over time and personalize behavior.
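Long-term memory can be sketched as a tiny vector store. The embedding here is a toy bag-of-words vector purely for illustration; a real system would call an embedding model and use a proper vector database:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding' - a stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class LongTermMemory:
    def __init__(self):
        self.entries: list[tuple[Counter, str]] = []

    def store(self, text: str) -> None:
        self.entries.append((embed(text), text))

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        """Return the k stored memories most similar to the query."""
        q = embed(query)
        scored = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [text for _, text in scored[:k]]
```

The agent-facing pattern is the important part: `store` after each interaction, `retrieve` before each LLM call, and splice the results into the prompt.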

Self-Reflection and Error Correction

The best agents don't just execute - they evaluate their own output and correct mistakes. This is implemented as a reflection step in the agent loop:

def agent_with_reflection(task: str, max_retries: int = 3) -> str:
    """Agent loop with built-in self-reflection."""
    result = execute_task(task)

    for attempt in range(max_retries):
        # Self-reflection: evaluate the result
        evaluation = llm.generate(f"""
            Task: {task}
            Result: {result}

            Evaluate this result:
            1. Does it fully address the task? (yes/no)
            2. Are there any errors or inaccuracies?
            3. What could be improved?

            If the result is satisfactory, respond with "APPROVED".
            Otherwise, explain what needs to change.
        """)

        if "APPROVED" in evaluation:
            return result

        # Self-correction: fix the issues
        result = llm.generate(f"""
            Original task: {task}
            Previous result: {result}
            Issues found: {evaluation}

            Generate an improved result that addresses all issues.
        """)

    return result  # Return best effort after max retries

This pattern - generate, evaluate, refine - is what separates robust agents from fragile ones. Reflexion (Shinn et al., 2023) formalized this into a framework where agents maintain a memory of past failures to avoid repeating mistakes.
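The Reflexion-style failure memory can be sketched as a store of lessons keyed by task, surfaced in the prompt for the next attempt. Extracting the lesson itself would be another LLM call; here it's passed in directly:

```python
class FailureMemory:
    """Keep lessons from past failed attempts and inject them
    into the prompt for the next attempt at the same task."""
    def __init__(self):
        self.lessons: dict[str, list[str]] = {}

    def record(self, task: str, lesson: str) -> None:
        self.lessons.setdefault(task, []).append(lesson)

    def build_prompt(self, task: str) -> str:
        past = self.lessons.get(task, [])
        if not past:
            return f"Task: {task}"
        hints = "\n".join(f"- {lesson}" for lesson in past)
        return f"Task: {task}\nLessons from previous failed attempts:\n{hints}"
```

Paired with the reflection loop above, this is what lets an agent stop repeating the same mistake across retries, rather than just retrying blindly.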

3. Agent Architectures

How you structure your agents determines what they can accomplish. There's no one-size-fits-all - the right architecture depends on task complexity, reliability requirements, and how much human oversight you need.

Single Agent

One LLM with access to a set of tools, running in a loop. This is the simplest architecture and works well for focused tasks: answering questions with RAG, executing a defined workflow, or interacting with a single API.

# Single agent architecture
class SimpleAgent:
    def __init__(self, model: str, tools: list, tool_registry: dict):
        self.client = openai.OpenAI()
        self.model = model
        self.tools = tools                  # JSON schemas passed to the API
        self.tool_registry = tool_registry  # Maps tool name -> Python callable
        self.messages = []

    def run(self, user_input: str) -> str:
        self.messages.append({"role": "user", "content": user_input})

        while True:
            response = self.client.chat.completions.create(
                model=self.model,
                messages=self.messages,
                tools=self.tools,
                tool_choice="auto"
            )
            msg = response.choices[0].message
            self.messages.append(msg)

            # If no tool calls, we have the final answer
            if not msg.tool_calls:
                return msg.content

            # Execute each tool call
            for tool_call in msg.tool_calls:
                result = self.execute_tool(
                    tool_call.function.name,
                    json.loads(tool_call.function.arguments)
                )
                self.messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(result)
                })

    def execute_tool(self, name: str, args: dict) -> dict:
        return self.tool_registry[name](**args)

Pros: Simple to build, debug, and reason about. Low latency. Cons: Limited by a single model's capabilities. Can struggle with tasks requiring diverse expertise.

Multi-Agent: Supervisor Pattern

A supervisor agent receives the task, breaks it into sub-tasks, delegates to specialized worker agents, and synthesizes their results. Think of it as a project manager coordinating a team.

# Supervisor pattern
class SupervisorAgent:
    def __init__(self):
        self.workers = {
            "researcher": ResearchAgent(),
            "coder": CodingAgent(),
            "reviewer": ReviewAgent(),
        }

    def run(self, task: str) -> str:
        # Supervisor plans the work
        plan = self.plan(task)

        results = {}
        for step in plan:
            worker = self.workers[step["agent"]]
            results[step["id"]] = worker.run(
                step["instruction"],
                context=results
            )

        # Supervisor synthesizes final output
        return self.synthesize(task, results)

Multi-Agent: Peer-to-Peer

Agents communicate directly with each other without a central coordinator. Each agent has a specific role and passes messages to relevant peers. This works well for debate-style reasoning or collaborative problem-solving where no single agent has authority.
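The shape of a peer-to-peer exchange can be sketched as round-robin message passing over a shared transcript. The agents here are stubs standing in for LLM-backed agents:

```python
def debate(agents: dict, opening: str, rounds: int = 2) -> list[tuple[str, str]]:
    """Round-robin message passing: each agent sees the full transcript
    and appends its reply. No central coordinator decides who speaks."""
    transcript = [("user", opening)]
    for _ in range(rounds):
        for name, agent in agents.items():
            reply = agent(transcript)  # Stand-in for an LLM-backed agent
            transcript.append((name, reply))
    return transcript
```

Real peer-to-peer systems add a termination condition (consensus reached, a judge agent's verdict) instead of a fixed round count, but the transcript-as-shared-state structure is the same.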

Multi-Agent: Hierarchical

A tree structure where a top-level agent delegates to mid-level agents, which further delegate to specialized workers. This scales to very complex tasks but adds latency and coordination overhead.

Human-in-the-Loop

The agent pauses at critical decision points and asks for human approval before proceeding. Essential for high-stakes tasks (financial transactions, production deployments, customer communications) where full autonomy is too risky.

# Human-in-the-loop pattern
class HumanInTheLoopAgent:
    def __init__(self, tools, approval_required: list[str]):
        self.tools = tools
        self.approval_required = approval_required  # Tools needing approval

    async def run(self, task: str):
        while True:
            action = self.decide_next_action(task)

            if action.tool_name in self.approval_required:
                # Pause and ask for human approval
                approved = await self.request_approval(
                    tool=action.tool_name,
                    args=action.arguments,
                    reasoning=action.thought
                )
                if not approved:
                    self.messages.append({
                        "role": "system",
                        "content": "User rejected this action. Try a different approach."
                    })
                    continue

            result = self.execute_tool(action)
            if self.is_complete(result):
                return result

Architecture Comparison

Architecture       | Complexity | Best For                              | Latency   | Reliability | Cost
-------------------|------------|---------------------------------------|-----------|-------------|----------
Single Agent       | Low        | Focused tasks, simple workflows       | Low       | Medium      | Low
Supervisor         | Medium     | Multi-step tasks needing coordination | Medium    | High        | Medium
Peer-to-Peer       | High       | Debate, collaborative reasoning       | High      | Medium      | High
Hierarchical       | Very High  | Complex enterprise workflows          | Very High | High        | Very High
Human-in-the-Loop  | Medium     | High-stakes decisions, compliance     | Variable  | Very High   | Medium

4. Tool Use Deep Dive

Tool use is the single most important capability that separates agents from chatbots. Let's go deep on how to implement it properly - schemas, error handling, sandboxing, and a complete working example.

Tool Schemas

Every tool needs a clear schema that tells the LLM what the tool does, what parameters it accepts, and what it returns. The schema is your contract between the model and your code.

# Well-designed tool schemas
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": (
                "Search the web for current information. Use this when you need "
                "up-to-date facts, news, documentation, or any information that "
                "might not be in your training data."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query. Be specific and concise."
                    },
                    "num_results": {
                        "type": "integer",
                        "description": "Number of results to return (1-10)",
                        "default": 5
                    }
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": (
                "Write content to a file on disk. Creates the file if it doesn't "
                "exist, overwrites if it does. Use for saving code, reports, data."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {
                        "type": "string",
                        "description": "File path relative to workspace root"
                    },
                    "content": {
                        "type": "string",
                        "description": "The full content to write to the file"
                    }
                },
                "required": ["path", "content"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "run_python",
            "description": (
                "Execute Python code in a sandboxed environment. Use for "
                "calculations, data processing, or testing code snippets. "
                "Returns stdout, stderr, and exit code."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {
                        "type": "string",
                        "description": "Python code to execute"
                    },
                    "timeout": {
                        "type": "integer",
                        "description": "Max execution time in seconds",
                        "default": 30
                    }
                },
                "required": ["code"]
            }
        }
    }
]

Schema design tips: Write descriptions as if explaining to a junior developer. Include examples in descriptions for ambiguous parameters. Use enums to constrain choices. Mark parameters as required only when they truly are.
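The schema is only a contract if you enforce it. Models occasionally hallucinate parameter names or violate enums, so it's worth validating arguments before the tool runs. A minimal sketch covering required fields, basic JSON Schema types, and enums:

```python
JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
              "boolean": bool, "array": list, "object": dict}

def validate_args(schema: dict, args: dict) -> list[str]:
    """Check tool-call arguments against a function schema's 'parameters'
    block. Returns a list of problems; an empty list means valid."""
    problems = []
    props = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required parameter: {name}")
    for name, value in args.items():
        if name not in props:
            problems.append(f"unknown parameter: {name}")
            continue
        expected = JSON_TYPES.get(props[name].get("type"))
        if expected and not isinstance(value, expected):
            problems.append(f"{name}: expected {props[name]['type']}")
        allowed = props[name].get("enum")
        if allowed and value not in allowed:
            problems.append(f"{name}: must be one of {allowed}")
    return problems
```

Returning the problems as data (rather than raising) fits the error-handling pattern in the next section: feed the list back to the model and let it correct its own call.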

Error Handling

Tools fail. APIs time out, files don't exist, code throws exceptions. Robust agents handle errors gracefully and let the LLM decide how to recover:

def execute_tool_safely(name: str, args: dict) -> dict:
    """Execute a tool with comprehensive error handling."""
    try:
        result = tool_registry[name](**args)
        return {
            "status": "success",
            "result": result
        }
    except KeyError:
        return {
            "status": "error",
            "error": f"Unknown tool: {name}",
            "hint": f"Available tools: {list(tool_registry.keys())}"
        }
    except TypeError as e:
        return {
            "status": "error",
            "error": f"Invalid arguments: {e}",
            "hint": "Check parameter names and types"
        }
    except TimeoutError:
        return {
            "status": "error",
            "error": f"Tool '{name}' timed out after {args.get('timeout', 30)}s",
            "hint": "Try simplifying the request or increasing timeout"
        }
    except Exception as e:
        return {
            "status": "error",
            "error": f"{type(e).__name__}: {str(e)}",
            "hint": "An unexpected error occurred"
        }

By returning structured error information (not just raising exceptions), the LLM can reason about what went wrong and try a different approach.

Sandboxed Code Execution

Letting an LLM execute arbitrary code is powerful but dangerous. Always sandbox code execution:

import subprocess
import tempfile
import os

def run_python_sandboxed(code: str, timeout: int = 30) -> dict:
    """Execute Python code in an isolated subprocess."""
    with tempfile.NamedTemporaryFile(
        mode='w', suffix='.py', delete=False
    ) as f:
        f.write(code)
        f.flush()
        try:
            result = subprocess.run(
                ["python3", f.name],
                capture_output=True,
                text=True,
                timeout=timeout,
                env={
                    "PATH": os.environ.get("PATH", ""),
                    "HOME": "/tmp",
                    # Minimal env - no secrets, no credentials
                },
                cwd="/tmp"  # Isolated working directory
            )
            return {
                "stdout": result.stdout,
                "stderr": result.stderr,
                "exit_code": result.returncode
            }
        except subprocess.TimeoutExpired:
            return {
                "stdout": "",
                "stderr": f"Execution timed out after {timeout}s",
                "exit_code": -1
            }
        finally:
            os.unlink(f.name)

For production systems, use Docker containers or gVisor for stronger isolation. Never run agent-generated code with access to production credentials or databases.
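A Docker-based version of the sandbox can be sketched by constructing a `docker run` invocation with networking disabled and resource caps. The flags are real Docker options; `build_docker_cmd` is a hypothetical helper, and the command would be passed to `subprocess.run` just like the subprocess version above:

```python
def build_docker_cmd(code: str, timeout: int = 30,
                     image: str = "python:3.12-slim") -> list[str]:
    """Build a `docker run` command that executes Python code with no
    network access, a read-only filesystem, and CPU/memory limits."""
    return [
        "docker", "run", "--rm",
        "--network", "none",          # No outbound network access
        "--read-only",                # Immutable root filesystem
        "--memory", "256m",           # Hard memory cap
        "--cpus", "1.0",              # CPU cap
        "--tmpfs", "/tmp:size=64m",   # Writable scratch space only
        image,
        "timeout", str(timeout), "python3", "-c", code,
    ]
```

Running `timeout` inside the container (in addition to a `subprocess` timeout outside it) ensures a hung process is killed even if the host-side wait is interrupted.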

Complete Example: Research Agent with Web Search and File Writing

Here's a fully working agent that can search the web and save results to files:

import openai
import json
import os
import requests
from pathlib import Path

client = openai.OpenAI()

# --- Tool implementations ---
def search_web(query: str, num_results: int = 5) -> dict:
    """Search using SerpAPI (or any search API)."""
    resp = requests.get("https://serpapi.com/search", params={
        "q": query,
        "num": num_results,
        "api_key": os.environ["SERPAPI_KEY"]
    })
    data = resp.json()
    return {
        "results": [
            {"title": r["title"], "snippet": r["snippet"], "link": r["link"]}
            for r in data.get("organic_results", [])[:num_results]
        ]
    }

def write_file(path: str, content: str) -> dict:
    """Write content to a file in the workspace."""
    workspace = Path("./workspace")
    workspace.mkdir(exist_ok=True)
    file_path = workspace / path
    file_path.parent.mkdir(parents=True, exist_ok=True)
    file_path.write_text(content)
    return {"status": "written", "path": str(file_path), "bytes": len(content)}

def read_file(path: str) -> dict:
    """Read a file from the workspace."""
    file_path = Path("./workspace") / path
    if not file_path.exists():
        return {"error": f"File not found: {path}"}
    return {"content": file_path.read_text()}

# --- Tool registry ---
TOOL_REGISTRY = {
    "search_web": search_web,
    "write_file": write_file,
    "read_file": read_file,
}

# --- Tool schemas for OpenAI ---
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for current information.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "num_results": {"type": "integer", "default": 5}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "write_file",
            "description": "Write content to a file in the workspace.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "File path"},
                    "content": {"type": "string", "description": "File content"}
                },
                "required": ["path", "content"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a file from the workspace.",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "File path"}
                },
                "required": ["path"]
            }
        }
    }
]

# --- The Agent ---
def research_agent(task: str, max_iterations: int = 10) -> str:
    """A research agent that searches the web and writes reports."""
    messages = [
        {
            "role": "system",
            "content": (
                "You are a research agent. You can search the web for information "
                "and write files to save your findings. When given a research task, "
                "search for relevant information, synthesize it, and write a "
                "comprehensive report to a markdown file. Always cite your sources."
            )
        },
        {"role": "user", "content": task}
    ]

    for i in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=TOOLS,
            tool_choice="auto"
        )
        msg = response.choices[0].message
        messages.append(msg)

        # No tool calls = agent is done
        if not msg.tool_calls:
            return msg.content

        # Execute all tool calls
        for tool_call in msg.tool_calls:
            fn_name = tool_call.function.name
            fn_args = json.loads(tool_call.function.arguments)
            print(f"  πŸ”§ Calling {fn_name}({fn_args})")

            try:
                result = TOOL_REGISTRY[fn_name](**fn_args)
            except Exception as e:
                result = {"error": str(e)}

            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result)
            })

    return "Max iterations reached without completion."

# --- Run it ---
if __name__ == "__main__":
    result = research_agent(
        "Research the current state of quantum computing in 2026. "
        "Write a report covering major breakthroughs, key players, "
        "and practical applications. Save it as quantum-report.md"
    )
    print(result)

This agent will autonomously search for quantum computing information, synthesize multiple sources, and write a structured markdown report - all without human intervention after the initial prompt.

5. Multi-Agent Systems

When a single agent isn't enough, you bring in a team. Multi-agent systems assign specialized roles to different agents and coordinate their work. Let's look at the three most popular frameworks.

CrewAI: Role-Based Agent Teams

CrewAI is arguably the most intuitive multi-agent framework: you define agents with roles, goals, and backstories, then assign them tasks that form a crew.

from crewai import Agent, Task, Crew, Process
from crewai_tools import SerperDevTool, FileWriterTool

# Define specialized agents
researcher = Agent(
    role='Senior Research Analyst',
    goal='Find comprehensive, accurate information about the given topic',
    backstory=(
        'You are an expert research analyst with 15 years of experience. '
        'You excel at finding reliable sources, cross-referencing data, '
        'and identifying key trends and insights.'
    ),
    tools=[SerperDevTool()],
    verbose=True,
    allow_delegation=False
)

writer = Agent(
    role='Technical Content Writer',
    goal='Transform research findings into clear, engaging technical content',
    backstory=(
        'You are a skilled technical writer who can take complex research '
        'and turn it into accessible, well-structured articles. You always '
        'cite sources and use concrete examples.'
    ),
    tools=[FileWriterTool()],
    verbose=True,
    allow_delegation=False
)

editor = Agent(
    role='Senior Editor',
    goal='Ensure content is accurate, well-structured, and publication-ready',
    backstory=(
        'You are a meticulous editor with a keen eye for factual accuracy, '
        'logical flow, and clear writing. You catch errors others miss.'
    ),
    verbose=True,
    allow_delegation=True  # Can send back to writer for revisions
)

# Define tasks
research_task = Task(
    description=(
        'Research the current state of {topic}. Find at least 5 reliable '
        'sources. Cover: key developments, major players, challenges, '
        'and future outlook. Provide specific data points and statistics.'
    ),
    expected_output=(
        'A detailed research brief with organized findings, '
        'statistics, and source citations.'
    ),
    agent=researcher
)

writing_task = Task(
    description=(
        'Using the research findings, write a comprehensive 2000-word '
        'article about {topic}. Include an introduction, main sections '
        'with headers, code examples where relevant, and a conclusion. '
        'Cite all sources.'
    ),
    expected_output='A complete, well-structured article in markdown format.',
    agent=writer,
    context=[research_task]  # This task depends on research
)

editing_task = Task(
    description=(
        'Review the article for accuracy, clarity, and completeness. '
        'Check all facts against the research. Fix any issues. '
        'Ensure the article flows well and is ready for publication.'
    ),
    expected_output='A polished, publication-ready article in markdown.',
    agent=editor,
    context=[research_task, writing_task],
    output_file='output/article.md'  # Save final output
)

# Assemble the crew
crew = Crew(
    agents=[researcher, writer, editor],
    tasks=[research_task, writing_task, editing_task],
    process=Process.sequential,  # Tasks run in order
    verbose=True
)

# Run the crew
result = crew.kickoff(inputs={"topic": "AI agents in production"})
print(result)

CrewAI handles the orchestration - passing context between agents, managing retries, and collecting outputs. The Process.sequential mode runs tasks in order; Process.hierarchical adds a manager agent that coordinates dynamically.

AutoGen: Conversation-Based Multi-Agent

Microsoft's AutoGen models multi-agent interaction as conversations. Agents talk to each other, debate, and collaborate through message passing.

import os

import autogen

# Configuration for the LLM
llm_config = {
    "model": "gpt-4o",
    "api_key": os.environ["OPENAI_API_KEY"],
    "temperature": 0.1
}

# Create agents
engineer = autogen.AssistantAgent(
    name="Engineer",
    system_message=(
        "You are a senior software engineer. Write clean, well-tested code. "
        "Always include error handling and type hints."
    ),
    llm_config=llm_config
)

critic = autogen.AssistantAgent(
    name="Critic",
    system_message=(
        "You are a code reviewer. Review code for bugs, security issues, "
        "performance problems, and style. Be thorough but constructive."
    ),
    llm_config=llm_config
)

# UserProxyAgent executes code and represents the human
user_proxy = autogen.UserProxyAgent(
    name="UserProxy",
    human_input_mode="NEVER",  # Fully autonomous
    max_consecutive_auto_reply=5,
    code_execution_config={
        "work_dir": "workspace",
        "use_docker": True  # Sandboxed execution
    }
)

# Create a group chat
group_chat = autogen.GroupChat(
    agents=[user_proxy, engineer, critic],
    messages=[],
    max_round=12
)

manager = autogen.GroupChatManager(
    groupchat=group_chat,
    llm_config=llm_config
)

# Start the conversation
user_proxy.initiate_chat(
    manager,
    message=(
        "Build a Python CLI tool that fetches GitHub repository stats "
        "(stars, forks, issues) and displays them in a formatted table. "
        "Include error handling for invalid repos and rate limiting."
    )
)

The Engineer writes code, the Critic reviews it, and the UserProxy executes it. They iterate until the code works correctly - mimicking a real development team's workflow.

LangGraph: State Machine Multi-Agent

LangGraph models agent workflows as directed graphs with explicit state. This gives you fine-grained control over routing, branching, and error handling.

from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from typing import TypedDict, Annotated, Literal
import operator

llm = ChatOpenAI(model="gpt-4o")  # Model instance shared by all nodes

# Define the shared state
class AgentState(TypedDict):
    messages: Annotated[list, operator.add]
    task: str
    research: str
    draft: str
    review: str
    final: str
    iteration: int

# Define agent nodes
def researcher_node(state: AgentState) -> dict:
    """Research agent gathers information."""
    response = llm.invoke(
        f"Research this topic thoroughly: {state['task']}\n"
        f"Previous context: {state.get('review', 'None')}"
    )
    return {"research": response.content, "messages": [response]}

def writer_node(state: AgentState) -> dict:
    """Writer agent creates content from research."""
    response = llm.invoke(
        f"Write a detailed article based on this research:\n"
        f"{state['research']}\n\n"
        f"Previous feedback: {state.get('review', 'None')}"
    )
    return {"draft": response.content, "messages": [response]}

def reviewer_node(state: AgentState) -> dict:
    """Reviewer agent evaluates the draft."""
    response = llm.invoke(
        f"Review this article for accuracy and quality:\n"
        f"{state['draft']}\n\n"
        f"Research it was based on:\n{state['research']}\n\n"
        f"Respond with APPROVED if ready, or provide specific feedback."
    )
    return {
        "review": response.content,
        "iteration": state.get("iteration", 0) + 1,
        "messages": [response]
    }

# Routing logic
def should_continue(state: AgentState) -> Literal["writer", "end"]:
    """Decide whether to revise or finish."""
    if "APPROVED" in state["review"]:
        return "end"
    if state.get("iteration", 0) >= 3:
        return "end"  # Max revisions reached
    return "writer"  # Send back for revision

# Build the graph
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("researcher", researcher_node)
workflow.add_node("writer", writer_node)
workflow.add_node("reviewer", reviewer_node)

# Add edges
workflow.set_entry_point("researcher")
workflow.add_edge("researcher", "writer")
workflow.add_edge("writer", "reviewer")
workflow.add_conditional_edges("reviewer", should_continue, {
    "writer": "writer",  # Loop back for revision
    "end": END
})

# Compile and run
app = workflow.compile()
result = app.invoke({
    "task": "Explain quantum error correction",
    "messages": [],
    "iteration": 0
})

LangGraph's explicit state machine gives you deterministic control over agent routing. You can see exactly which path the workflow took, add checkpoints for human review, and handle errors at specific nodes. It's more verbose than CrewAI but far more controllable in production.

6. Memory Systems

Memory is what transforms a stateless LLM into a persistent, learning agent. Without memory, every interaction starts from scratch. With memory, agents remember past conversations, learn user preferences, and build knowledge over time.

Conversation Buffer Memory

The simplest form - store the entire conversation history and pass it to the LLM on every call. Works well for short conversations but hits context window limits fast.

class ConversationBufferMemory:
    """Store full conversation history."""

    def __init__(self, max_tokens: int = 128000):
        self.messages: list[dict] = []
        self.max_tokens = max_tokens

    def add(self, role: str, content: str):
        self.messages.append({"role": role, "content": content})
        self._trim()

    def _trim(self):
        """Remove oldest messages while over the token limit."""
        # Keep the system message at index 0; drop the oldest turn after it.
        # The length guard prevents an infinite loop when nothing is trimmable.
        while self._estimate_tokens() > self.max_tokens and len(self.messages) > 2:
            self.messages.pop(1)

    def get_messages(self) -> list[dict]:
        return self.messages.copy()

    def _estimate_tokens(self) -> int:
        # Rough heuristic: ~4 characters per token for English text
        return sum(len(m["content"]) // 4 for m in self.messages)

Summary Memory

Instead of storing every message, periodically summarize the conversation and keep only the summary plus recent messages. This dramatically reduces token usage while preserving important context.

class SummaryMemory:
    """Maintain a running summary + recent messages."""

    def __init__(self, llm_client, recent_count: int = 10):
        self.llm = llm_client
        self.summary = ""
        self.recent: list[dict] = []
        self.recent_count = recent_count

    def add(self, role: str, content: str):
        self.recent.append({"role": role, "content": content})

        # When recent messages exceed threshold, summarize older ones
        if len(self.recent) > self.recent_count * 2:
            old_messages = self.recent[:self.recent_count]
            self.recent = self.recent[self.recent_count:]
            self._update_summary(old_messages)

    def _update_summary(self, messages: list[dict]):
        """Use LLM to update the running summary."""
        msg_text = "\n".join(
            f"{m['role']}: {m['content']}" for m in messages
        )
        response = self.llm.chat.completions.create(
            model="gpt-4o-mini",  # Use cheap model for summarization
            messages=[{
                "role": "user",
                "content": (
                    f"Current summary:\n{self.summary}\n\n"
                    f"New messages:\n{msg_text}\n\n"
                    "Update the summary to include key information from "
                    "the new messages. Be concise but preserve important "
                    "details, decisions, and user preferences."
                )
            }]
        )
        self.summary = response.choices[0].message.content

    def get_messages(self) -> list[dict]:
        messages = []
        if self.summary:
            messages.append({
                "role": "system",
                "content": f"Conversation summary so far:\n{self.summary}"
            })
        messages.extend(self.recent)
        return messages

Vector Store Memory (with ChromaDB)

For true long-term memory that persists across sessions, embed conversation snippets and facts into a vector database. Retrieve relevant memories via semantic search when needed.

import chromadb
from openai import OpenAI
from datetime import datetime
import uuid

class VectorMemory:
    """Long-term memory using ChromaDB for semantic retrieval."""

    def __init__(self, collection_name: str = "agent_memory"):
        self.chroma = chromadb.PersistentClient(path="./memory_db")
        self.collection = self.chroma.get_or_create_collection(
            name=collection_name,
            metadata={"hnsw:space": "cosine"}
        )
        self.openai = OpenAI()

    def store(self, content: str, metadata: dict | None = None):
        """Store a memory with embedding."""
        embedding = self._embed(content)
        self.collection.add(
            ids=[str(uuid.uuid4())],
            embeddings=[embedding],
            documents=[content],
            metadatas=[{
                "timestamp": datetime.now().isoformat(),
                "type": "conversation",
                **(metadata or {})
            }]
        )

    def recall(self, query: str, n_results: int = 5) -> list[str]:
        """Retrieve relevant memories via semantic search."""
        embedding = self._embed(query)
        results = self.collection.query(
            query_embeddings=[embedding],
            n_results=n_results
        )
        return results["documents"][0] if results["documents"] else []

    def _embed(self, text: str) -> list[float]:
        """Generate embedding using OpenAI."""
        response = self.openai.embeddings.create(
            model="text-embedding-3-small",
            input=text
        )
        return response.data[0].embedding

# --- Using vector memory in an agent ---
class MemoryAugmentedAgent:
    """Agent with both short-term and long-term memory."""

    def __init__(self):
        self.client = OpenAI()
        self.short_term = ConversationBufferMemory()
        self.long_term = VectorMemory()

    def chat(self, user_input: str) -> str:
        # Retrieve relevant long-term memories
        memories = self.long_term.recall(user_input, n_results=3)
        memory_context = ""
        if memories:
            memory_context = (
                "Relevant information from past conversations:\n"
                + "\n".join(f"- {m}" for m in memories)
            )

        # Build messages with both memory types
        messages = [
            {
                "role": "system",
                "content": (
                    "You are a helpful assistant with long-term memory. "
                    "Use past conversation context when relevant.\n\n"
                    f"{memory_context}"
                )
            }
        ]
        messages.extend(self.short_term.get_messages())
        messages.append({"role": "user", "content": user_input})

        # Generate response
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=messages
        )
        reply = response.choices[0].message.content

        # Store in both memory systems
        self.short_term.add("user", user_input)
        self.short_term.add("assistant", reply)
        self.long_term.store(
            f"User asked: {user_input}\nAssistant replied: {reply}",
            metadata={"type": "conversation"}
        )

        return reply

Entity Memory

Track specific entities (people, projects, concepts) mentioned in conversations, and maintain a structured record of their attributes that the agent can query.

import json

class EntityMemory:
    """Track entities and their attributes across conversations."""

    def __init__(self):
        self.entities: dict[str, dict] = {}
        self.llm = OpenAI()

    def extract_and_update(self, message: str):
        """Use LLM to extract entities from a message."""
        response = self.llm.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": (
                    f"Extract entities and their attributes from this message. "
                    f"Return JSON: {{\"entity_name\": {{\"attribute\": \"value\"}}}}.\n\n"
                    f"Message: {message}\n\n"
                    f"Known entities: {json.dumps(self.entities)}"
                )
            }],
            response_format={"type": "json_object"}
        )
        new_entities = json.loads(response.choices[0].message.content)

        # Merge with existing entities
        for name, attrs in new_entities.items():
            if name in self.entities:
                self.entities[name].update(attrs)
            else:
                self.entities[name] = attrs

    def get_context(self, query: str) -> str:
        """Format known entities as context (a production version would
        filter by relevance to the query)."""
        if not self.entities:
            return ""
        return (
            "Known entities:\n"
            + "\n".join(
                f"- {name}: {json.dumps(attrs)}"
                for name, attrs in self.entities.items()
            )
        )

In practice, production agents combine all four memory types: buffer for the current conversation, summary for session history, vector store for cross-session recall, and entity memory for structured knowledge.
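To make that combination concrete, here is a deliberately toy sketch of how the layers compose into one context-building step. Keyword overlap stands in for the vector store's semantic search, plain strings stand in for LLM-generated summaries, and every name here is illustrative rather than any framework's API:

```python
class CombinedMemory:
    """Toy composition of the four memory layers (illustrative only)."""

    def __init__(self):
        self.buffer: list[dict] = []         # current conversation
        self.summary: str = ""               # session-level summary
        self.facts: list[str] = []           # stand-in for the vector store
        self.entities: dict[str, dict] = {}  # structured knowledge

    def add_turn(self, role: str, content: str):
        self.buffer.append({"role": role, "content": content})
        self.facts.append(content)  # a real system would embed + store this

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Naive keyword overlap in place of semantic search
        words = set(query.lower().split())
        scored = sorted(
            self.facts,
            key=lambda fact: len(words & set(fact.lower().split())),
            reverse=True,
        )
        return scored[:k]

    def build_context(self, query: str) -> list[dict]:
        """Assemble system context plus recent turns, most stable info first."""
        system_parts = []
        if self.summary:
            system_parts.append(f"Summary: {self.summary}")
        if self.entities:
            system_parts.append(f"Entities: {self.entities}")
        memories = self.recall(query)
        if memories:
            system_parts.append("Recalled: " + "; ".join(memories))
        messages = [{"role": "system", "content": "\n".join(system_parts)}]
        messages.extend(self.buffer[-10:])  # recent turns only
        return messages
```

The ordering matters: stable knowledge (summary, entities) goes into the system message, recalled memories are injected per query, and the raw buffer stays at the end where the model attends to it most directly.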

7. Planning & Reasoning

Planning is what separates a tool-calling chatbot from a true agent. Without planning, agents take greedy actions - doing whatever seems best right now. With planning, they think ahead, decompose complex goals, and execute structured strategies.

Chain-of-Thought (CoT)

The simplest reasoning strategy: prompt the model to think step-by-step before acting. This dramatically improves accuracy on complex tasks.

COT_SYSTEM_PROMPT = """
You are a problem-solving agent. Before taking any action, always:

1. STATE the problem clearly
2. IDENTIFY what information you have and what you need
3. PLAN your approach step by step
4. EXECUTE one step at a time
5. VERIFY each result before moving to the next step

Think through your reasoning explicitly. Show your work.
"""

CoT works because it forces the model to allocate compute to reasoning rather than jumping to conclusions. It's free (just a prompt change) and should be your default for any non-trivial agent task.

Tree-of-Thought (ToT)

For problems with multiple possible approaches, explore several reasoning paths in parallel and evaluate which is most promising before committing.

from concurrent.futures import ThreadPoolExecutor

def tree_of_thought(problem: str, num_branches: int = 3) -> str:
    """Explore multiple reasoning paths in parallel and select the best."""
    client = OpenAI()

    def generate_branch(i: int) -> str:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{
                "role": "user",
                "content": (
                    f"Problem: {problem}\n\n"
                    f"Generate approach #{i+1} (be creative, try a different "
                    f"angle than obvious solutions).\n"
                    f"Outline your reasoning step by step."
                )
            }],
            temperature=0.8  # Higher temp for diversity
        )
        return response.choices[0].message.content

    # Step 1: Generate multiple approaches in parallel
    with ThreadPoolExecutor(max_workers=num_branches) as pool:
        branches = list(pool.map(generate_branch, range(num_branches)))

    # Step 2: Evaluate each approach
    evaluation = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (
                f"Problem: {problem}\n\n"
                f"Here are {num_branches} approaches:\n\n"
                + "\n\n---\n\n".join(
                    f"Approach {i+1}:\n{b}"
                    for i, b in enumerate(branches)
                )
                + "\n\nEvaluate each approach on:\n"
                "1. Correctness (will it solve the problem?)\n"
                "2. Efficiency (is it the simplest path?)\n"
                "3. Robustness (will it handle edge cases?)\n\n"
                "Select the best approach and explain why."
            )
        }],
        temperature=0.1  # Low temp for evaluation
    )

    return evaluation.choices[0].message.content

Plan-and-Execute with LangGraph

The most powerful planning pattern: first create a complete plan, then execute each step, re-planning as needed based on results. This is how the most capable agents (like Devin) operate.

from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated, Literal
import operator

class PlanExecuteState(TypedDict):
    task: str
    plan: list[str]          # List of steps
    current_step: int
    step_results: Annotated[list[str], operator.add]
    final_answer: str

def planner_node(state: PlanExecuteState) -> dict:
    """Create or revise the plan."""
    context = ""
    if state.get("step_results"):
        context = (
            "Steps completed so far:\n"
            + "\n".join(
                f"  Step {i+1}: {r}"
                for i, r in enumerate(state["step_results"])
            )
            + "\n\nRevise the remaining plan if needed based on these results."
        )

    response = llm.invoke(
        f"Task: {state['task']}\n\n"
        f"{context}\n\n"
        f"Create a step-by-step plan to complete this task. "
        f"Each step should be a single, concrete action. "
        f"Return the plan as a numbered list."
    )

    # Parse numbered lines like "1. Do X" into a list of steps
    steps = [
        line.strip().lstrip("0123456789.) ")
        for line in response.content.strip().split("\n")
        if line.strip() and line.strip()[0].isdigit()
    ]

    return {"plan": steps, "current_step": 0}

def executor_node(state: PlanExecuteState) -> dict:
    """Execute the current step of the plan."""
    step_idx = state["current_step"]
    step = state["plan"][step_idx]

    context = ""
    if state.get("step_results"):
        context = "Previous results:\n" + "\n".join(state["step_results"])

    response = agent_with_tools.invoke(
        f"Execute this step: {step}\n\n"
        f"Overall task: {state['task']}\n"
        f"{context}"
    )

    return {
        "step_results": [f"{step} → {response.content}"],
        "current_step": step_idx + 1
    }

def should_replan(state: PlanExecuteState) -> Literal["replan", "execute", "end"]:
    """Decide whether to continue, replan, or finish."""
    if state["current_step"] >= len(state["plan"]):
        return "end"

    # Replan every 3 steps to adapt to new information
    if state["current_step"] > 0 and state["current_step"] % 3 == 0:
        return "replan"

    return "execute"

def finalizer_node(state: PlanExecuteState) -> dict:
    """Synthesize final answer from all step results."""
    response = llm.invoke(
        f"Task: {state['task']}\n\n"
        f"All step results:\n"
        + "\n".join(state["step_results"])
        + "\n\nSynthesize a final, comprehensive answer."
    )
    return {"final_answer": response.content}

# Build the plan-and-execute graph
workflow = StateGraph(PlanExecuteState)

workflow.add_node("planner", planner_node)
workflow.add_node("executor", executor_node)
workflow.add_node("finalizer", finalizer_node)

workflow.set_entry_point("planner")
workflow.add_edge("planner", "executor")
workflow.add_conditional_edges("executor", should_replan, {
    "execute": "executor",
    "replan": "planner",
    "end": "finalizer"
})
workflow.add_edge("finalizer", END)

plan_execute_agent = workflow.compile()

# Run it
result = plan_execute_agent.invoke({
    "task": "Analyze the top 5 Python web frameworks, compare their "
            "performance benchmarks, and recommend the best one for "
            "a high-traffic REST API.",
    "plan": [],
    "current_step": 0,
    "step_results": [],
    "final_answer": ""
})

The plan-and-execute pattern is powerful because it separates what to do from how to do it. The planner thinks strategically; the executor handles tactical details. Re-planning after every few steps lets the agent adapt when reality doesn't match expectations.
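One fragile spot in plan-and-execute is turning the planner's prose into discrete steps. The `lstrip`-based parsing in `planner_node` works for well-behaved output; a regex-based version (a framework-free sketch) handles `1.`, `2)`, and indented steps uniformly:

```python
import re

def parse_plan(text: str) -> list[str]:
    """Extract steps from a numbered list like '1. Do X' or '2) Do Y'."""
    steps = []
    for line in text.splitlines():
        # Optional indentation, a number, '.' or ')', then the step text
        match = re.match(r"\s*\d+[.)]\s+(.*\S)", line)
        if match:
            steps.append(match.group(1))
    return steps
```

Anything the model writes around the list (preambles, closing remarks) is simply ignored, which is usually the behavior you want.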

8. Production Considerations

Building a demo agent is easy. Running one in production is hard. Here's everything you need to think about before deploying agents to real users.

Rate Limiting & Cost Control

Agents can make dozens of LLM calls per task. Without controls, a single runaway agent can burn through your API budget in minutes.

import time
from dataclasses import dataclass, field

@dataclass
class AgentBudget:
    """Track and limit agent resource usage."""
    max_llm_calls: int = 50
    max_tool_calls: int = 30
    max_tokens: int = 500_000
    max_cost_usd: float = 5.00
    max_runtime_seconds: int = 300

    # Tracking
    llm_calls: int = field(default=0, init=False)
    tool_calls: int = field(default=0, init=False)
    tokens_used: int = field(default=0, init=False)
    cost_usd: float = field(default=0.0, init=False)
    start_time: float = field(default_factory=time.time, init=False)

    def check_budget(self, action: str = "llm_call"):
        """Raise if any budget limit is exceeded."""
        elapsed = time.time() - self.start_time

        if elapsed > self.max_runtime_seconds:
            raise BudgetExceeded(f"Runtime exceeded: {elapsed:.0f}s")
        if self.llm_calls >= self.max_llm_calls:
            raise BudgetExceeded(f"LLM call limit: {self.llm_calls}")
        if self.tool_calls >= self.max_tool_calls:
            raise BudgetExceeded(f"Tool call limit: {self.tool_calls}")
        if self.tokens_used >= self.max_tokens:
            raise BudgetExceeded(f"Token limit: {self.tokens_used}")
        if self.cost_usd >= self.max_cost_usd:
            raise BudgetExceeded(f"Cost limit: ${self.cost_usd:.2f}")

    def record_llm_call(self, input_tokens: int, output_tokens: int):
        self.llm_calls += 1
        self.tokens_used += input_tokens + output_tokens
        # GPT-4o pricing (approximate)
        self.cost_usd += (input_tokens * 2.50 + output_tokens * 10.00) / 1_000_000

    def record_tool_call(self):
        self.tool_calls += 1

class BudgetExceeded(Exception):
    pass

Error Handling & Retries

LLM APIs fail. Tools fail. The agent's reasoning fails. Build resilience at every layer:

import openai
import tenacity

@tenacity.retry(
    stop=tenacity.stop_after_attempt(3),
    wait=tenacity.wait_exponential(multiplier=1, min=2, max=30),
    retry=tenacity.retry_if_exception_type(
        (openai.RateLimitError, openai.APITimeoutError)
    ),
    before_sleep=lambda retry_state: print(
        f"Retrying in {retry_state.next_action.sleep}s..."
    )
)
def call_llm_with_retry(client, **kwargs):
    """LLM call with exponential backoff retry."""
    return client.chat.completions.create(**kwargs)

# Timeout wrapper for tool execution
import signal

class ToolTimeout:
    def __init__(self, seconds: int = 30):
        self.seconds = seconds

    def __enter__(self):
        signal.signal(signal.SIGALRM, self._handler)
        signal.alarm(self.seconds)
        return self

    def __exit__(self, *args):
        signal.alarm(0)

    def _handler(self, signum, frame):
        raise TimeoutError(f"Tool execution timed out after {self.seconds}s")

Observability

You can't debug what you can't see. Instrument your agents with tracing and logging from day one.

# LangSmith integration for tracing
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-key"
os.environ["LANGCHAIN_PROJECT"] = "my-agent-prod"

# Custom structured logging
import structlog
from functools import wraps

logger = structlog.get_logger()

def traced_agent_step(step_name: str):
    """Decorator factory for tracing agent steps: @traced_agent_step("plan")."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            logger.info("agent_step_start",
                step=step_name,
                args=str(args)[:200]
            )
            try:
                result = func(*args, **kwargs)
                logger.info("agent_step_complete",
                    step=step_name,
                    result_preview=str(result)[:200]
                )
                return result
            except Exception as e:
                logger.error("agent_step_failed",
                    step=step_name,
                    error=str(e),
                    error_type=type(e).__name__
                )
                raise
        return wrapper
    return decorator

# Weights & Biases integration
import wandb

wandb.init(project="agent-monitoring")

def log_agent_run(task: str, result: str, budget: AgentBudget):
    """Log agent run metrics to W&B."""
    wandb.log({
        "task_length": len(task),
        "result_length": len(result),
        "llm_calls": budget.llm_calls,
        "tool_calls": budget.tool_calls,
        "tokens_used": budget.tokens_used,
        "cost_usd": budget.cost_usd,
        "runtime_seconds": time.time() - budget.start_time,
    })

Guardrails & Safety

Agents can go off the rails. Implement guardrails to prevent harmful actions:

class AgentGuardrails:
    """Safety checks for agent actions."""

    BLOCKED_TOOLS_IN_PROD = {"delete_database", "rm_rf", "send_email_blast"}
    MAX_FILE_WRITE_SIZE = 1_000_000  # 1MB
    ALLOWED_DOMAINS = {"api.github.com", "serpapi.com", "api.openai.com"}

    @classmethod
    def check_tool_call(cls, tool_name: str, args: dict) -> tuple[bool, str]:
        """Validate a tool call before execution."""
        if tool_name in cls.BLOCKED_TOOLS_IN_PROD:
            return False, f"Tool '{tool_name}' is blocked in production"

        if tool_name == "write_file":
            if len(args.get("content", "")) > cls.MAX_FILE_WRITE_SIZE:
                return False, "File content exceeds size limit"
            if ".." in args.get("path", ""):
                return False, "Path traversal detected"

        if tool_name == "http_request":
            from urllib.parse import urlparse
            domain = urlparse(args.get("url", "")).netloc
            if domain not in cls.ALLOWED_DOMAINS:
                return False, f"Domain '{domain}' not in allowlist"

        return True, "OK"

    @classmethod
    def check_output(cls, output: str) -> tuple[bool, str]:
        """Check agent output for sensitive content."""
        # Check for PII patterns (simplified)
        import re
        if re.search(r'\b\d{3}-\d{2}-\d{4}\b', output):
            return False, "Output contains potential SSN"
        if re.search(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b', output):
            return False, "Output contains email address"
        return True, "OK"
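Blocking output is one policy; for logs you usually want redaction instead, so the trace stays debuggable without storing PII. A sketch using the same simplified patterns:

```python
import re

# Simplified PII patterns (a production system would use a dedicated library)
PII_PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
}

def redact_pii(text: str) -> str:
    """Replace PII matches with typed placeholders before logging."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[REDACTED-{label}]", text)
    return text
```

Typed placeholders (`[REDACTED-SSN]` vs. a bare `***`) let you see what kind of data leaked into a trace without seeing the data itself.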

Production Checklist

✅ Agent Production Readiness Checklist:

  • Budget limits: Max LLM calls, token limits, cost caps, and runtime timeouts per agent run
  • Retry logic: Exponential backoff for API calls, circuit breakers for persistent failures
  • Tool sandboxing: Code execution in Docker/gVisor, file access restricted to workspace, network allowlists
  • Input validation: Sanitize user inputs, validate tool arguments before execution
  • Output guardrails: Check for PII, harmful content, and hallucinated actions before returning to user
  • Observability: Structured logging, distributed tracing (LangSmith/Datadog), metrics dashboards
  • Human escalation: Clear paths to escalate to humans when confidence is low or stakes are high
  • Graceful degradation: Fallback behavior when LLM APIs are down or rate-limited
  • Testing: Unit tests for tools, integration tests for agent loops, eval suites for reasoning quality
  • Versioning: Pin model versions, version your prompts, track tool schema changes
  • Audit trail: Log every LLM call, tool execution, and decision for debugging and compliance
  • Cost monitoring: Real-time cost tracking with alerts when spending exceeds thresholds

9. Real-World Agent Examples

Let's look at how these patterns come together in real-world agent architectures.

Coding Agent (Devin-Style)

A coding agent that can understand requirements, write code, run tests, and iterate until the code works.

Architecture:

┌──────────────┐
│   Planner    │ ← Breaks issue into coding tasks
└──────┬───────┘
       │
       ▼
┌──────────────┐     ┌──────────────┐
│    Coder     │────▶│   Executor   │ ← Runs code in sandbox
└──────┬───────┘     └──────┬───────┘
       │                    │
       ▼                    ▼
┌──────────────┐     ┌──────────────┐
│   Reviewer   │◀────│ Test Runner  │ ← Runs test suite
└──────┬───────┘     └──────────────┘
       │
       ▼
  Tests pass? ──No──▶ Back to Coder with feedback
       │
      Yes ──▶ Create PR

Key tools: File read/write, shell execution (sandboxed), git operations, test runner, linter. Memory: Codebase index in vector store, conversation history for context. Planning: Plan-and-execute with re-planning after test failures.

# Simplified coding agent core loop
class CodingAgent:
    tools = ["read_file", "write_file", "run_shell", "run_tests",
             "search_codebase", "git_commit"]

    def solve_issue(self, issue: str) -> str:
        # 1. Understand the codebase
        relevant_files = self.search_codebase(issue)
        context = self.read_files(relevant_files)

        # 2. Plan the changes
        plan = self.planner.create_plan(issue, context)

        # 3. Implement each step
        for step in plan:
            self.coder.implement(step, context)
            test_result = self.run_tests()

            if not test_result.passed:
                # Self-correction loop
                for retry in range(3):
                    feedback = self.reviewer.analyze_failure(
                        test_result, step
                    )
                    self.coder.fix(feedback)
                    test_result = self.run_tests()
                    if test_result.passed:
                        break

        # 4. Final review and PR
        self.git_commit(plan.summary)
        return self.create_pull_request(issue, plan)

Research Agent

An agent that can research any topic by searching the web, reading papers, and synthesizing findings into structured reports.

Architecture:

User Query
    │
    ▼
┌──────────────┐
│  Query       │ ← Decomposes into sub-questions
│  Decomposer  │
└──────┬───────┘
       │
       ▼
┌──────────────┐     ┌──────────────┐
│  Web Search  │     │ Paper Search │ ← Parallel search
│  Agent       │     │  Agent       │
└──────┬───────┘     └──────┬───────┘
       │                    │
       ▼                    ▼
┌──────────────────────────────────┐
│         Synthesizer Agent        │ ← Combines findings
└──────────────┬───────────────────┘
               │
               ▼
┌──────────────────────────────────┐
│      Fact-Checker Agent          │ ← Verifies claims
└──────────────┬───────────────────┘
               │
               ▼
         Final Report

Key tools: Web search, academic paper search (Semantic Scholar API), PDF reader, note-taking (file write). Memory: Vector store of all sources found, entity memory for key facts. Planning: Query decomposition with parallel execution of sub-queries.
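The decompose-then-search-in-parallel step reduces to plain fan-out once the LLM and search tools are injectable. A framework-free sketch, where `decompose` and `search` are stand-ins for your LLM call and search tool:

```python
from concurrent.futures import ThreadPoolExecutor

def research(question: str, decompose, search, max_workers: int = 4) -> dict[str, str]:
    """Decompose a question into sub-questions and search them in parallel.

    decompose: callable(str) -> list[str]   (e.g. an LLM call)
    search:    callable(str) -> str         (e.g. a web/paper search tool)
    """
    sub_questions = decompose(question)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each sub-question is searched concurrently
        answers = list(pool.map(search, sub_questions))
    return dict(zip(sub_questions, answers))
```

The synthesizer and fact-checker stages then consume the returned mapping, so each claim in the final report can be traced back to the sub-question and source that produced it.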

Data Analysis Agent

An agent that connects to databases, writes SQL/Python, generates visualizations, and explains findings in natural language.

Architecture:

User Question (natural language)
    │
    ▼
┌──────────────┐
│  Schema      │ ← Reads DB schema, understands tables
│  Analyzer    │
└──────┬───────┘
       │
       ▼
┌──────────────┐
│  SQL Writer  │ ← Generates and validates SQL
└──────┬───────┘
       │
       ▼
┌──────────────┐
│  Executor    │ ← Runs query, handles errors
└──────┬───────┘
       │
       ▼
┌──────────────┐
│  Visualizer  │ ← Generates charts with matplotlib/plotly
└──────┬───────┘
       │
       ▼
┌──────────────┐
│  Narrator    │ ← Explains findings in plain English
└──────────────┘

Key tools: SQL execution (read-only!), Python execution for analysis, chart generation, file export. Safety: Read-only database access, query validation, result size limits. Memory: Schema cache, past query history for optimization.

Customer Support Agent

An agent that handles customer inquiries by searching knowledge bases, looking up account information, and escalating to humans when needed.

Architecture:

Customer Message
    │
    ▼
┌──────────────┐
│  Classifier  │ ← Intent detection + urgency scoring
└──────┬───────┘
       │
       ├── Billing ──▶ Billing Agent (account lookup, refunds)
       ├── Technical ──▶ Tech Agent (KB search, troubleshooting)
       ├── Sales ──▶ Sales Agent (product info, pricing)
       └── Complex/Angry ──▶ Human Escalation
                              │
                              ▼
                    ┌──────────────┐
                    │  Human Agent │ ← With full context summary
                    └──────────────┘

Key tools: Knowledge base search (RAG), CRM lookup, order management API, ticket creation. Guardrails: Cannot issue refunds over $X without approval, must escalate angry customers, PII redaction in logs. Memory: Full conversation history, customer entity memory (past issues, preferences, account tier).
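The classify-then-route step is plain dispatch once the classifier returns an intent and an urgency score. A sketch (the intent labels and escalation threshold are illustrative):

```python
def route_ticket(intent: str, urgency: float, handlers: dict[str, str]) -> str:
    """Pick a specialist handler, escalating to a human when urgency is high
    or no specialist exists for the intent."""
    if urgency >= 0.8 or intent not in handlers:
        return "human_escalation"
    return handlers[intent]

# Hypothetical specialist registry for this example
HANDLERS = {
    "billing": "billing_agent",
    "technical": "tech_agent",
    "sales": "sales_agent",
}
```

Keeping the escalation rule in deterministic code (rather than asking the LLM whether to escalate) makes it auditable and impossible for the model to talk itself out of.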

What's Next

Agentic AI is moving fast. The frameworks are maturing, the models are getting better at tool use and reasoning, and production patterns are emerging. Here's where to go from here:

  • Start simple: Build a single agent with 2-3 tools. Get the loop working before adding complexity.
  • Add memory: Even basic conversation buffer memory makes agents dramatically more useful.
  • Instrument everything: You'll need observability from day one. LangSmith is the easiest starting point.
  • Test with evals: Build evaluation datasets for your specific use case. Vibes-based testing doesn't scale.
  • Go multi-agent carefully: Only add agents when a single agent genuinely can't handle the task. More agents = more complexity = more failure modes.

The agents that win in production aren't the most sophisticated - they're the most reliable. Focus on error handling, guardrails, and graceful degradation before adding fancy planning or multi-agent orchestration.

Want More?

Check out our hands-on tutorials for step-by-step agent builds, or explore our AI tools comparison to pick the right foundation models and frameworks for your agents.