By Connie · Last reviewed: April 2026 — pricing & tools verified · This article contains affiliate links. We may earn a commission at no extra cost to you if you sign up through our links.
OpenAI Agents SDK Guide 2026: Build Autonomous AI Agents
The OpenAI Agents SDK is OpenAI's official framework for building production multi-agent systems. Released in March 2025 (replacing the experimental Swarm library), it provides clean abstractions for agents, tools, handoffs, guardrails, and tracing. This guide covers everything you need to go from zero to production.
TL;DR
- Install: pip install openai-agents
- Core concepts: Agent → Tools → Handoffs → Guardrails → Tracing
- Best for: OpenAI-native workflows needing clean multi-agent orchestration
- vs LangGraph: simpler and more opinionated; less flexible for complex graphs
- Tracing is built-in and outputs to the OpenAI dashboard or custom backends
- Use Runner.run() for async contexts, Runner.run_sync() for sync contexts
What Is the OpenAI Agents SDK?
The Agents SDK provides five core primitives for building agentic systems:
| Primitive | What It Is | Use Case |
|---|---|---|
| Agent | An LLM + system prompt + list of tools | The basic unit of work; each agent has a focused role |
| Tool | A Python function the agent can call | Web search, database query, API calls, file I/O |
| Handoff | Transfer control from one agent to another | Routing (triage agent → specialist agent) |
| Guardrail | Async validation on inputs or outputs | Safety filtering, format validation, cost controls |
| Runner | Executes the agent loop; manages state | Entry point for all agent runs; async or sync |
Installation and Setup
pip install openai-agents
# Set your API key
export OPENAI_API_KEY="sk-..."
# Verify installation
python -c "from agents import Agent, Runner; print('OK')"
The SDK requires Python 3.10+ and openai>=1.65.0. It works with any OpenAI-compatible endpoint.
Your First Agent: Hello World
from agents import Agent, Runner
# Create a simple agent
agent = Agent(
name="assistant",
instructions="You are a helpful assistant. Be concise and accurate.",
model="gpt-5.4",
)
# Run it synchronously (for scripts / notebooks)
result = Runner.run_sync(agent, "What is the capital of France?")
print(result.final_output)
# → "Paris"
# Run it asynchronously (for FastAPI / async apps)
import asyncio
async def main():
    result = await Runner.run(agent, "What is the capital of France?")
    print(result.final_output)

asyncio.run(main())
Adding Tools
Tools are Python functions decorated with @function_tool. The SDK auto-generates the JSON schema from the function signature and docstring:
from agents import Agent, Runner, function_tool
import httpx
@function_tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # In production, call a real weather API
    return f"It is 72°F and sunny in {city}."

@function_tool
def search_web(query: str) -> str:
    """Search the web for current information."""
    # Use your preferred search API
    response = httpx.get(
        "https://api.search.example.com",
        params={"q": query, "limit": 3},
    )
    results = response.json()["results"]
    return "\n".join(r["snippet"] for r in results)
agent = Agent(
name="research_assistant",
instructions="You help users find information. Use tools to get accurate, current data.",
tools=[get_weather, search_web],
model="gpt-5.4",
)
result = Runner.run_sync(agent, "What's the weather in Tokyo and what's trending there today?")
print(result.final_output)
The agent decides which tools to call, in what order, and when to stop. You don't need to manage the tool-call loop manually — the SDK handles it.
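To make the auto-generated schema concrete, here is a rough standalone sketch of what a decorator like @function_tool derives from a signature and docstring. The real SDK's schema builder is far more thorough (Pydantic validation, defaults, nested types); this toy version only illustrates the signature-to-JSON-schema mapping:

```python
import inspect
from typing import get_type_hints

# Minimal Python-type -> JSON-schema-type mapping (illustrative only)
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def build_schema(fn):
    """Derive a minimal JSON-schema-style tool description from a function."""
    hints = get_type_hints(fn)
    hints.pop("return", None)  # the return type is not part of the tool schema
    params = {
        name: {"type": PY_TO_JSON.get(tp, "string")}
        for name, tp in hints.items()
    }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            "type": "object",
            "properties": params,
            "required": list(params),
        },
    }

def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"It is 72°F and sunny in {city}."

schema = build_schema(get_weather)
print(schema["name"])                      # get_weather
print(schema["parameters"]["properties"])  # {'city': {'type': 'string'}}
```

This is why good type hints and docstrings matter: they become the model-facing description of the tool.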
Handoffs: Multi-Agent Routing
Handoffs let a triage agent route conversations to specialist agents. This is the SDK's signature feature and the cleanest implementation of the "specialist crew" pattern:
from agents import Agent, Runner, handoff
# Specialist agents
billing_agent = Agent(
name="billing_specialist",
instructions="""You handle billing questions: invoices, refunds,
subscription changes, payment failures. Be empathetic and solution-focused.""",
model="gpt-5.4",
)
technical_agent = Agent(
name="technical_specialist",
instructions="""You handle technical support: bugs, API errors,
integration issues. Ask for error messages and reproduce steps.""",
model="gpt-5.4",
)
sales_agent = Agent(
name="sales_specialist",
instructions="""You handle sales inquiries: pricing, demos,
enterprise plans, procurement. Be consultative, not pushy.""",
model="gpt-5.4",
)
# Triage agent routes to specialists
triage_agent = Agent(
name="triage",
instructions="""You are a customer service triage agent.
Route the conversation to the right specialist:
- Billing issues → billing_specialist
- Technical problems → technical_specialist
- Pricing/sales questions → sales_specialist
Greet the customer first, then hand off immediately.""",
handoffs=[billing_agent, technical_agent, sales_agent],
model="gpt-5.4-mini", # Use cheaper model for routing
)
result = Runner.run_sync(
triage_agent,
"My payment keeps failing with error code 402"
)
print(result.final_output)
# The triage agent routes to billing_specialist, who handles it
Key insight: use a cheap, fast model (GPT-5.4 mini) for the triage/routing agent and reserve the expensive model for the specialists who do the actual work.
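Under the hood, a handoff is exposed to the triage LLM as just another tool call ("transfer to billing_specialist"), so the routing step reduces to a classification. A toy standalone sketch of that decision, with keyword matching standing in for the LLM (no SDK involved; names mirror the agents above):

```python
# Keyword stand-in for the triage LLM's routing decision
SPECIALISTS = {
    "billing_specialist": ("payment", "invoice", "refund", "subscription"),
    "technical_specialist": ("error", "bug", "api", "integration"),
    "sales_specialist": ("pricing", "demo", "enterprise", "procurement"),
}

def route(message: str) -> str:
    """Pick a specialist the way a triage agent would (toy keyword version)."""
    text = message.lower()
    for agent_name, keywords in SPECIALISTS.items():
        if any(kw in text for kw in keywords):
            return agent_name
    return "triage"  # no match: triage keeps the conversation

print(route("My payment keeps failing with error code 402"))  # billing_specialist
```

The real triage agent makes the same kind of call, which is exactly why a small, cheap model is sufficient for it.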
Guardrails: Safety and Validation
Guardrails are async functions that run in parallel with the agent (not sequentially), so they add minimal latency:
from agents import Agent, Runner, GuardrailFunctionOutput, input_guardrail
from agents import RunContextWrapper
from pydantic import BaseModel
class TopicCheck(BaseModel):
    is_on_topic: bool
    reason: str

# Input guardrail: block off-topic requests
@input_guardrail
async def check_topic(
    ctx: RunContextWrapper, agent: Agent, input: str
) -> GuardrailFunctionOutput:
    """Ensure the input is about customer service, not general chat."""
    checker = Agent(
        name="topic_checker",
        instructions="Determine if the input is a customer service question.",
        output_type=TopicCheck,
        model="gpt-5.4-mini",
    )
    result = await Runner.run(checker, input, context=ctx.context)
    check = result.final_output_as(TopicCheck)
    return GuardrailFunctionOutput(
        output_info=check,
        tripwire_triggered=not check.is_on_topic,  # Block if off-topic
    )
# Attach guardrail to agent
agent = Agent(
name="customer_service",
instructions="You are a customer service agent for Acme Corp.",
input_guardrails=[check_topic],
model="gpt-5.4",
)
# This will be blocked by the guardrail
from agents import InputGuardrailTripwireTriggered

try:
    result = Runner.run_sync(agent, "Write me a haiku about mountains")
except InputGuardrailTripwireTriggered as e:
    print(f"Guardrail triggered: {e}")
# → Guardrail triggered: Input guardrail check_topic tripped
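The tripwire pattern itself is simple and worth understanding outside the SDK: a checker returns a verdict plus a boolean, and the runner raises before the main agent ever processes the input. A hedged standalone sketch (these names are illustrative, not the SDK's API):

```python
from dataclasses import dataclass

@dataclass
class GuardrailResult:
    output_info: str
    tripwire_triggered: bool

class TripwireTriggered(Exception):
    pass

def check_topic(user_input: str) -> GuardrailResult:
    """Toy checker: keyword stand-in for the LLM-based topic classifier."""
    on_topic = any(
        w in user_input.lower() for w in ("billing", "invoice", "error", "account")
    )
    return GuardrailResult(
        output_info="on-topic" if on_topic else "off-topic",
        tripwire_triggered=not on_topic,
    )

def guarded_run(user_input: str) -> str:
    """Run the guardrail first; only invoke the expensive agent if it passes."""
    result = check_topic(user_input)
    if result.tripwire_triggered:
        raise TripwireTriggered(result.output_info)
    return f"Agent handles: {user_input}"

print(guarded_run("I have a billing question"))  # Agent handles: I have a billing question
```

The SDK runs the checker concurrently with the agent rather than strictly before it, but the trip-and-raise control flow is the same.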
Structured Outputs
Use Pydantic models with the output_type parameter to get typed, validated output instead of raw strings:
from agents import Agent, Runner
from pydantic import BaseModel
from typing import List
class CalendarEvent(BaseModel):
    title: str
    date: str  # ISO 8601
    duration_minutes: int
    attendees: List[str]
    location: str | None = None
    notes: str | None = None
extraction_agent = Agent(
name="calendar_extractor",
instructions="""Extract the calendar event from the text.
Return it as structured data matching the schema.""",
output_type=CalendarEvent,
model="gpt-5.4",
)
email_text = """
Hi team, let's meet Thursday April 10 at 2pm for 45 minutes
to review Q2 planning. Room 3B. Attendees: Sarah, Tom, Alex.
"""
result = Runner.run_sync(extraction_agent, email_text)
event = result.final_output_as(CalendarEvent)
print(event.title) # "Q2 Planning Review"
print(event.date) # "2026-04-10"
print(event.attendees) # ["Sarah", "Tom", "Alex"]
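Conceptually, output_type constrains the model to emit JSON matching the schema, which the SDK then parses and validates into a typed object. A minimal stdlib-only approximation of that parse-and-validate step (the SDK uses Pydantic and constrained decoding; this just shows the mechanism):

```python
import json
from dataclasses import dataclass, fields

@dataclass
class CalendarEvent:
    title: str
    date: str  # ISO 8601
    duration_minutes: int
    attendees: list

def parse_output(raw: str) -> CalendarEvent:
    """Parse model JSON output and reject it if required fields are missing."""
    data = json.loads(raw)
    missing = [f.name for f in fields(CalendarEvent) if f.name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return CalendarEvent(**data)

model_output = (
    '{"title": "Q2 Planning Review", "date": "2026-04-10",'
    ' "duration_minutes": 45, "attendees": ["Sarah", "Tom", "Alex"]}'
)
event = parse_output(model_output)
print(event.title)      # Q2 Planning Review
print(event.attendees)  # ['Sarah', 'Tom', 'Alex']
```

The payoff is the same in both cases: downstream code works with typed attributes instead of re-parsing free text.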
Tracing and Observability
The SDK includes built-in tracing. Every run automatically records agent steps, tool calls, handoffs, and LLM inputs/outputs:
from agents import Agent, Runner, trace
# Traces are automatically sent to the OpenAI dashboard (platform.openai.com/traces)
# To disable: set OPENAI_AGENTS_DISABLE_TRACING=1
# Custom trace context for grouping related runs
async def support_session():
    with trace("customer_support_session", metadata={"user_id": "usr_123"}):
        result1 = await Runner.run(triage_agent, "I have a billing question")
        result2 = await Runner.run(triage_agent, "One more thing about my invoice")
    # Both runs appear as one trace in the dashboard
# Access trace data programmatically
from agents.tracing import get_current_trace
print(get_current_trace().trace_id)
# Custom tracing backend (e.g., LangSmith, Datadog)
from agents.tracing import add_trace_processor
add_trace_processor(your_custom_processor)  # an object implementing the SDK's trace-processor interface
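A processor you register receives trace and span lifecycle callbacks and forwards them wherever you like. A standalone sketch of that collecting pattern (the method names here are illustrative, not the SDK's exact signatures):

```python
class InMemoryProcessor:
    """Collects trace events in memory; swap the list for an HTTP exporter."""

    def __init__(self):
        self.events = []

    def on_trace_start(self, trace_name: str) -> None:
        self.events.append(("trace_start", trace_name))

    def on_span_end(self, span_name: str, duration_ms: float) -> None:
        self.events.append(("span_end", span_name, duration_ms))

processor = InMemoryProcessor()
processor.on_trace_start("customer_support_session")
processor.on_span_end("triage_run", 412.5)
print(processor.events[0])  # ('trace_start', 'customer_support_session')
```

A real exporter would batch these events and ship them to your observability backend instead of appending to a list.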
Complete Production Example: Research Pipeline
Here's a realistic multi-agent pipeline that handles research requests end-to-end:
from agents import Agent, Runner, function_tool, handoff
from pydantic import BaseModel
from typing import List
import asyncio
# --- Tools ---
@function_tool
def web_search(query: str) -> str:
    """Search the web for recent information on a topic."""
    # Implement with your preferred search API
    return f"[Search results for: {query}]"

@function_tool
def extract_key_facts(text: str) -> str:
    """Extract the most important facts from a piece of text."""
    return "[Key facts extracted from text]"
# --- Output schema ---
class ResearchReport(BaseModel):
    title: str
    summary: str
    key_findings: List[str]
    sources: List[str]
    confidence: str  # "high" | "medium" | "low"
# --- Specialist agents ---
search_agent = Agent(
name="searcher",
instructions="Search the web for comprehensive information on the given topic. Gather multiple sources.",
tools=[web_search],
model="gpt-5.4",
)
analysis_agent = Agent(
name="analyst",
instructions="Analyze the research materials provided. Extract key findings and assess confidence.",
tools=[extract_key_facts],
model="gpt-5.4",
)
writer_agent = Agent(
name="writer",
instructions="Write a clear, well-structured research report based on the analysis provided.",
output_type=ResearchReport,
model="gpt-5.4",
)
# --- Orchestrator ---
research_orchestrator = Agent(
name="research_orchestrator",
instructions="""You orchestrate research requests:
1. Hand off to searcher to gather sources
2. Hand off to analyst to analyze findings
3. Hand off to writer to produce the final report
Always complete all three steps.""",
handoffs=[search_agent, analysis_agent, writer_agent],
model="gpt-5.4-mini",
)
async def run_research(topic: str) -> ResearchReport:
    result = await Runner.run(
        research_orchestrator,
        f"Research this topic thoroughly: {topic}"
    )
    return result.final_output_as(ResearchReport)
# Usage
report = asyncio.run(run_research("Impact of AI agents on software developer productivity in 2026"))
print(report.title)
print(report.summary)
for finding in report.key_findings:
    print(f"• {finding}")
OpenAI Agents SDK vs Other Frameworks
| Framework | Best For | Weaknesses | Learning Curve |
|---|---|---|---|
| OpenAI Agents SDK | OpenAI-native multi-agent workflows; production apps | Opinionated; optimized for OpenAI models | Low |
| LangGraph | Complex stateful graphs; multi-provider; advanced routing | Verbose; steep learning curve | High |
| CrewAI | Role-based teams; non-technical setup; quick prototypes | Less control over execution flow | Low |
| AutoGen (v0.4+) | Conversation-based agents; research experiments | Still maturing; complex async patterns | Medium |
| Anthropic SDK (direct) | Pure Claude agents; maximum control; no framework overhead | Manual loop management; no handoffs built-in | Medium |
| Pydantic AI | Type-safe agents; testability; dependency injection | Newer; smaller ecosystem | Medium |
Decision Matrix: When to Use OpenAI Agents SDK
| Situation | Use OpenAI Agents SDK? | Why |
|---|---|---|
| You're fully on OpenAI models | Yes | Best DX; native tracing; optimized performance |
| You need complex state machines | Maybe | Consider LangGraph for conditional branching beyond simple handoffs |
| You want structured outputs | Yes | Best-in-class Pydantic integration; validated types |
| You need multi-LLM routing | No | Use LangGraph or a custom router |
| You want the simplest possible setup | Yes | 3 lines to a working agent; minimal boilerplate |
| You're building a research prototype | Maybe | CrewAI is faster for demos; Agents SDK is cleaner for production |
| Enterprise compliance required | Yes | Built-in tracing; OpenAI's audit logs; HIPAA/SOC2 via Azure OpenAI |
Common Patterns and Best Practices
- Separate concerns with specialist agents. Each agent should have one job. A triage agent routes; it doesn't answer. A billing agent handles billing; it doesn't debug code. Separation makes agents easier to improve and test independently.
- Use cheaper models for routing. Handoff decisions are simple classifications — GPT-5.4 mini costs 10x less than GPT-5.4 and makes the same routing decisions. Reserve expensive models for the actual task.
- Keep system prompts focused. Agents with narrow, specific instructions outperform agents with long, complex prompts. If your instructions exceed 500 words, split the agent.
- Test guardrails independently. Guardrails run as separate agents. Test them in isolation before attaching them to your main agent. A slow guardrail negates its own performance benefit.
- Use tracing in development. The OpenAI tracing dashboard shows exactly what each agent did, which tools it called, and why. Debugging without it is guesswork.
- Implement graceful handoff failures. If a specialist agent fails, the orchestrator should have fallback instructions — not leave the user hanging with an error.
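The last point, graceful handoff failures, can be sketched as a plain retry-then-fallback wrapper around the run call. Everything here is a hypothetical stand-in, not SDK API — run_specialist simulates a failing Runner.run:

```python
import asyncio

async def run_specialist(name: str, message: str) -> str:
    """Hypothetical stand-in for Runner.run on a specialist agent."""
    if name == "billing_specialist":
        raise RuntimeError("specialist unavailable")  # simulated outage
    return f"{name} answered: {message}"

async def run_with_fallback(name: str, message: str, retries: int = 1) -> str:
    """Retry the specialist, then fall back to a safe reply instead of erroring."""
    for attempt in range(retries + 1):
        try:
            return await run_specialist(name, message)
        except RuntimeError:
            if attempt < retries:
                await asyncio.sleep(0)  # real backoff would go here
    return "Sorry, our specialist is unavailable. A human will follow up shortly."

reply = asyncio.run(run_with_fallback("billing_specialist", "refund please"))
print(reply)  # Sorry, our specialist is unavailable. A human will follow up shortly.
```

In a real deployment, the fallback message would come from the orchestrator's instructions rather than a hardcoded string.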
Cost Estimation
| Architecture | Tokens per Run (est.) | Cost per 1,000 Runs (est.) |
|---|---|---|
| Single agent, no tools | ~2,000 tokens | ~$48 |
| Single agent + 3 tool calls | ~8,000 tokens | ~$192 |
| Triage + 1 specialist (handoff) | ~12,000 tokens | ~$288 |
| 3-agent pipeline (orchestrator + 2 specialists) | ~25,000 tokens | ~$600 |
| Complex 5-agent research pipeline | ~80,000 tokens | ~$1,920 |
Estimates assume GPT-5.4 pricing of $15/1M input + $60/1M output tokens with a roughly 80/20 input/output token split (a blended ~$24/1M tokens). Using GPT-5.4-mini for routing agents reduces costs by 60–70%.
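Blended per-run cost at the rates above can be sanity-checked with a few lines of arithmetic (the 80/20 input/output split is an assumption; real traffic varies):

```python
INPUT_RATE = 15 / 1_000_000   # $ per input token (assumed GPT-5.4 pricing)
OUTPUT_RATE = 60 / 1_000_000  # $ per output token

def cost_per_1000_runs(tokens_per_run: int, input_share: float = 0.8) -> float:
    """Blend input/output pricing and scale to 1,000 runs."""
    blended = input_share * INPUT_RATE + (1 - input_share) * OUTPUT_RATE
    return tokens_per_run * blended * 1000

print(round(cost_per_1000_runs(2_000), 2))   # 48.0
print(round(cost_per_1000_runs(25_000), 2))  # 600.0
```

Plugging in your own measured token counts and current pricing is the fastest way to keep estimates honest as models and rates change.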
FAQ
What is the OpenAI Agents SDK?
The OpenAI Agents SDK is an official Python framework for building multi-agent AI systems. It provides primitives for agents (LLM + instructions + tools), handoffs (transferring control between agents), guardrails (input/output validation), and tracing (observability). It was released in March 2025 and replaced the earlier Swarm experimental library.
How does the OpenAI Agents SDK compare to LangGraph?
OpenAI Agents SDK is simpler and more opinionated — great for OpenAI-native workflows with minimal boilerplate. LangGraph is more flexible and supports any LLM, complex state machines, and multi-modal graphs. Choose OpenAI Agents SDK if you're fully on OpenAI models; choose LangGraph for complex routing logic or multi-provider setups.
Can I use the OpenAI Agents SDK with non-OpenAI models?
Yes — the SDK supports any OpenAI-compatible API endpoint, which includes models hosted on Azure OpenAI, Together AI, Groq, and providers using the OpenAI API format. However, the SDK is optimized for GPT-5.4 and GPT-4o; some features (like structured outputs) work best with OpenAI's own models.
What are guardrails in the OpenAI Agents SDK?
Guardrails are validation functions that run on agent inputs or outputs. Input guardrails can reject harmful or off-topic requests before the agent processes them. Output guardrails can validate or transform agent responses before they reach the user. They run as async functions alongside the agent rather than sequentially, so they add minimal latency.
Want Agents Without the SDK Overhead?
HappyCapy gives you production-ready AI agents — writing, research, coding, automation — without managing infrastructure or stitching frameworks together.
Try HappyCapy Free →