By Connie · Last reviewed: April 2026 — pricing & tools verified · This article contains affiliate links. We may earn a commission at no extra cost to you if you sign up through our links.
OpenAI Agents SDK Guide 2026: Build Autonomous AI Agents
The OpenAI Agents SDK is OpenAI's official framework for building production multi-agent systems. Released in March 2025 (replacing the experimental Swarm library), it provides clean abstractions for agents, tools, handoffs, guardrails, and tracing. This guide covers everything you need to go from zero to production.
TL;DR
- Install: pip install openai-agents
- Core concepts: Agent → Tools → Handoffs → Guardrails → Tracing
- Best for: OpenAI-native workflows needing clean multi-agent orchestration
- vs LangGraph: simpler and more opinionated; less flexible for complex graphs
- Tracing is built-in and outputs to the OpenAI dashboard or custom backends
- Use Runner.run() for async contexts, Runner.run_sync() for sync contexts
What Is the OpenAI Agents SDK?
The Agents SDK provides five core primitives for building agentic systems:
| Primitive | What It Is | Use Case |
|---|---|---|
| Agent | An LLM + system prompt + list of tools | The basic unit of work; each agent has a focused role |
| Tool | A Python function the agent can call | Web search, database query, API calls, file I/O |
| Handoff | Transfer control from one agent to another | Routing (triage agent → specialist agent) |
| Guardrail | Async validation on inputs or outputs | Safety filtering, format validation, cost controls |
| Runner | Executes the agent loop; manages state | Entry point for all agent runs; async or sync |
Installation and Setup
pip install openai-agents
# Set your API key
export OPENAI_API_KEY="sk-..."
# Verify installation
python -c "from agents import Agent, Runner; print('OK')"
The SDK requires Python 3.10+ and openai>=1.65.0. It works with any OpenAI-compatible endpoint.
Your First Agent: Hello World
from agents import Agent, Runner
# Create a simple agent
agent = Agent(
name="assistant",
instructions="You are a helpful assistant. Be concise and accurate.",
model="gpt-5.4",
)
# Run it synchronously (for scripts / notebooks)
result = Runner.run_sync(agent, "What is the capital of France?")
print(result.final_output)
# → "Paris"
# Run it asynchronously (for FastAPI / async apps)
import asyncio
async def main():
    result = await Runner.run(agent, "What is the capital of France?")
    print(result.final_output)

asyncio.run(main())
Adding Tools
Tools are Python functions decorated with @function_tool. The SDK auto-generates the JSON schema from the function signature and docstring:
from agents import Agent, Runner, function_tool
import httpx
@function_tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    # In production, call a real weather API
    return f"It is 72°F and sunny in {city}."

@function_tool
def search_web(query: str) -> str:
    """Search the web for current information."""
    # Use your preferred search API
    response = httpx.get(
        "https://api.search.example.com",
        params={"q": query, "limit": 3},
    )
    results = response.json()["results"]
    return "\n".join(r["snippet"] for r in results)
agent = Agent(
name="research_assistant",
instructions="You help users find information. Use tools to get accurate, current data.",
tools=[get_weather, search_web],
model="gpt-5.4",
)
result = Runner.run_sync(agent, "What's the weather in Tokyo and what's trending there today?")
print(result.final_output)
The agent decides which tools to call, in what order, and when to stop. You don't need to manage the tool-call loop manually — the SDK handles it.
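To make the auto-generated schema concrete, here is a rough standalone sketch of what a decorator like @function_tool derives from a signature and docstring. The real SDK's schema builder is far more thorough (Pydantic validation, defaults, nested types); this toy version only illustrates the signature-to-JSON-schema mapping:

```python
import inspect
from typing import get_type_hints

# Minimal Python-type -> JSON-schema-type mapping (illustrative only)
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def build_schema(fn):
    """Derive a minimal JSON-schema-style tool description from a function."""
    hints = get_type_hints(fn)
    hints.pop("return", None)  # the return type is not part of the tool schema
    params = {
        name: {"type": PY_TO_JSON.get(tp, "string")}
        for name, tp in hints.items()
    }
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn) or "",
        "parameters": {
            "type": "object",
            "properties": params,
            "required": list(params),
        },
    }

def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"It is 72°F and sunny in {city}."

schema = build_schema(get_weather)
print(schema["name"])                      # get_weather
print(schema["parameters"]["properties"])  # {'city': {'type': 'string'}}
```

This is why good type hints and docstrings matter: they become the model-facing description of the tool.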
Handoffs: Multi-Agent Routing
Handoffs let a triage agent route conversations to specialist agents. This is the SDK's signature feature and the cleanest implementation of the "specialist crew" pattern:
from agents import Agent, Runner, handoff
# Specialist agents
billing_agent = Agent(
name="billing_specialist",
instructions="""You handle billing questions: invoices, refunds,
subscription changes, payment failures. Be empathetic and solution-focused.""",
model="gpt-5.4",
)
technical_agent = Agent(
name="technical_specialist",
instructions="""You handle technical support: bugs, API errors,
integration issues. Ask for error messages and reproduce steps.""",
model="gpt-5.4",
)
sales_agent = Agent(
name="sales_specialist",
instructions="""You handle sales inquiries: pricing, demos,
enterprise plans, procurement. Be consultative, not pushy.""",
model="gpt-5.4",
)
# Triage agent routes to specialists
triage_agent = Agent(
name="triage",
instructions="""You are a customer service triage agent.
Route the conversation to the right specialist:
- Billing issues → billing_specialist
- Technical problems → technical_specialist
- Pricing/sales questions → sales_specialist
Greet the customer first, then hand off immediately.""",
handoffs=[billing_agent, technical_agent, sales_agent],
model="gpt-5.4-mini", # Use cheaper model for routing
)
result = Runner.run_sync(
triage_agent,
"My payment keeps failing with error code 402"
)
print(result.final_output)
# The triage agent routes to billing_specialist, who handles it
Key insight: use a cheap, fast model (GPT-5.4 mini) for the triage/routing agent and reserve the expensive model for the specialists who do the actual work.
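Under the hood, a handoff is exposed to the triage LLM as just another tool call ("transfer to billing_specialist"), so the routing step reduces to a classification. A toy standalone sketch of that decision, with keyword matching standing in for the LLM (no SDK involved; names mirror the agents above):

```python
# Keyword stand-in for the triage LLM's routing decision
SPECIALISTS = {
    "billing_specialist": ("payment", "invoice", "refund", "subscription"),
    "technical_specialist": ("error", "bug", "api", "integration"),
    "sales_specialist": ("pricing", "demo", "enterprise", "procurement"),
}

def route(message: str) -> str:
    """Pick a specialist the way a triage agent would (toy keyword version)."""
    text = message.lower()
    for agent_name, keywords in SPECIALISTS.items():
        if any(kw in text for kw in keywords):
            return agent_name
    return "triage"  # no match: triage keeps the conversation

print(route("My payment keeps failing with error code 402"))  # billing_specialist
```

The real triage agent makes the same kind of call, which is exactly why a small, cheap model is sufficient for it.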
Guardrails: Safety and Validation
Guardrails are async functions that run in parallel with the agent (not sequentially), so they add minimal latency:
from agents import Agent, Runner, GuardrailFunctionOutput, input_guardrail
from agents import RunContextWrapper
from pydantic import BaseModel
class TopicCheck(BaseModel):
    is_on_topic: bool
    reason: str

# Input guardrail: block off-topic requests
@input_guardrail
async def check_topic(
    ctx: RunContextWrapper, agent: Agent, input: str
) -> GuardrailFunctionOutput:
    """Ensure the input is about customer service, not general chat."""
    checker = Agent(
        name="topic_checker",
        instructions="Determine if the input is a customer service question.",
        output_type=TopicCheck,
        model="gpt-5.4-mini",
    )
    result = await Runner.run(checker, input, context=ctx.context)
    check = result.final_output_as(TopicCheck)
    return GuardrailFunctionOutput(
        output_info=check,
        tripwire_triggered=not check.is_on_topic,  # Block if off-topic
    )
# Attach guardrail to agent
agent = Agent(
name="customer_service",
instructions="You are a customer service agent for Acme Corp.",
input_guardrails=[check_topic],
model="gpt-5.4",
)
# This will be blocked by the guardrail
from agents import InputGuardrailTripwireTriggered

try:
    result = Runner.run_sync(agent, "Write me a haiku about mountains")
except InputGuardrailTripwireTriggered as e:
    print(f"Guardrail triggered: {e}")
# → Guardrail triggered: Input guardrail check_topic tripped
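The tripwire pattern itself is simple and worth understanding outside the SDK: a checker returns a verdict plus a boolean, and the runner raises before the main agent ever processes the input. A hedged standalone sketch (these names are illustrative, not the SDK's API):

```python
from dataclasses import dataclass

@dataclass
class GuardrailResult:
    output_info: str
    tripwire_triggered: bool

class TripwireTriggered(Exception):
    pass

def check_topic(user_input: str) -> GuardrailResult:
    """Toy checker: keyword stand-in for the LLM-based topic classifier."""
    on_topic = any(
        w in user_input.lower() for w in ("billing", "invoice", "error", "account")
    )
    return GuardrailResult(
        output_info="on-topic" if on_topic else "off-topic",
        tripwire_triggered=not on_topic,
    )

def guarded_run(user_input: str) -> str:
    """Run the guardrail first; only invoke the expensive agent if it passes."""
    result = check_topic(user_input)
    if result.tripwire_triggered:
        raise TripwireTriggered(result.output_info)
    return f"Agent handles: {user_input}"

print(guarded_run("I have a billing question"))  # Agent handles: I have a billing question
```

The SDK runs the checker concurrently with the agent rather than strictly before it, but the trip-and-raise control flow is the same.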
Structured Outputs
Use Pydantic models with the output_type parameter to get typed, validated output instead of raw strings:
from agents import Agent, Runner
from pydantic import BaseModel
from typing import List
class CalendarEvent(BaseModel):
    title: str
    date: str  # ISO 8601
    duration_minutes: int
    attendees: List[str]
    location: str | None = None
    notes: str | None = None
extraction_agent = Agent(
name="calendar_extractor",
instructions="""Extract the calendar event from the text.
Return it as structured data matching the schema.""",
output_type=CalendarEvent,
model="gpt-5.4",
)
email_text = """
Hi team, let's meet Thursday April 10 at 2pm for 45 minutes
to review Q2 planning. Room 3B. Attendees: Sarah, Tom, Alex.
"""
result = Runner.run_sync(extraction_agent, email_text)
event = result.final_output_as(CalendarEvent)
print(event.title) # "Q2 Planning Review"
print(event.date) # "2026-04-10"
print(event.attendees) # ["Sarah", "Tom", "Alex"]
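Conceptually, output_type constrains the model to emit JSON matching the schema, which the SDK then parses and validates into a typed object. A minimal stdlib-only approximation of that parse-and-validate step (the SDK uses Pydantic and constrained decoding; this just shows the mechanism):

```python
import json
from dataclasses import dataclass, fields

@dataclass
class CalendarEvent:
    title: str
    date: str  # ISO 8601
    duration_minutes: int
    attendees: list

def parse_output(raw: str) -> CalendarEvent:
    """Parse model JSON output and reject it if required fields are missing."""
    data = json.loads(raw)
    missing = [f.name for f in fields(CalendarEvent) if f.name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return CalendarEvent(**data)

model_output = (
    '{"title": "Q2 Planning Review", "date": "2026-04-10",'
    ' "duration_minutes": 45, "attendees": ["Sarah", "Tom", "Alex"]}'
)
event = parse_output(model_output)
print(event.title)      # Q2 Planning Review
print(event.attendees)  # ['Sarah', 'Tom', 'Alex']
```

The payoff is the same in both cases: downstream code works with typed attributes instead of re-parsing free text.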
Tracing and Observability
The SDK includes built-in tracing. Every run automatically records agent steps, tool calls, handoffs, and LLM inputs/outputs:
from agents import Agent, Runner, trace
# Traces are automatically sent to the OpenAI dashboard (platform.openai.com/traces)
# To disable: set OPENAI_AGENTS_DISABLE_TRACING=1
# Custom trace context for grouping related runs
async def support_session():
    with trace("customer_support_session", metadata={"user_id": "usr_123"}):
        result1 = await Runner.run(triage_agent, "I have a billing question")
        result2 = await Runner.run(triage_agent, "One more thing about my invoice")
    # Both runs appear as one trace in the dashboard
# Access trace data programmatically
from agents.tracing import get_current_trace
print(get_current_trace().trace_id)
# Custom tracing backend (e.g., LangSmith, Datadog)
from agents.tracing import add_trace_processor
add_trace_processor(your_custom_processor)  # an object implementing the SDK's trace-processor interface
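A processor you register receives trace and span lifecycle callbacks and forwards them wherever you like. A standalone sketch of that collecting pattern (the method names here are illustrative, not the SDK's exact signatures):

```python
class InMemoryProcessor:
    """Collects trace events in memory; swap the list for an HTTP exporter."""

    def __init__(self):
        self.events = []

    def on_trace_start(self, trace_name: str) -> None:
        self.events.append(("trace_start", trace_name))

    def on_span_end(self, span_name: str, duration_ms: float) -> None:
        self.events.append(("span_end", span_name, duration_ms))

processor = InMemoryProcessor()
processor.on_trace_start("customer_support_session")
processor.on_span_end("triage_run", 412.5)
print(processor.events[0])  # ('trace_start', 'customer_support_session')
```

A real exporter would batch these events and ship them to your observability backend instead of appending to a list.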
Complete Production Example: Research Pipeline
Here's a realistic multi-agent pipeline that handles research requests end-to-end:
from agents import Agent, Runner, function_tool, handoff
from pydantic import BaseModel
from typing import List
import asyncio
# --- Tools ---
@function_tool
def web_search(query: str) -> str:
    """Search the web for recent information on a topic."""
    # Implement with your preferred search API
    return f"[Search results for: {query}]"

@function_tool
def extract_key_facts(text: str) -> str:
    """Extract the most important facts from a piece of text."""
    return "[Key facts extracted from text]"
# --- Output schema ---
class ResearchReport(BaseModel):
    title: str
    summary: str
    key_findings: List[str]
    sources: List[str]
    confidence: str  # "high" | "medium" | "low"
# --- Specialist agents ---
search_agent = Agent(
name="searcher",
instructions="Search the web for comprehensive information on the given topic. Gather multiple sources.",
tools=[web_search],
model="gpt-5.4",
)
analysis_agent = Agent(
name="analyst",
instructions="Analyze the research materials provided. Extract key findings and assess confidence.",
tools=[extract_key_facts],
model="gpt-5.4",
)
writer_agent = Agent(
name="writer",
instructions="Write a clear, well-structured research report based on the analysis provided.",
output_type=ResearchReport,
model="gpt-5.4",
)
# --- Orchestrator ---
research_orchestrator = Agent(
name="research_orchestrator",
instructions="""You orchestrate research requests:
1. Hand off to searcher to gather sources
2. Hand off to analyst to analyze findings
3. Hand off to writer to produce the final report
Always complete all three steps.""",
handoffs=[search_agent, analysis_agent, writer_agent],
model="gpt-5.4-mini",
)
async def run_research(topic: str) -> ResearchReport:
    result = await Runner.run(
        research_orchestrator,
        f"Research this topic thoroughly: {topic}"
    )
    return result.final_output_as(ResearchReport)
# Usage
report = asyncio.run(run_research("Impact of AI agents on software developer productivity in 2026"))
print(report.title)
print(report.summary)
for finding in report.key_findings:
    print(f"• {finding}")
OpenAI Agents SDK vs Other Frameworks
| Framework | Best For | Weaknesses | Learning Curve |
|---|---|---|---|
| OpenAI Agents SDK | OpenAI-native multi-agent workflows; production apps | Opinionated; optimized for OpenAI models | Low |
| LangGraph | Complex stateful graphs; multi-provider; advanced routing | Verbose; steep learning curve | High |
| CrewAI | Role-based teams; non-technical setup; quick prototypes | Less control over execution flow | Low |
| AutoGen (v0.4+) | Conversation-based agents; research experiments | Still maturing; complex async patterns | Medium |
| Anthropic SDK (direct) | Pure Claude agents; maximum control; no framework overhead | Manual loop management; no handoffs built-in | Medium |
| Pydantic AI | Type-safe agents; testability; dependency injection | Newer; smaller ecosystem | Medium |
Decision Matrix: When to Use OpenAI Agents SDK
| Situation | Use OpenAI Agents SDK? | Why |
|---|---|---|
| You're fully on OpenAI models | Yes | Best DX; native tracing; optimized performance |
| You need complex state machines | Maybe | Consider LangGraph for conditional branching beyond simple handoffs |
| You want structured outputs | Yes | Best-in-class Pydantic integration; validated types |
| You need multi-LLM routing | No | Use LangGraph or a custom router |
| You want the simplest possible setup | Yes | 3 lines to a working agent; minimal boilerplate |
| You're building a research prototype | Maybe | CrewAI is faster for demos; Agents SDK is cleaner for production |
| Enterprise compliance required | Yes | Built-in tracing; OpenAI's audit logs; HIPAA/SOC2 via Azure OpenAI |
Common Patterns and Best Practices
- Separate concerns with specialist agents. Each agent should have one job. A triage agent routes; it doesn't answer. A billing agent handles billing; it doesn't debug code. Separation makes agents easier to improve and test independently.
- Use cheaper models for routing. Handoff decisions are simple classifications — GPT-5.4 mini costs 10x less than GPT-5.4 and makes the same routing decisions. Reserve expensive models for the actual task.
- Keep system prompts focused. Agents with narrow, specific instructions outperform agents with long, complex prompts. If your instructions exceed 500 words, split the agent.
- Test guardrails independently. Guardrails run as separate agents. Test them in isolation before attaching them to your main agent. A slow guardrail negates its own performance benefit.
- Use tracing in development. The OpenAI tracing dashboard shows exactly what each agent did, which tools it called, and why. Debugging without it is guesswork.
- Implement graceful handoff failures. If a specialist agent fails, the orchestrator should have fallback instructions — not leave the user hanging with an error.
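The last point, graceful handoff failures, can be sketched as a plain retry-then-fallback wrapper around the run call. Everything here is a hypothetical stand-in, not SDK API — run_specialist simulates a failing Runner.run:

```python
import asyncio

async def run_specialist(name: str, message: str) -> str:
    """Hypothetical stand-in for Runner.run on a specialist agent."""
    if name == "billing_specialist":
        raise RuntimeError("specialist unavailable")  # simulated outage
    return f"{name} answered: {message}"

async def run_with_fallback(name: str, message: str, retries: int = 1) -> str:
    """Retry the specialist, then fall back to a safe reply instead of erroring."""
    for attempt in range(retries + 1):
        try:
            return await run_specialist(name, message)
        except RuntimeError:
            if attempt < retries:
                await asyncio.sleep(0)  # real backoff would go here
    return "Sorry, our specialist is unavailable. A human will follow up shortly."

reply = asyncio.run(run_with_fallback("billing_specialist", "refund please"))
print(reply)  # Sorry, our specialist is unavailable. A human will follow up shortly.
```

In a real deployment, the fallback message would come from the orchestrator's instructions rather than a hardcoded string.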
Cost Estimation
| Architecture | Tokens per Run (est.) | Cost per 1,000 Runs (est.) |
|---|---|---|
| Single agent, no tools | ~2,000 tokens | ~$48 |
| Single agent + 3 tool calls | ~8,000 tokens | ~$192 |
| Triage + 1 specialist (handoff) | ~12,000 tokens | ~$288 |
| 3-agent pipeline (orchestrator + 2 specialists) | ~25,000 tokens | ~$600 |
| Complex 5-agent research pipeline | ~80,000 tokens | ~$1,920 |
Estimates assume GPT-5.4 pricing of $15/1M input + $60/1M output tokens with a roughly 80/20 input/output token split (a blended ~$24/1M tokens). Using GPT-5.4-mini for routing agents reduces costs by 60–70%.
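Blended per-run cost at the rates above can be sanity-checked with a few lines of arithmetic (the 80/20 input/output split is an assumption; real traffic varies):

```python
INPUT_RATE = 15 / 1_000_000   # $ per input token (assumed GPT-5.4 pricing)
OUTPUT_RATE = 60 / 1_000_000  # $ per output token

def cost_per_1000_runs(tokens_per_run: int, input_share: float = 0.8) -> float:
    """Blend input/output pricing and scale to 1,000 runs."""
    blended = input_share * INPUT_RATE + (1 - input_share) * OUTPUT_RATE
    return tokens_per_run * blended * 1000

print(round(cost_per_1000_runs(2_000), 2))   # 48.0
print(round(cost_per_1000_runs(25_000), 2))  # 600.0
```

Plugging in your own measured token counts and current pricing is the fastest way to keep estimates honest as models and rates change.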
FAQ
What is the OpenAI Agents SDK?
The OpenAI Agents SDK is an official Python framework for building multi-agent AI systems. It provides primitives for agents (LLM + instructions + tools), handoffs (transferring control between agents), guardrails (input/output validation), and tracing (observability). It was released in March 2025 and replaced the earlier Swarm experimental library.
How does the OpenAI Agents SDK compare to LangGraph?
OpenAI Agents SDK is simpler and more opinionated — great for OpenAI-native workflows with minimal boilerplate. LangGraph is more flexible and supports any LLM, complex state machines, and multi-modal graphs. Choose OpenAI Agents SDK if you're fully on OpenAI models; choose LangGraph for complex routing logic or multi-provider setups.
Can I use the OpenAI Agents SDK with non-OpenAI models?
Yes — the SDK supports any OpenAI-compatible API endpoint, which includes models hosted on Azure OpenAI, Together AI, Groq, and providers using the OpenAI API format. However, the SDK is optimized for GPT-5.4 and GPT-4o; some features (like structured outputs) work best with OpenAI's own models.
What are guardrails in the OpenAI Agents SDK?
Guardrails are validation functions that run on agent inputs or outputs. Input guardrails can reject harmful or off-topic requests before the agent processes them. Output guardrails can validate or transform agent responses before they reach the user. They run as async functions alongside the agent rather than sequentially, so they add minimal latency.
Want Agents Without the SDK Overhead?
HappyCapy gives you production-ready AI agents — writing, research, coding, automation — without managing infrastructure or stitching frameworks together.
Try HappyCapy Free →