By Connie · Last reviewed: April 2026 — pricing & tools verified · This article contains affiliate links. We may earn a commission at no extra cost to you if you sign up through our links.
OpenAI Launches Safety Bug Bounty: Pays Up to $100K for AI Agent Vulnerabilities
March 25, 2026 · 7 min read · Happycapy Guide
OpenAI launched a Safety Bug Bounty program on March 25, 2026 — separate from its existing Security Bug Bounty. It pays up to $7,500 for high-severity public reports and up to $100,000 for critical findings in private campaigns. Targets: prompt injection, agentic manipulation (ChatGPT Agent, Atlas, Codex, Operator), proprietary data leaks, and account integrity bypasses. Hosted on Bugcrowd. Jailbreaks that only produce rude text are explicitly out of scope.
OpenAI has launched a dedicated Safety Bug Bounty program — its first public program specifically targeting AI-native abuse and safety risks. The launch follows a surge in agentic AI deployments and comes three weeks after Anthropic's Claude Code source code was leaked via npm sourcemaps in April 2026, raising industry-wide awareness of AI security gaps.
The new program covers attack scenarios that fall into a gap between traditional security vulnerabilities and general policy violations. When a prompt injection attack hijacks a ChatGPT Agent to exfiltrate user data, that is not a SQL injection. It is a new category of harm — and until now, it had no dedicated reward structure.
What the Safety Bug Bounty Covers
The program targets four primary vulnerability classes, all specific to AI agent behavior:
| Vulnerability Type | Description | Reproducibility Threshold |
|---|---|---|
| Prompt Injection / Data Exfiltration | Text inputs that reliably hijack a victim's agent to perform harmful actions or leak sensitive data | Must reproduce ≥50% of attempts |
| Agentic Disallowed Actions at Scale | Manipulating ChatGPT Agent, Atlas Browser, Codex, or Operator into performing prohibited actions systematically | Must demonstrate at-scale behavior |
| Proprietary Information Leakage | Model outputs that reveal internal reasoning processes, training details, or proprietary OpenAI information | Must contain non-public information |
| Account / Platform Integrity | Bypassing anti-automation controls, manipulating trust signals, evading account restrictions | Clear path to harmful outcome required |
Notably out of scope: jailbreaks that only produce rude language or information that is easily searchable. OpenAI explicitly wants reports of AI-specific harm, not demonstrations that the model can swear.
Reward Structure
| Tier | Max Reward | Requirements |
|---|---|---|
| Public High Severity | $7,500 | Consistently reproducible, clear mitigation steps included |
| Case-by-Case (Direct Harm) | Negotiated | Direct path to user harm + actionable remediation |
| Private Campaign (Critical) | $100,000 | Invitation only; focuses on biorisk, novel agentic vectors, GPT-5 private preview |
The $100,000 ceiling applies to private campaigns OpenAI runs for specific high-risk areas. Public submissions are capped at $7,500 for high severity — still competitive with standard bug bounty programs at most tech companies.
Try Happycapy — Run Claude, GPT, Gemini and Grok in One Secure PlatformWhy OpenAI Launched This Now
Three forces converged to make this launch urgent in March 2026:
- Agentic AI proliferation: ChatGPT Agent, Atlas Browser, Codex CLI, and Operator are now used by millions of enterprise customers. Each agent has browser access, code execution, and file system permissions — creating attack surfaces that did not exist in 2024.
- Rising prompt injection incidents: Researchers at OWASP documented a 340% increase in reported prompt injection attacks in Q1 2026, driven by the deployment of agents that execute actions rather than just generate text.
- Competitive pressure: Anthropic has had a private red-teaming program since 2024, and Google DeepMind launched its AI safety vulnerability disclosure program in January 2026. OpenAI needed a public-facing equivalent.
Safety Bug Bounty vs Security Bug Bounty: The Difference
OpenAI now runs two parallel programs, both hosted on Bugcrowd. Understanding the boundary between them matters for researchers:
| Program | Covers | Example Report |
|---|---|---|
| Security Bug Bounty | Traditional system vulnerabilities: SQLi, auth bypass, SSRF, access control | Unauthenticated API endpoint exposes user data |
| Safety Bug Bounty | AI-specific abuse: prompt injection, agentic manipulation, model behavior exploits | Crafted document causes Atlas Browser agent to exfiltrate session tokens |
OpenAI's triage team will reroute reports between programs automatically if a researcher submits to the wrong one. Researchers should default to the Safety Bug Bounty for anything involving model or agent behavior.
Implications for AI Users and Developers
For enterprise AI users, this program signals that the attack surface of AI products is now being treated with the same rigor as traditional software vulnerabilities. Agentic AI is not inherently safe just because it runs in a sandbox — it has persistent access to files, APIs, and browser sessions, and it can be manipulated through data it ingests.
For developers building on OpenAI's APIs, the program highlights three practical security considerations:
- Untrusted input handling: Any agent that ingests external content — emails, documents, web pages — is a prompt injection target. Treat all external text as untrusted input, not trusted instructions.
- Minimal permissions: Give agents only the permissions they need for a specific task. An agent that summarizes emails should not have file system write access.
- Output validation: Validate agent outputs before they trigger real-world actions. A confirmation step before any irreversible action (send email, delete file, API call) prevents most agentic attack chains.
Frequently Asked Questions
What is OpenAI's Safety Bug Bounty program?
OpenAI's Safety Bug Bounty program, launched March 25, 2026, pays security researchers to find AI-specific vulnerabilities. This includes prompt injection attacks, agentic manipulation, data exfiltration, and account integrity bypasses. It is separate from the existing Security Bug Bounty and is hosted on Bugcrowd.
How much does OpenAI pay for safety bug reports?
OpenAI pays up to $7,500 for high-severity public submissions. For critical findings in private campaigns — such as biorisk content issues or novel agentic attack vectors in GPT-5 — rewards can reach $100,000. Severity is determined by reproducibility, harm potential, and whether the flaw has a direct path to user harm.
What AI vulnerabilities does the Safety Bug Bounty cover?
The program covers: prompt injection and data exfiltration (reproducible ≥50%), disallowed agentic actions by ChatGPT Agent, Atlas Browser, Codex, or Operator at scale, model outputs that reveal proprietary OpenAI information, and account integrity bypasses.
How is this different from OpenAI's Security Bug Bounty?
The Security Bug Bounty covers traditional system vulnerabilities (SQL injection, auth bypass, SSRF). The Safety Bug Bounty covers AI-specific harms where the model or agent itself is manipulated, even when no underlying system vulnerability exists. Both are hosted on Bugcrowd and reports can be rerouted automatically.
Get the best AI tools tips — weekly
Honest reviews, tutorials, and Happycapy tips. No spam.