How is OpenAI's Safety Bug Bounty different from its Security Bug Bounty?

The Security Bug Bounty covers traditional vulnerabilities like SQL injection, authentication bypasses, and system exploits. The Safety Bug Bounty covers AI-specific harms — scenarios where the AI model itself is manipulated or misused to cause harm, even when no underlying system vulnerability exists. Both are hosted on Bugcrowd and reports can be rerouted between programs.

By Connie · Last reviewed: April 2026 — pricing & tools verified · AI-assisted, human-edited · This article contains affiliate links. We may earn a commission at no extra cost to you if you sign up through our links.

AI Security

OpenAI Launches Safety Bug Bounty: Pays Up to $100K for AI Agent Vulnerabilities

Q: What AI vulnerabilities does the OpenAI Safety Bug Bounty cover?

The program covers: (1) third-party prompt injection and data exfiltration where text reliably hijacks an agent, (2) disallowed actions performed by ChatGPT Agent, Atlas Browser, Codex, or Operator at scale, (3) model outputs that reveal proprietary OpenAI reasoning processes, and (4) account integrity bypasses including anti-automation evasion.

March 25, 2026 · 7 min read · Happycapy Guide

TL;DR

OpenAI launched a Safety Bug Bounty program on March 25, 2026 — separate from its existing Security Bug Bounty. It pays up to $7,500 for high-severity public reports and up to $100,000 for critical findings in private campaigns. Targets: prompt injection, agentic manipulation (ChatGPT Agent, Atlas, Codex, Operator), proprietary data leaks, and account integrity bypasses. Hosted on Bugcrowd. Jailbreaks that only produce rude text are explicitly out of scope.

OpenAI has launched a dedicated Safety Bug Bounty program — its first public program specifically targeting AI-native abuse and safety risks. The launch follows a surge in agentic AI deployments and comes three weeks after Anthropic's Claude Code source code was leaked via npm sourcemaps in April 2026, raising industry-wide awareness of AI security gaps.

The new program covers attack scenarios that fall into a gap between traditional security vulnerabilities and general policy violations. When a prompt injection attack hijacks a ChatGPT Agent to exfiltrate user data, that is not a SQL injection. It is a new category of harm — and until now, it had no dedicated reward structure.

What the Safety Bug Bounty Covers

The program targets four primary vulnerability classes, all specific to AI agent behavior:

Vulnerability Type	Description	Reproducibility Threshold
Prompt Injection / Data Exfiltration	Text inputs that reliably hijack a victim's agent to perform harmful actions or leak sensitive data	Must reproduce ≥50% of attempts
Agentic Disallowed Actions at Scale	Manipulating ChatGPT Agent, Atlas Browser, Codex, or Operator into performing prohibited actions systematically	Must demonstrate at-scale behavior
Proprietary Information Leakage	Model outputs that reveal internal reasoning processes, training details, or proprietary OpenAI information	Must contain non-public information
Account / Platform Integrity	Bypassing anti-automation controls, manipulating trust signals, evading account restrictions	Clear path to harmful outcome required

Notably out of scope: jailbreaks that only produce rude language or information that is easily searchable. OpenAI explicitly wants reports of AI-specific harm, not demonstrations that the model can swear.

Reward Structure

Tier	Max Reward	Requirements
Public High Severity	$7,500	Consistently reproducible, clear mitigation steps included
Case-by-Case (Direct Harm)	Negotiated	Direct path to user harm + actionable remediation
Private Campaign (Critical)	$100,000	Invitation only; focuses on biorisk, novel agentic vectors, GPT-5 private preview

The $100,000 ceiling applies to private campaigns OpenAI runs for specific high-risk areas. Public submissions are capped at $7,500 for high severity — still competitive with standard bug bounty programs at most tech companies.

Try Happycapy — Run Claude, GPT, Gemini and Grok in One Secure Platform

Why OpenAI Launched This Now

Three forces converged to make this launch urgent in March 2026:

Agentic AI proliferation: ChatGPT Agent, Atlas Browser, Codex CLI, and Operator are now used by millions of enterprise customers. Each agent has browser access, code execution, and file system permissions — creating attack surfaces that did not exist in 2024.
Rising prompt injection incidents: Researchers at OWASP documented a 340% increase in reported prompt injection attacks in Q1 2026, driven by the deployment of agents that execute actions rather than just generate text.
Competitive pressure: Anthropic has had a private red-teaming program since 2024, and Google DeepMind launched its AI safety vulnerability disclosure program in January 2026. OpenAI needed a public-facing equivalent.

Safety Bug Bounty vs Security Bug Bounty: The Difference

OpenAI now runs two parallel programs, both hosted on Bugcrowd. Understanding the boundary between them matters for researchers:

Program	Covers	Example Report
Security Bug Bounty	Traditional system vulnerabilities: SQLi, auth bypass, SSRF, access control	Unauthenticated API endpoint exposes user data
Safety Bug Bounty	AI-specific abuse: prompt injection, agentic manipulation, model behavior exploits	Crafted document causes Atlas Browser agent to exfiltrate session tokens

OpenAI's triage team will reroute reports between programs automatically if a researcher submits to the wrong one. Researchers should default to the Safety Bug Bounty for anything involving model or agent behavior.

Implications for AI Users and Developers

For enterprise AI users, this program signals that the attack surface of AI products is now being treated with the same rigor as traditional software vulnerabilities. Agentic AI is not inherently safe just because it runs in a sandbox — it has persistent access to files, APIs, and browser sessions, and it can be manipulated through data it ingests.

For developers building on OpenAI's APIs, the program highlights three practical security considerations:

Untrusted input handling: Any agent that ingests external content — emails, documents, web pages — is a prompt injection target. Treat all external text as untrusted input, not trusted instructions.
Minimal permissions: Give agents only the permissions they need for a specific task. An agent that summarizes emails should not have file system write access.
Output validation: Validate agent outputs before they trigger real-world actions. A confirmation step before any irreversible action (send email, delete file, API call) prevents most agentic attack chains.

Happycapy Pro — Multi-Model AI at $17/mo — Access Claude, GPT-5.4, Gemini 3.1 and Grok

Frequently Asked Questions

What is OpenAI's Safety Bug Bounty program?

OpenAI's Safety Bug Bounty program, launched March 25, 2026, pays security researchers to find AI-specific vulnerabilities. This includes prompt injection attacks, agentic manipulation, data exfiltration, and account integrity bypasses. It is separate from the existing Security Bug Bounty and is hosted on Bugcrowd.

How much does OpenAI pay for safety bug reports?

OpenAI pays up to $7,500 for high-severity public submissions. For critical findings in private campaigns — such as biorisk content issues or novel agentic attack vectors in GPT-5 — rewards can reach $100,000. Severity is determined by reproducibility, harm potential, and whether the flaw has a direct path to user harm.

What AI vulnerabilities does the Safety Bug Bounty cover?

The program covers: prompt injection and data exfiltration (reproducible ≥50%), disallowed agentic actions by ChatGPT Agent, Atlas Browser, Codex, or Operator at scale, model outputs that reveal proprietary OpenAI information, and account integrity bypasses.

How is this different from OpenAI's Security Bug Bounty?

The Security Bug Bounty covers traditional system vulnerabilities (SQL injection, auth bypass, SSRF). The Safety Bug Bounty covers AI-specific harms where the model or agent itself is manipulated, even when no underlying system vulnerability exists. Both are hosted on Bugcrowd and reports can be rerouted automatically.

Sources:
OpenAI: Introducing the Safety Bug Bounty program, March 25, 2026
SecurityWeek: OpenAI Launches Bug Bounty for Abuse and Safety Risks
Cybersecurity News: OpenAI Safety Bug Bounty to Detect AI-Specific Vulnerabilities
Infosecurity Magazine: OpenAI Expands Bug Bounty to Cover AI Abuse Concerns

← Back to all articles

SharePost on X LinkedIn

—Was this helpful?

Get the best AI tools tips — weekly

Honest reviews, tutorials, and Happycapy tips. No spam.

AI Security

N-Day-Bench: Can LLMs Find Real Security Vulnerabilities in Production Code?

8 min

AI Security

N-Day Bench: How Researchers Are Testing LLMs to Find Security Vulnerabilities in 2026

10 min