HappycapyGuide

By Connie · Last reviewed: April 2026 — pricing & tools verified · This article contains affiliate links. We may earn a commission at no extra cost to you if you sign up through our links.

AI Security

OpenAI Launches Safety Bug Bounty: Pays Up to $100K for AI Agent Vulnerabilities

March 25, 2026 · 7 min read · Happycapy Guide

TL;DR

OpenAI launched a Safety Bug Bounty program on March 25, 2026 — separate from its existing Security Bug Bounty. It pays up to $7,500 for high-severity public reports and up to $100,000 for critical findings in private campaigns. Targets: prompt injection, agentic manipulation (ChatGPT Agent, Atlas, Codex, Operator), proprietary data leaks, and account integrity bypasses. Hosted on Bugcrowd. Jailbreaks that only produce rude text are explicitly out of scope.

OpenAI has launched a dedicated Safety Bug Bounty program — its first public program specifically targeting AI-native abuse and safety risks. The launch follows a surge in agentic AI deployments and comes three weeks after Anthropic's Claude Code source code was leaked via npm sourcemaps in April 2026, raising industry-wide awareness of AI security gaps.

The new program covers attack scenarios that fall into a gap between traditional security vulnerabilities and general policy violations. When a prompt injection attack hijacks a ChatGPT Agent to exfiltrate user data, that is not a SQL injection. It is a new category of harm — and until now, it had no dedicated reward structure.

What the Safety Bug Bounty Covers

The program targets four primary vulnerability classes, all specific to AI agent behavior:

Vulnerability TypeDescriptionReproducibility Threshold
Prompt Injection / Data ExfiltrationText inputs that reliably hijack a victim's agent to perform harmful actions or leak sensitive dataMust reproduce ≥50% of attempts
Agentic Disallowed Actions at ScaleManipulating ChatGPT Agent, Atlas Browser, Codex, or Operator into performing prohibited actions systematicallyMust demonstrate at-scale behavior
Proprietary Information LeakageModel outputs that reveal internal reasoning processes, training details, or proprietary OpenAI informationMust contain non-public information
Account / Platform IntegrityBypassing anti-automation controls, manipulating trust signals, evading account restrictionsClear path to harmful outcome required

Notably out of scope: jailbreaks that only produce rude language or information that is easily searchable. OpenAI explicitly wants reports of AI-specific harm, not demonstrations that the model can swear.

Reward Structure

TierMax RewardRequirements
Public High Severity$7,500Consistently reproducible, clear mitigation steps included
Case-by-Case (Direct Harm)NegotiatedDirect path to user harm + actionable remediation
Private Campaign (Critical)$100,000Invitation only; focuses on biorisk, novel agentic vectors, GPT-5 private preview

The $100,000 ceiling applies to private campaigns OpenAI runs for specific high-risk areas. Public submissions are capped at $7,500 for high severity — still competitive with standard bug bounty programs at most tech companies.

Try Happycapy — Run Claude, GPT, Gemini and Grok in One Secure Platform

Why OpenAI Launched This Now

Three forces converged to make this launch urgent in March 2026:

Safety Bug Bounty vs Security Bug Bounty: The Difference

OpenAI now runs two parallel programs, both hosted on Bugcrowd. Understanding the boundary between them matters for researchers:

ProgramCoversExample Report
Security Bug BountyTraditional system vulnerabilities: SQLi, auth bypass, SSRF, access controlUnauthenticated API endpoint exposes user data
Safety Bug BountyAI-specific abuse: prompt injection, agentic manipulation, model behavior exploitsCrafted document causes Atlas Browser agent to exfiltrate session tokens

OpenAI's triage team will reroute reports between programs automatically if a researcher submits to the wrong one. Researchers should default to the Safety Bug Bounty for anything involving model or agent behavior.

Implications for AI Users and Developers

For enterprise AI users, this program signals that the attack surface of AI products is now being treated with the same rigor as traditional software vulnerabilities. Agentic AI is not inherently safe just because it runs in a sandbox — it has persistent access to files, APIs, and browser sessions, and it can be manipulated through data it ingests.

For developers building on OpenAI's APIs, the program highlights three practical security considerations:

Happycapy Pro — Multi-Model AI at $17/mo — Access Claude, GPT-5.4, Gemini 3.1 and Grok

Frequently Asked Questions

What is OpenAI's Safety Bug Bounty program?

OpenAI's Safety Bug Bounty program, launched March 25, 2026, pays security researchers to find AI-specific vulnerabilities. This includes prompt injection attacks, agentic manipulation, data exfiltration, and account integrity bypasses. It is separate from the existing Security Bug Bounty and is hosted on Bugcrowd.

How much does OpenAI pay for safety bug reports?

OpenAI pays up to $7,500 for high-severity public submissions. For critical findings in private campaigns — such as biorisk content issues or novel agentic attack vectors in GPT-5 — rewards can reach $100,000. Severity is determined by reproducibility, harm potential, and whether the flaw has a direct path to user harm.

What AI vulnerabilities does the Safety Bug Bounty cover?

The program covers: prompt injection and data exfiltration (reproducible ≥50%), disallowed agentic actions by ChatGPT Agent, Atlas Browser, Codex, or Operator at scale, model outputs that reveal proprietary OpenAI information, and account integrity bypasses.

How is this different from OpenAI's Security Bug Bounty?

The Security Bug Bounty covers traditional system vulnerabilities (SQL injection, auth bypass, SSRF). The Safety Bug Bounty covers AI-specific harms where the model or agent itself is manipulated, even when no underlying system vulnerability exists. Both are hosted on Bugcrowd and reports can be rerouted automatically.

Sources:
OpenAI: Introducing the Safety Bug Bounty program, March 25, 2026
SecurityWeek: OpenAI Launches Bug Bounty for Abuse and Safety Risks
Cybersecurity News: OpenAI Safety Bug Bounty to Detect AI-Specific Vulnerabilities
Infosecurity Magazine: OpenAI Expands Bug Bounty to Cover AI Abuse Concerns
SharePost on XLinkedIn
Was this helpful?

Get the best AI tools tips — weekly

Honest reviews, tutorials, and Happycapy tips. No spam.

Comments