Reference

The State of LLM Bug Bounties in 2026

11 min read·By Anthony D'Onofrio·Updated 2026-04-19

A practitioner's guide to where LLM bug bounties actually pay in 2026 — program-by-program scope comparison, typical payouts, which classes of AI bugs get rewarded versus closed as 'known limitation,' and how to pick a scope that fits how you hunt.

If you hunt for a living, the question you want answered is not "does prompt injection exist?" — you know it does — but "which programs pay for what, and how much?" The bug bounty landscape for LLM-specific vulnerabilities has matured fast over the last eighteen months, but the published information about it is scattered across dozens of program pages, policy revisions, and disclosed reports. Nobody has pulled it into a single comparative view.

This is that view. I've combed through every major program I could find that either explicitly scopes AI vulnerabilities or has paid for them in public disclosures, and summarized the state of play: what's in scope, what's out of scope, what the median payout actually looks like versus the advertised ceiling, and where the gaps are if you're trying to pick a program to focus on.

I'll flag my biases up front. I run Wraith, an AI-security platform that includes a hands-on academy and a pentesting certification for this exact discipline, so I care about hunters succeeding in this space — it validates the category. I've also reported vulnerabilities into several of these programs myself. Where I have direct experience, I'll note it.

The short version

Anthropic, OpenAI, and Google run the three largest and most mature AI-specific bounty programs. All three explicitly scope model and product vulnerabilities, and all three have paid five-figure bounties for serious finds. They're also the strictest about what counts — a lot of "prompt injection" submissions get closed.
HackerOne and Bugcrowd host a fast-growing long tail of AI-adjacent programs — SaaS products that have bolted AI features onto existing surfaces. These programs often don't know how to triage LLM-specific bugs, which is both an opportunity (novel finds, lower competition) and a frustration (reports closed as informational because the triager doesn't understand the impact).
The biggest gap in 2026 is indirect prompt injection scope. Most programs nominally accept it; in practice, triage consistently under-rates it, especially when the impact requires a multi-step chain. This is the single most important thing to get right when writing your report.
Median payouts are lower than the advertised maximums suggest. Headline numbers of $15k-$50k refer to critical findings against core infrastructure. The median real-world AI bug payout in 2026 is closer to $500-$2,500, with premium payouts concentrated at a few specific high-impact categories (tool abuse that reaches infrastructure, cross-tenant data exfiltration, anything that bypasses paid features).

Read on for the program-by-program breakdown, or jump to picking a program that fits how you hunt if you already know the landscape and want tactical advice.

Program-by-program breakdown

Anthropic's Model Safety Bug Bounty

Anthropic has run AI safety bounty programs since 2023 and moved to a permanent program in late 2024, hosted through HackerOne. Scope covers both the Claude models and the surrounding product surface (Claude.ai, the API, Claude Projects, etc.).

What's in scope:

Universal jailbreaks — prompt patterns that reliably bypass safety training across multiple high-stakes refusal categories
Classifier bypasses for deployed safety systems (especially the constitutional classifiers research)
Product-level vulnerabilities: authentication, data isolation between workspaces, billing integrity
Tool-use-related vulnerabilities in the Claude API and the Computer Use beta

What's out of scope (explicitly or in practice):

"Single-domain" jailbreaks — getting Claude to produce one kind of refused content in one kind of framing. Scope requires breadth.
Hallucination-based misinformation unless it has a clear exploitation path
Model denial-of-service via long inputs (out-of-scope as expected behavior under rate limits)

Typical payouts:

Headline: up to $20,000+ for critical universal jailbreaks
Realistic middle-band: $1,500-$5,000 for solid partial bypasses, novel classifier evasion techniques, or product-level AuthZ bugs
Lower band: $300-$1,000 for narrower findings that are still novel

My read: Anthropic's program is one of the highest signal in the space. Triage is technically strong and responses are fast by bounty-program standards. The bar for "universal jailbreak" is genuinely high — they're not paying out for single-turn "DAN"-style prompts — but the payouts when you hit it are competitive with top conventional programs. Best fit for hunters who can invest multi-day effort per finding.

OpenAI's Bug Bounty Program

Hosted through Bugcrowd. The program scope expanded substantially in 2024 to explicitly cover model safety alongside the traditional product surface (ChatGPT, the API, the Custom GPT ecosystem, plugins in their various incarnations).

What's in scope:

Authentication / AuthZ across ChatGPT accounts, teams, and enterprises
Custom GPT exploits — actions abuse, system prompt extraction from third-party GPTs (with the GPT creator's inclusion), plugin-related scope escapes
Model and feature vulnerabilities that affect confidentiality or integrity of user data
Specific vulnerability classes in their published scope document (updated quarterly)

What's out of scope (explicitly or in practice):

Prompt injection "in general" — OpenAI's policy has historically been that prompt injection against the base model is a known limitation, not a payable bug. What is payable is prompt injection that achieves a specific further impact (data exfiltration, privilege escalation, cross-user effects).
Jailbreaks for content policy — you can't get paid for making ChatGPT say a bad word.
Hallucinations, including harmful ones, unless they lead to further exploitation.

Typical payouts:

Headline: up to $20,000 for critical product vulnerabilities
Realistic middle-band: $500-$3,000 for solid AuthZ bugs in the Custom GPT ecosystem or the API
Lower band: $200-$500 for smaller findings

My read: OpenAI's program rewards you for chaining prompt injection into concrete further impact. "I can make ChatGPT say X" is closed; "I can make a third-party Custom GPT reveal its uploaded knowledge files, containing data the creator didn't intend to share" is paid. Report framing is disproportionately important here.

Google's AI Bug Bounty (VRP expansion)

Google's Vulnerability Reward Program expanded in 2023 to cover generative AI specifically, and the scope has been refined several times since. Covers Gemini (the consumer product and the API), AI features bolted onto Workspace (Gmail, Docs, Drive), and Google's various AI-powered research interfaces.

What's in scope:

Indirect prompt injection against AI-powered Workspace features (Gemini in Gmail reading adversarial email content, Gemini in Docs processing adversarial document content)
Cross-account data exfiltration via AI features
Prompt injection that leads to exfiltration of conversation history, uploaded files, or Workspace content
Model and training-data extraction attacks with measurable confidentiality impact

What's out of scope:

Content policy violations / jailbreaks that don't have a confidentiality or integrity impact
Hallucinations
"Purely theoretical" prompt injection without a demonstrated exploitation path

Typical payouts:

Headline: up to $31,337 for severe OAuth/AuthZ issues and anything affecting large user bases
AI-specific headline: multi-thousand-dollar payouts reported publicly for indirect prompt injection into Workspace features
Realistic middle band: $500-$3,000

My read: Google's program has the most interesting attack surface for hunters in 2026. Gemini-in-Workspace is a sprawling indirect-injection playground: any tool that reads email content, documents, or third-party data becomes a potential target. The public disclosed reports (Johann Rehberger has published several) are required reading for anyone planning to hunt here.

Meta's AI Bug Bounty

Meta's program, hosted through HackerOne, covers the Meta AI product surface, the Llama ecosystem, and AI features across Facebook, Instagram, and WhatsApp.

What's in scope:

Privacy and integrity bugs in Meta AI
Cross-conversation data leakage
AI-feature abuse that leads to account or data compromise on the host platforms
Llama model distribution and fine-tuning infrastructure vulnerabilities

Typical payouts: Meta's payouts skew lower for AI-specific issues than for their core social-graph and platform bugs. Expect $500-$2,000 for solid AI findings; escalation is possible for issues with broad user impact.

My read: Less mature program for AI specifically than the three above, but the surface is huge and under-tested relative to the attention the Big Three get.

Microsoft's AI Bug Bounty

Microsoft runs AI bounties through the standard Microsoft Security Response Center channel, with dedicated scope for Copilot across its many surfaces (GitHub Copilot, Microsoft 365 Copilot, Windows Copilot, etc.), Azure AI services, and Bing Chat.

What's in scope:

Cross-tenant data leakage in Microsoft 365 Copilot — high-value given enterprise deployment
Indirect prompt injection against Copilot features
Escalation of privilege via Copilot tool use
Azure AI service vulnerabilities (model endpoints, fine-tuning data handling)

Typical payouts: Microsoft's standard tiers apply. Critical Copilot vulnerabilities can reach $30,000+; realistic median is $500-$3,000.

My read: The cross-tenant angle in M365 Copilot is particularly interesting. Enterprise deployments are sensitive to any cross-customer data leakage, and Copilot's access to OneDrive, SharePoint, and Exchange content per tenant creates a wide attack surface for scope-related bugs.

HackerOne's AI / GenAI category programs

The long tail. Hundreds of SaaS products have enrolled on HackerOne with AI-bolted features in scope. These include design tools, writing assistants, code assistants, and customer-support products.

In-scope patterns (varies widely by program):

Prompt injection that leads to account takeover, privilege escalation, or data exfiltration
System prompt extraction where the prompt contains customer-specific or sensitive information
Tool abuse that reaches infrastructure

Typical payouts: $100-$2,000 for most findings. A few well-funded programs pay higher.

My read: This is where most hunters will actually spend their time. The upside: less competition than the Big Three, novel products with less-tested AI integrations, and often a gentle ramp-up for hunters new to AI bugs. The downside: triage quality varies wildly. A program that paid for prompt injection yesterday might close the same class of bug tomorrow because a new triager doesn't understand the impact. Report framing is the difference between getting paid and getting closed.

Bugcrowd's AI programs

Similar long-tail dynamics to HackerOne, with a slightly different program mix. OpenAI and a handful of AI-native startups run on Bugcrowd rather than HackerOne.

My read: Worth watching, especially for AI-native startups whose programs are less crowded than Anthropic/OpenAI/Google. Payouts are similar to HackerOne mid-band.

Direct-to-vendor programs

A growing number of AI-native companies run private or direct bounty programs outside HackerOne/Bugcrowd. Examples in 2026 include several vector-DB vendors, prompt-management SaaS companies, and AI observability platforms.

My read: Historically these have less competition and faster triage. They also pay less on average, but the bar for "valid" is often lower because the product team is small and appreciative of external testing. Good for hunters building initial CV credibility.

Which attack classes actually pay

A lot of hunters get frustrated because they submit "prompt injection" findings and watch them close as "informational." The root cause, almost always, is that the report didn't demonstrate impact beyond the injection itself. Here's my read on which attack classes convert to payouts in 2026, ranked roughly by consistent reward:

Tool abuse leading to SSRF, RCE, or infrastructure access. The highest-paying class. If prompt injection lets you make the agent fetch http://169.254.169.254/latest/meta-data/ and return the result, you're not writing a prompt injection report — you're writing an SSRF report with a prompt injection as the entry vector. Frame accordingly.
Cross-tenant or cross-user data exfiltration. Gemini-in-Workspace, M365 Copilot, any multi-tenant AI SaaS product. These reports pay because the impact is concrete and enterprise-visible.
System prompt extraction where the prompt contains secrets. API keys, internal URLs, partner names. Not the prompt itself — the embedded secrets are what makes it payable.
Indirect prompt injection against AI-powered features that read third-party content. Email readers, document summarizers, browsing agents. The attacker-never-speaks-to-the-AI angle is what sells the severity.
Cross-session data leakage. Memory systems, "long-term context" features, conversation persistence bugs. Often a scope-filter bug at heart, which makes them legible to triage.
Paid-feature bypasses via prompt manipulation. Getting a free-tier user to trigger paid-tier functionality. Some programs treat this as business-impact, some as out-of-scope.
Universal jailbreaks with policy-relevant impact. Mostly relevant at Anthropic, OpenAI, and Google. The bar is high but the payouts can be top-of-band.
Output-handling bugs (XSS, SQLi, command injection via LLM output). Payable but often as conventional bugs, not AI bugs. Scope them accordingly.
Hallucination with an attacker-controlled angle. Payable in narrow cases — the canonical one being a model that reliably invents a package name, which an attacker then registers. Most "hallucination" reports close as informational.
Direct jailbreaks without further impact. Usually out of scope in 2026. Don't spend your time here.

Picking a program that fits how you hunt

The right starting program depends on what kind of hunting you like:

If you like deep technical research with long time horizons: Anthropic and OpenAI. Expect multi-day investments per report, high bar to get paid, top-tier payouts when you land one.
If you like breadth and quick iteration: Google Workspace AI features and Microsoft Copilot. Wide surface, lots of indirect-injection permutations to try, medium payouts but more frequent wins.
If you're new to AI hunting and want to build credibility: HackerOne's AI long tail. Lower bar to first payout, more forgiving programs, but invest heavily in report framing.
If you already hunt web vulns and want to transfer skills: Look for AI-bolted SaaS products on HackerOne and Bugcrowd. Your existing instincts for AuthZ, IDOR, and SSRF translate directly — the AI layer is often just a new entry vector to the same old bugs.

What's missing from the landscape

A few gaps stand out to me, and I expect them to close over the next 12-24 months:

There is no generalized AI bug bounty program. You can't pick a random consumer AI product and assume it has a bounty program. Most enterprises with AI-bolted products lag 1-2 years behind their conventional security posture on establishing AI-specific scope.
Triage expertise is the bottleneck, not bug count. The reason reports close as informational is that the triager doesn't understand prompt injection's exploitation model. As the discipline matures, expect more specialized AI triagers at the top programs.
Agentic AI bounties are underpriced. Full-kill-chain attacks where an AI agent is the initial access vector are real but bounty scope rarely covers the conventional side of the chain. This is one of the reasons I'm building the WCRO cert track longer-term — the intersection matters.

How to get started this week

If you've read this far and are thinking about actually hunting, here's the shortest path I can give you:

Read Prompt Injection: A Complete Guide for 2026 and System Prompt Extraction: Techniques and Defenses. Both are long-form attack taxonomies.
Walk through the OWASP Top 10 for LLM Applications, annotated. Map each category to one attack class you can execute.
Spend a weekend on the Wraith Academy — six hands-on modules covering the core offensive disciplines. You'll know the attack surface cold by the end.
Pick one program from the list above that fits how you hunt. Spend a week on reconnaissance before you file your first report. Understand the product, the scope, and recent disclosed reports on the program.
When you find something: write the report like a pentest finding, not a forum post. Root cause, reproduction steps, concrete impact, remediation suggestion. The difference between a $500 bug and a $2,500 bug is often framing, not severity.

Good hunting.

Want to test this on your own agent?

Paste your chatbot's API endpoint. Get a real security grade in minutes — free during launch week.

Scan your agent →