April 16, 2026·2 min read·Anthony D'Onofrio

Why I Built Wraith

Every week I see another story about an AI chatbot doing something it shouldn't — leaking its system prompt, calling tools on behalf of an attacker, or being manipulated through a poisoned document. The pattern is always the same: someone shipped an AI agent, someone else found a way to break it, and by the time the team responded, the vulnerability was already on Twitter.

Traditional security scanners can't test these systems. They were built for SQL injection and XSS, not multi-turn prompt manipulation. You can't find a jailbreak with a regex payload list — you need an adversary who can reason, adapt, and escalate.

So I built one.

What Wraith does

Wraith is an AI-powered red team for AI agents. You paste your chatbot's API endpoint, and Wraith uses Claude to run multi-turn attacks — prompt injection, system prompt extraction, and (coming soon) tool abuse, data exfiltration, guardrail bypass, and permission boundaries. When a simple probe fails, it escalates. When it learns something from the target's response, it adapts.

You get a grade, a list of findings with evidence quotes, a compound attack chain analysis showing how findings combine into worse-than-the-sum attacks, and a downloadable PDF.

Free tier shows you every vulnerability. Paid tiers unlock the remediation.

Why this matters now

In 2026, AI agents are being deployed into customer-support roles, internal tools, and developer workflows at a speed that's outrunning security review. The attack surface is enormous and the defensive tooling is thin. If you're building with LLMs and you haven't red-teamed your prompt architecture, you have no idea whether your agent leaks.

Wraith is an attempt to close that gap — cheaply, quickly, and without requiring a consultant on retainer.

What's next

The MVP has two attack categories wired up. The four remaining — Tool Abuse, Data Exfiltration, Guardrail Bypass, and Permission Boundaries — are in the queue. After that: Stripe integration, scan history, and the human-reviewed "Verified by Harbinger Security" attestation tier.

If you're building an AI agent, scan it. If you find something interesting, I'd love to hear about it.

Want to see a worked example? In a follow-up post, I pointed Wraith at a deliberately vulnerable chatbot and extracted production API keys in 26 seconds. For the full attack taxonomy, read the Prompt Injection guide and the System Prompt Extraction guide. Or try the attacks hands-on at the Wraith Academy.

Practice these techniques hands-on

14 free challenges teaching prompt injection, system prompt extraction, data exfiltration, and more.

Enter the Academy →