THOUGHT LEADERSHIP

Is Prompt Injection a Vulnerability?

Jun 11, 2026

7 Min

Ax Sharma

Head of Research

The security community can't agree on whether prompt injection is a real vulnerability. Depending on who you ask, it's either a critical unpatched flaw in every AI system or a known limitation that researchers keep rediscovering and calling new.

Both camps are missing something.

The "Known Limitation" Argument

The argument against classifying prompt injection as a vulnerability usually runs like this: LLMs can't reliably separate data from instructions. They never could. That's a property of the architecture, not a bug someone failed to patch. You can't file a CVE against how transformers work.

This position has real defenders. When security engineer John Russell disclosed four issues in Microsoft Copilot in late 2025, including indirect prompt injection leading to system prompt leak and command execution within Copilot's sandboxed Linux environment, Microsoft closed the cases. Their stated reason: the findings didn't cross a security boundary, and impact was limited to the requesting user's own execution environment.

Security researcher Cameron Criswell, commenting on Russell's disclosures, reinforced the point: fixing that without breaking everything else, he argued, is not obviously possible.

OWASP's GenAI project takes a measured position on the specific question of system prompt leakage: the disclosure of a system prompt by itself isn't the risk. The risk is what the system prompt contains, whether that's sensitive data, guardrail logic being exposed for bypass, or privilege assumptions embedded in the prompt. Strip those things out and a leaked system prompt is mostly harmless.

That's all defensible. But it only holds for the narrow category of prompt injection against single-user, self-contained AI assistants where the output lands back in the same user's session. It breaks down quickly when agents act on the world.

The Patching Problem

If prompt injection is just an architectural limitation, why do vendors ship fixes for it?

Take CVE-2025-54131, disclosed against Cursor in August 2025. An attacker could bypass Cursor's terminal command allowlist using a backtick character or $(cmd) substitution. On its own that's a parsing bug. But the advisory explicitly notes the vector: chain it with indirect prompt injection and you get arbitrary command execution without user approval, against a user who had configured the tool to run in auto-run mode. Cursor assigned it a CVE, scored it CVSS 6.4, and patched it in version 1.3 with a more robust allowlist parser.
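To make the bug class concrete, here is a minimal Python sketch of an allowlist that inspects only the first word of a command, next to a stricter check that also rejects shell substitution and chaining syntax. This illustrates the general pattern only; it is not Cursor's code or its actual fix, and the names `ALLOWED`, `naive_is_allowed`, and `stricter_is_allowed` are hypothetical.

```python
import shlex

# Hypothetical allowlist of commands an agent may run without approval.
ALLOWED = {"ls", "cat", "echo"}

# Syntax that lets a shell run a nested or chained command.
# Assumption: this list is illustrative, not exhaustive.
SHELL_MARKERS = ("`", "$(", ";", "&", "|", ">", "<", "\n")


def naive_is_allowed(command: str) -> bool:
    """Checks only the first word, so nested commands slip through."""
    parts = command.split()
    return bool(parts) and parts[0] in ALLOWED


def stricter_is_allowed(command: str) -> bool:
    """Rejects substitution or chaining syntax before checking the allowlist."""
    if any(marker in command for marker in SHELL_MARKERS):
        return False
    try:
        parts = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar parse failures
        return False
    return bool(parts) and parts[0] in ALLOWED
```

With these definitions, `naive_is_allowed("echo $(id)")` returns `True` while `stricter_is_allowed("echo $(id)")` returns `False`. A denylist of metacharacters is itself fragile, so a sturdier design would likely avoid handing strings to a shell at all and pass an argument list to the process instead.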

That's a prompt injection chain, formally acknowledged as a vulnerability, with a CVE number and a fix. Not a shrug.

More broadly, system prompt hardening, instruction hierarchy enforcement, input sanitization layers, and refusal tuning on specific injection patterns are all engineering responses to a problem vendors are treating as addressable. When a vendor tightens its system prompt handling after a researcher demonstrates leakage, that's a patch. When an AI assistant is updated to refuse a class of indirect instructions it previously followed, that's a fix. Neither of these happens if the behavior is purely architectural fate.

In his original disclosure, Russell noted that Claude refused all the injection methods he confirmed working against Copilot. That's not an architectural difference between the two products; both are built on transformer-based LLMs with the same fundamental separation problem. It's a difference in mitigations. And mitigations exist because someone decided the behavior was worth fixing.

The "known limitation" framing also does something convenient for vendors: it moves the target. If prompt injection is a property of LLMs rather than a flaw in a specific product, the vendor isn't responsible for fixing it, researchers don't get credit for finding it, and bug bounty programs don't have to pay out. It's a pattern the security community has seen well beyond AI: vendors contesting severity, rejecting reports as out of scope, or quietly patching without acknowledgment. And it's not getting better.

Where the Argument Falls Apart

The threat model for agentic AI is different. An AI agent reading a file, browsing the web, or calling external tools is processing untrusted content and then taking action based on it. That's not a "known limitation." That's an attack surface.

Consider the difference between two scenarios:

Scenario A: A user uploads a document to a chatbot. The document contains injected instructions. The chatbot's behavior changes; the user sees unexpected output in their own session.

Scenario B: An AI agent with access to a user's email and calendar reads a message from an external sender. That message contains injected instructions. The agent books a meeting, forwards a document, or exfiltrates session context to an attacker-controlled endpoint.

In Scenario A, the victim and the owner of the attacker-influenced session are the same person. The blast radius is essentially self-contained. Microsoft's framing, that impact was limited to the requesting user's execution environment, applies here.

In Scenario B, an external party has influenced an agent operating with real permissions against real resources. That's a different class of issue. The LLM's inability to separate data from instructions is still the underlying cause, but "known limitation" stops being a satisfying answer when the consequence is unauthorized access or data exfiltration.
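One way to picture the difference between the two scenarios is a gate that tracks where an agent's context came from. The Python sketch below is a hypothetical illustration, not any vendor's mitigation: the `ToolCall` type, the `SIDE_EFFECT_TOOLS` set, and the tool names in it are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical tools that act on the world rather than just produce text.
SIDE_EFFECT_TOOLS = {"send_email", "forward_document", "book_meeting", "http_request"}


@dataclass(frozen=True)
class ToolCall:
    name: str
    # True if content from an external, untrusted source (an inbound email,
    # a web page, a shared file) was in the context that produced this call.
    context_has_untrusted_input: bool


def requires_approval(call: ToolCall) -> bool:
    """Hold side-effecting calls for review when untrusted input shaped them."""
    return call.context_has_untrusted_input and call.name in SIDE_EFFECT_TOOLS
```

Under these assumptions, `requires_approval(ToolCall("send_email", True))` returns `True`, while a text-only call such as `ToolCall("summarize", True)` passes through. The gate does not fix the underlying separation problem; it only limits what an injected instruction can reach.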

The Context Dependency Problem

This is why "prompt injection: vulnerability or limitation?" is the wrong question. The right question is: in what context?

The same behavior, an LLM following injected instructions from untrusted input, can be trivial or severe depending on:

What tools the agent has access to. A read-only Q&A assistant is different from an agent with filesystem or network access.
Who owns the execution context. Self-contained user sessions are different from multi-tenant deployments or agents operating on behalf of an organization.
What the injected instruction can reach. Changing a response tone is different from triggering an outbound call or exfiltrating credentials.
Whether the output is reviewed before action is taken. A human-in-the-loop changes the equation (at least in theory).
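The four factors above could be folded into a rough triage function. The Python sketch below is only an illustration of context-dependent rating: the `DeploymentContext` fields, the scoring, and the thresholds are arbitrary assumptions, not an established severity standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentContext:
    has_side_effect_tools: bool     # filesystem, network, email, and so on
    acts_for_others: bool           # multi-tenant or organization-wide agent
    can_reach_sensitive_data: bool  # credentials, outbound calls
    human_reviews_actions: bool     # output reviewed before action is taken


def triage(ctx: DeploymentContext) -> str:
    """Rate the same injection behavior differently depending on context."""
    risk = sum([
        ctx.has_side_effect_tools,
        ctx.acts_for_others,
        ctx.can_reach_sensitive_data,
    ])
    if ctx.human_reviews_actions:
        risk -= 1  # assumption: review lowers the rating but never removes it
    if risk <= 0:
        return "low"
    return "medium" if risk == 1 else "high"
```

A read-only Q&A assistant with none of the three exposures rates `"low"`, while an unreviewed agent with tools, organizational scope, and access to credentials rates `"high"`, even though the model behavior behind both is identical.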

A single, static "does this cross a security boundary" test, of the kind Microsoft applied here, may be reasonable for consumer chatbots today. The same test will miss real attacks in the agent deployments going live right now.

The Disclosure Problem

There's a practical consequence to the "not a vulnerability" framing: researchers who find real, exploitable instances of indirect prompt injection in agentic tools have nowhere to file them.

We see this in practice every day. Several VS Code AI extensions, including tools processing untrusted workspace content and running tool calls autonomously, have behaviors that would qualify as vulnerabilities under any reasonable threat model for an agentic tool. The CVE framework doesn't have good guidance here. Bug bounty programs often apply the same Microsoft-style "user context only" carve-out that lets high-impact findings fall through.