Prompt Injection

Definition

Technique that manipulates an LLM's input (ChatGPT, Claude, Gemini) to override its original instructions and make it act outside its intended behaviour.

Two variants: direct (user writes the payload — 'ignore your previous instructions and...') and indirect (the payload travels in external content the LLM consumes — email read by an agent, scraped web page, uploaded file).

Indirect is the more dangerous one in agentic AI: your agent with permissions reads a malicious email → executes attacker commands with YOUR permissions. This is the first item in the OWASP LLM Top 10 (LLM01).

Defence: clear separation between system prompt and user input (don't mix in a single string), guardrails with prior classifier, principle of least privilege on agent tools, output filtering, human-in-the-loop for high-blast-radius actions (delete, transfer, send). No complete solution yet — it's an active research area.