AI Agents Are Being Hijacked in the Wild: 22 Attack Techniques Now Documented

Security researchers at Palo Alto Networks’ Unit 42 have documented what many feared but few could prove: indirect prompt injection attacks are actively targeting AI agents on live websites.

The attacks hide malicious instructions in webpage content that AI agents - browser extensions, search engines, automated assistants - unknowingly execute. The first confirmed case of AI ad review evasion using this technique was spotted in December 2025.

This is no longer theoretical. It’s happening at scale.

22 Attack Techniques in the Wild

Unit 42’s research cataloged 22 distinct methods attackers use to embed malicious prompts in webpages. The techniques fall into two categories: delivery methods that hide the payload, and jailbreak methods that bypass AI safety guardrails.

Prompt delivery methods include:

Visual concealment using zero-sized elements, off-screen positioning, CSS suppression, and opacity manipulation
Obfuscation through XML/SVG encapsulation and HTML attribute cloaking
Runtime assembly via Base64 encoding and timed JavaScript execution
URL string manipulation
Visible plaintext embedding (surprisingly common at 37.8% of cases)

Jailbreak methods include:

Social engineering (85.2% of observed attacks)
Instruction obfuscation using invisible characters, homoglyph substitution, and payload splitting
Encoding schemes: HTML entities, Base64, URL encoding, and nested encoding
Semantic tricks through multilingual commands, JSON injection, and authority framing

The attackers are sophisticated. They understand how AI systems process content and have developed repeatable, scalable methods to exploit that processing.

Real-World Attacks Documented

The research identified active attacks across multiple categories:

AI ad review bypass - On reviewerpress.com, researchers found the first documented case of indirect prompt injection designed to fool AI systems that check advertisements for policy violations. Malicious ads now actively try to manipulate the AI reviewers checking them.

SEO poisoning - Sites like 1winofficialsite.in use prompt injection for phishing impersonation via search engine manipulation. When an AI agent summarizes or indexes the page, it follows the embedded instructions.

Unauthorized transactions - Multiple sites attempted to force AI agents into making purchases, donations, or subscription sign-ups on behalf of users.

Data destruction - Commands attempting database deletion or system crashes through fork bombs were detected.

Sensitive information leakage - Prompts designed to extract contact lists, credentials, and other sensitive data from AI agents with access to user information.

Recruitment manipulation - Resume injection attacks designed to influence AI-powered hiring decisions.

Review manipulation - Prompts forcing AI agents to generate only positive business evaluations.

The Numbers

Unit 42’s telemetry reveals the attack landscape:

28.6% of attacks aimed for irrelevant output (confusion/misdirection)
14.2% targeted data destruction
9.5% attempted ad review bypass
37.8% used visible plaintext delivery (hidden in plain sight)
19.8% used HTML attribute cloaking
24.2% of malicious pages contained multiple injections
73.2% of attacking domains were .com addresses

The concentration on .com domains suggests attackers are targeting AI agents used in Western markets, where AI assistant adoption is highest.

Why This Works

The fundamental problem: AI agents process untrusted web content using the same reasoning capabilities they use for trusted user instructions.

When an AI agent summarizes a webpage, searches for information, or analyzes content, it reads and processes everything on that page. If attackers embed instructions that look like legitimate AI commands, the agent may follow them.

“Unlike direct prompt injection, attackers exploit benign features like page summarization or content analysis to trigger malicious behavior at scale,” Unit 42’s research notes.

The attack surface expands as AI agents gain more capabilities. An agent that can only summarize text is less dangerous than one that can make purchases, send emails, or access files. Yet the industry is rapidly deploying agents with exactly these expanded capabilities.

What This Means

The immediate risks are clear:

Browser AI features are vulnerable. Browsers summarizing webpages have been tricked into leaking credentials. Every AI-powered browser feature that processes untrusted content is a potential target.

Email AI assistants are targets. Copilots have taken actions based on poisoned emails or metadata. That AI assistant helping you manage your inbox may be following instructions embedded by attackers.

Automated tools are compromised. Agentic tools have executed attacker-controlled commands after reading compromised documentation. If your AI agent reads external content, it’s at risk.

The scale problem is particularly concerning. Unlike targeted attacks that require identifying specific victims, indirect prompt injection works against anyone whose AI agent visits the poisoned page. One malicious website can affect thousands of AI agents automatically.

Defending Against This

Unit 42 recommends several defensive measures:

Deploy web-scale detection capabilities that distinguish benign from malicious prompts
Implement instruction hierarchy in LLM systems that separates trusted instructions from untrusted web content
Use “spotlighting” techniques that clearly mark boundaries between user commands and external data
Apply adversarial training to help models resist jailbreak attempts
Monitor for known attacker intent patterns and payload engineering techniques

For users, the practical advice is simpler: be extremely cautious about what capabilities you grant AI agents, and understand that any agent processing untrusted web content may be manipulated.

The Bottom Line

The gap between AI agent capabilities and AI agent security continues to widen. As companies race to deploy agents that can browse the web, manage email, and automate tasks, attackers are already exploiting the fundamental vulnerabilities in how these systems process information.

Twenty-two attack techniques. Active exploitation in the wild. First confirmed cases in December 2025.

The theoretical became practical. The question now is whether defenses can catch up before the damage spreads further.