New Research Uncovers Vulnerabilities in AI Web Agents to Prompt Injection Attacks

Recent findings reveal AI web agents face critical vulnerabilities to prompt injection attacks, impacting multiple stakeholders and task integrity.

Jun 12, 2026

Recent research has uncovered significant vulnerabilities in AI web agents to prompt injection attacks. A study by researchers from Nanyang Technological University, ST Engineering, IBM Research, and the University of Illinois Urbana-Champaign finds that leading AI systems, including those built on GPT-5 and Gemini, lack effective defenses against these attacks.

The researchers ran 3,168 adversarial tests on platforms such as NanoBrowser and BrowserUse, using 264 distinct benchmark scenarios. Indirect prompt injection attacks, in which malicious instructions are concealed within ordinary web content such as reviews and metadata, achieved success rates ranging from 41.67% to 68.16%. Direct prompt injection attacks were even more effective, with success rates exceeding 79% across all tested configurations.

As the authors of the study noted, “These failures exhibit distinct patterns when analyzed through a stakeholder lens: some attacks succeed without disrupting the user’s delegated task while disproportionately harming third parties (stealthy parasitism), whereas others disrupt task completion without realizing the adversarial objective (misaligned disruption).” This complexity underpins a pervasive issue with how AI agents manage security.

Attack Objectives Reveal Failure Modes

The benchmark classifies web agent outcomes into categories that include Robust Behavior, Stealthy Parasitism, Misaligned Disruption, and Compounded Failure. An agent exhibiting Robust Behavior fulfills the user's request without compromising task integrity or stability. No tested configuration exhibited Robust Behavior, which signals a broader issue: not one attack method left agents unscathed.
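One way to read these outcome categories is as the four combinations of two yes/no observations per test run: whether the attacker's objective was achieved, and whether the user's task was disrupted. The Python sketch below is illustrative only and is not the study's scoring code. The names `Outcome` and `classify` are hypothetical, and the mapping of Robust Behavior and Compounded Failure onto the "neither" and "both" cases is an assumption inferred from how the authors describe stealthy parasitism and misaligned disruption.

```python
from enum import Enum


class Outcome(Enum):
    ROBUST_BEHAVIOR = "robust behavior"
    STEALTHY_PARASITISM = "stealthy parasitism"
    MISALIGNED_DISRUPTION = "misaligned disruption"
    COMPOUNDED_FAILURE = "compounded failure"


def classify(attack_succeeded: bool, task_disrupted: bool) -> Outcome:
    """Map two per-run observations onto an outcome label (illustrative)."""
    if attack_succeeded and task_disrupted:
        # Assumption: both failures at once is what the article calls
        # Compounded Failure.
        return Outcome.COMPOUNDED_FAILURE
    if attack_succeeded:
        # Attack lands while the user's delegated task still completes.
        return Outcome.STEALTHY_PARASITISM
    if task_disrupted:
        # Task is derailed, but the adversarial objective is not realized.
        return Outcome.MISALIGNED_DISRUPTION
    # Assumption: neither failure corresponds to Robust Behavior.
    return Outcome.ROBUST_BEHAVIOR


if __name__ == "__main__":
    for attacked in (False, True):
        for disrupted in (False, True):
            print(attacked, disrupted, classify(attacked, disrupted).value)
```

Under this reading, an evaluation that records only attack success cannot tell stealthy parasitism apart from compounded failure, and one that records only task completion cannot tell it apart from robust behavior.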

According to the researchers, the results emphasize that “prompt-injection vulnerability in deployable web agents cannot be characterized by any single metric,” suggesting a need for nuanced evaluations given how attack effectiveness and task disruption are intertwined.

Stealthy Attacks Risk Undetected Malfeasance

One significant failure mode is “stealthy parasitism,” in which an AI agent completes the user's task while simultaneously advancing an attacker’s agenda. For instance, a false recommendation buried in product reviews could bias an agent toward a specific product without raising immediate red flags for the user. The agent appears productive while competing sellers are harmed.

The study positions prompt injection as a systemic security concern that transcends individual user safety, implicating multiple stakeholders in the potential for harm.

Differentiated Risks for Stakeholders

Unlike traditional benchmarks that primarily gauge attack success, this research evaluates effects across three stakeholder groups: end users, third-party sellers, and platforms. The results indicate that these groups face different vulnerabilities. Seller-targeted attacks showed the highest success rates, while user-targeted attacks caused less task deviation. That makes detection harder, since normal workflows can continue even as adversarial goals are met.

The researchers argue that “the same agent can simultaneously appear stealthy on user-targeted attacks, susceptible on seller-targeted attacks, and unstable on platform-targeted attacks,” making broad measures of attack success inadequate to capture stakeholder-specific vulnerabilities.
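To see why a pooled number can hide stakeholder-specific behavior, consider a minimal Python sketch that reports attack success and task disruption separately for each targeted stakeholder. The records in `runs` are invented toy values for illustration, not data from the study, and `rates_by_stakeholder` is a hypothetical helper rather than part of any benchmark tooling.

```python
from collections import defaultdict

# Hypothetical toy runs: (targeted stakeholder, attack_succeeded, task_disrupted).
# These values are made up for illustration and do not come from the study.
runs = [
    ("user", True, False),
    ("user", False, False),
    ("seller", True, False),
    ("seller", True, True),
    ("platform", False, True),
    ("platform", True, True),
]


def rates_by_stakeholder(records):
    """Return per-stakeholder attack-success and task-disruption rates."""
    counts = defaultdict(lambda: [0, 0, 0])  # [total, attack successes, disruptions]
    for stakeholder, attacked, disrupted in records:
        entry = counts[stakeholder]
        entry[0] += 1
        entry[1] += int(attacked)
        entry[2] += int(disrupted)
    return {
        stakeholder: {
            "attack_success": successes / total,
            "task_disruption": disruptions / total,
        }
        for stakeholder, (total, successes, disruptions) in counts.items()
    }


if __name__ == "__main__":
    pooled = sum(attacked for _, attacked, _ in runs) / len(runs)
    print(f"pooled attack success: {pooled:.2f}")
    for stakeholder, rates in rates_by_stakeholder(runs).items():
        print(stakeholder, rates)
```

In this toy data the pooled attack success rate sits between the per-stakeholder rates and says nothing about task disruption, which is the kind of information loss the researchers warn about.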

Impact of Models and Architectures on Security

The benchmark also showed that different underlying models offer varying levels of resistance to prompt injection. For example, replacing GPT-5 with Gemini-2.5-Flash resulted in a 26.49-percentage-point increase in indirect prompt injection success on NanoBrowser, along with higher instability on BrowserUse.

This finding suggests that resilience against prompt injection depends not only on the model used but also on how it is integrated into an autonomous system. Accordingly, the researchers emphasize that evaluating security requires understanding how architecture and stakeholder interaction shape vulnerability.

Exploring New Attack Vectors Through Images

The research also indicates that prompt injection attacks may extend beyond text manipulation. In preliminary multimodal experiments, altering an image alone raised a product's selection rate from 10% to 76.67%, highlighting the potential for visual content to significantly influence AI decision-making.
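The article reports some changes in percentage points and others as before-and-after rates, so it helps to keep the two measures apart. The short Python sketch below applies both to the reported selection rates of 10% and 76.67%. The function names are hypothetical and the calculation is plain arithmetic on the figures quoted above, not a metric defined by the study.

```python
def percentage_point_change(before_pct: float, after_pct: float) -> float:
    """Absolute difference between two percentages, in percentage points."""
    return after_pct - before_pct


def relative_change(before_pct: float, after_pct: float) -> float:
    """Ratio of the new rate to the old rate (old rate must be nonzero)."""
    return after_pct / before_pct


if __name__ == "__main__":
    before, after = 10.0, 76.67  # selection rates reported in the article
    print(f"{percentage_point_change(before, after):.2f} percentage points")
    print(f"{relative_change(before, after):.2f}x the original rate")
```

On the reported figures, the image change corresponds to a rise of about 66.67 percentage points, or roughly 7.7 times the original selection rate.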

Although these initial results are limited, they suggest that organizations deploying AI agents will need to watch for attack vectors beyond text.

Source: John Martinez · www.csoonline.com