Seemingly Harmless Photos Could Be Used to Hack AI Agents

A new study has revealed a type of cyber threat linked to AI agents, in which ordinary-looking photos can be altered to secretly issue malicious commands.
AI agents are an advanced version of AI chatbots and are increasingly being seen as the next frontier in technology. OpenAI, for example, recently released its own ChatGPT agent. Unlike chatbots, these AI agents not only answer questions but also perform tasks on a user’s computer, such as opening tabs, sending emails, and scheduling meetings.
However, researchers at the University of Oxford found that photos — such as wallpapers, ad images, or even pictures posted on social media — can be secretly altered so that, while they look perfectly normal to humans, they contain hidden instructions that only the AI agent can “see.”
According to a report published by Scientific American, if an AI agent comes across one of these doctored images while working (for example, it notices the image on a user’s desktop background in a screenshot), it could misinterpret the pixels as a command. That might make it do things a user didn’t ask for, such as share their passwords or spread the malicious image further.
For instance, the study’s co-author Yarin Gal, an associate professor of machine learning at Oxford University, gives Scientific American the example of how an altered “picture of Taylor Swift on Twitter could be sufficient to trigger the agent on someone’s computer to act maliciously.” To a human’s eyes, the photo looks completely normal. But the AI reads it differently, because computers process images as numbers, and small, invisible pixel tweaks can change what the AI thinks it’s seeing.
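The idea that tiny, invisible pixel changes can flip what a model “sees” can be illustrated with a toy example. The sketch below is not from the study: it uses a made-up linear “model” as a stand-in for an agent’s vision system, and applies a fast-gradient-sign-style nudge — each pixel shifted by at most an imperceptible amount — to push the model’s score in an attacker’s chosen direction.

```python
# Illustrative sketch only: a toy linear "model" standing in for an
# AI agent's vision system. All names and values here are invented.
import random

random.seed(0)

# Toy "image": 64 pixel intensities in [0, 1].
image = [random.random() for _ in range(64)]

# Toy model: a linear score over pixels. Imagine a higher score means
# "benign picture" and a lower score means "treat this as a command".
weights = [random.uniform(-1, 1) for _ in range(64)]

def score(pixels):
    return sum(w * p for w, p in zip(weights, pixels))

# Fast-gradient-sign-style step: move each pixel by at most epsilon
# against the sign of its weight, which drives the score downward.
epsilon = 0.02  # far too small for a human viewer to notice
adversarial = [
    max(0.0, min(1.0, p - epsilon * (1 if w > 0 else -1)))
    for p, w in zip(image, weights)
]

# Every pixel changed by at most 0.02, yet the model's score moved
# in the attacker's chosen direction.
print(score(image), score(adversarial))
```

Real attacks work the same way in spirit but optimize thousands of pixels against a full neural network, so the doctored photo remains visually identical while the model’s output changes dramatically.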
Any sabotaged image, whether it is a photo of Taylor Swift, a kitten, or a sunset, “can actually trigger a computer to retweet that image and then do something malicious, like send all your passwords,” Gal says. “That means that the next person who sees your Twitter feed and happens to have an agent running will have their computer poisoned as well. Now their computer will also retweet that image and share their passwords.”
The risk is reportedly greatest for “open-source” AI systems, whose code is available for anyone to study. That makes it easier for hackers to figure out exactly how the AI interprets photos and how to sneak in hidden commands.
So far, the researchers say this threat has only been seen in controlled experiments, and there are no reports of it happening in the real world. Still, the study’s authors warn that the vulnerability is real and want to alert developers before AI agents become more common. The researchers say the goal is to create safeguards so these AI agent systems can’t be tricked by hidden instructions in everyday photos.
Image credits: Header photo licensed via Depositphotos.