Brazilian Children’s Photos and Personal Details Found in AI Training Data Set

Jun 10, 2024

Matt Growcoot

A disturbing report has revealed that personal photos of Brazilian children have been included in a major image library used to train AI image generators.

Human Rights Watch says that the photos, which are being scraped off the web and included in the LAION-5B data set, have personal information attached, including the children’s names, ages, and locations. The group also points out that others can use AI to create malicious deepfakes that put children at risk.

“Their privacy is violated in the first instance when their photo is scraped and swept into these datasets. And then these AI tools are trained on this data and therefore can create realistic imagery of children,” says Hye Jung Han, children’s rights and technology researcher at Human Rights Watch and the researcher who found these images.

“The technology is developed in such a way that any child who has any photo or video of themselves online is now at risk because any malicious actor could take that photo, and then use these tools to manipulate them however they want.”

Human Rights Watch says that some of the photos carry personal details that make the children’s identities easily traceable. In one example, a photo of a two-year-old girl with her newborn sister has a caption giving their names and the exact hospital in Santa Catarina where the baby was born nine years ago. Some of the images are as recent as 2023, while others go back all the way to the mid-1990s.

Human Rights Watch found over 170 photos of children across 10 different states of Brazil, even though it reviewed less than 0.0001 percent of the 5.85 billion images and captions in LAION-5B, which works out to fewer than about 5,850 items. Some of the photos were posted by the children or family members to various social media websites.

The group says that once the data is in the AI systems, the information can be leaked or identical copies can be reproduced.

AI tools that were trained on LAION have been used to generate explicit images of children, drawing on innocent photos as well as on explicit and illegal images of children.

The company behind LAION-5B, German outfit LAION, has vowed to remove the children’s photos after the Human Rights Watch report. However, the company added that it is also the responsibility of the children’s guardians to remove personal photos from the internet, which it says is the most effective protection against misuse.

Human Rights Watch has urged lawmakers to enact comprehensive safeguards for children’s data privacy.

“Generative AI is still a nascent technology, and the associated harm that children are already experiencing is not inevitable,” adds Han. “Protecting children’s data privacy now will help to shape the development of this technology into one that promotes, rather than violates, children’s rights.”

Image credits: Header photo licensed via Depositphotos.