Shocked Artist Finds Private Medical Photos in AI Training Data Set


An artist has found her private medical photos in a data set that is used to train artificial intelligence (AI) image generators.

A San Francisco-based digital artist, who goes by the name Lapine, says that she found photos of herself in the LAION dataset, even though they were taken by a doctor and she explicitly denied permission for the images to be shared anywhere.

Lapine discovered her medical photos on a website called Have I Been Trained, a tool that was set up for artists and photographers to check whether their work was used to train AI image generators such as DALL-E.

Users can upload a photo to Have I Been Trained and run a reverse image search to see whether LAION-5B includes it, or similar images, as a reference. This is what Lapine did, and after she uploaded a recent photo of herself, she was shocked to see before-and-after medical photos of her face.

“My face is in the LAION dataset,” Lapine writes on Twitter.

“In 2013 a doctor photographed my face as part of clinical documentation. He died in 2018 and somehow that image ended up somewhere online and then ended up in the dataset- the image that I signed a consent form for my doctor — not for a dataset.”

In an interview with Ars Technica, Lapine says she has a rare genetic condition called Dyskeratosis Congenita, which led her to undergo a small procedure in 2013. The photos were taken by the surgeon, who died of cancer in 2018, and the files somehow ended up online.

Lapine posted a photo of the photographic authorization form, on which she clearly marked that the photo is only for her file and is not to be shown to anyone.

“It’s the digital equivalent of receiving stolen property. Someone stole the image from my deceased doctor’s files and it ended up somewhere online, and then it was scraped into this dataset,” Lapine tells Ars Technica.

Lapine’s photos are now part of a database that informs AI image generators. Her name isn’t attached to the photos, but the fact that they are in the dataset at all is deeply troubling.

“It’s bad enough to have a photo leaked, but now it’s part of a product. And this goes for anyone’s photos, medical record or not. And the future abuse potential is really high,” Lapine adds.

Have I Been Trained

Have I Been Trained was created by a group of artists who call themselves Spawning. They want to give creators the power to approve or deny the inclusion of their work in training data sets such as LAION-5B.

“We believe that each artist ought to have the tools to make their own decisions about how their data is used,” they say.