Midjourney Founder Admits to Using a ‘Hundred Million’ Images Without Consent


Midjourney founder David Holz has admitted that his company did not receive consent for the hundreds of millions of images used to train its AI image generator, outraging photographers and artists.

Twitter users have been sharing an interview that Holz did with Forbes back in September in which he readily confesses to using images that he didn’t have permission for.

When asked: “Did you seek consent from living artists or work still under copyright?”

Holz replies: “No. There isn’t really a way to get a hundred million images and know where they’re coming from.

“It would be cool if images had metadata embedded in them about the copyright owner or something. But that’s not a thing; there’s not a registry.

“There’s no way to find a picture on the internet, and then automatically trace it to an owner and then have any way of doing anything to authenticate it.”

Holz’s words in the resurfaced interview have outraged Twitter users, a reaction fueled in part by an ongoing artist protest against AI images.

Also in the interview, Holz says that Midjourney’s dataset was built from “a big scrape of the internet.”

“We use the open data sets that are published and train across those. And I’d say that’s something that 100% of people do,” he adds.

The article has been shared repeatedly this week by creators, who have not held back on what they think about Holz and Midjourney.

“‘We just stole all the copyrighted artwork, mushed it through an AI, reproduced it infinitely, and make money off of it,’” parodies Kyle Chayka.

“What baffles me is that David Holz blatantly admits to theft and copyright infringement in this article! His attitude is, ‘yeah, we stole from you to build a platform that we make a profit from, what are you going to do about it,’” says artist Dave Lung.

Some cast doubt on Holz’s assertion that images don’t have metadata embedded in them.

“Every piece of art I process in Photoshop has embedded metadata that marks it as copyrighted and includes contact info,” says one artist. “Embedded copyright in image metadata has existed for years.”

Copyright and Consent

While the majority of the uproar in recent days has come from digital artists, the issue affects photographers in exactly the same way.

Holz says in the Forbes article that currently creators can’t opt out of being included in a training model, and they also can’t opt out of being named in the prompts.

A website called Have I Been Trained was set up so that photographers could find out if their photos have been used to train AI image generators.

Users of AI image generators can type in a photographer’s name to get an image in that photographer’s style. For example, the image below was created with the prompt: “Ansel Adams photography, apocalyptic, festival, American flag.”


This means that, without a doubt, the AI has been trained on Ansel Adams’s copyrighted photos, and it’s clear that the company did not receive permission from Adams’s estate.

As there is no legal framework surrounding AI image generators, this story will likely run deep into 2023.

Image credits: Header photo licensed via Depositphotos.