OpenAI Announces GPT-4, an AI That Can Understand Photos


OpenAI has today announced GPT-4, the next-generation AI language model that can read photos and explain what’s in them, according to a research blog post.

ChatGPT has taken the world by storm, but until now the deep learning language model accepted only text inputs. GPT-4 will accept images as prompts too.

“It generates text outputs given inputs consisting of interspersed text and images,” OpenAI writes today. “Over a range of domains — including documents with text and photographs, diagrams, or screenshots — GPT-4 exhibits similar capabilities as it does on text-only inputs.”

In practice, this means the AI chatbot will be able to analyze what is in an image. For example, it can tell the user what is unusual about the photo below of a man ironing his clothes while attached to a taxi.

Chat GPT4

Last week, Microsoft Germany Chief Technical Officer Andreas Braun said that GPT-4 will “offer completely different possibilities — for example, videos.”

However, today’s announcement makes no mention of video in GPT-4, and the only multimodal element is image input, far less than what was expected.

Microsoft had already presented Kosmos-1, a multimodal language model that operates across different formats.

In the Kosmos-1 presentation, the AI reads a text prompt along with a photo. For example, a picture of a clock showing 10:10 is input to the AI with the question “The time now?” and the AI replies, “10:10 on a large clock.”

Kosmos-1 example

It can also tell the viewer what type of hairstyle a woman has, or recognize a movie poster and tell the user when that movie will be released.

‘iPhone Moment’

During the “AI in Focus — Digital Kickoff” event in Germany, Braun was joined by the CEO of Microsoft Germany, Marianne Janik, who describes ChatGPT as “an iPhone moment.”

She says that it is not about replacing jobs but about doing repetitive tasks in a different way than before, Heise reports.

“Disruption does not necessarily mean job losses,” she says. “It will take many experts to make the use of AI value-adding.”

ChatGPT has become wildly popular, reaching 100 million users faster than any other consumer app in history.

OpenAI, which also operates DALL-E, was criticized by its co-founder Elon Musk, who left the company in 2018.

“OpenAI was created as an open source (which is why I named it “Open” AI), non-profit company to serve as a counterweight to Google, but now it has become a closed source, maximum-profit company effectively controlled by Microsoft,” he wrote on February 17. “Not what I intended at all.”

Update 14/3: This article has been updated after OpenAI’s GPT-4 announcement today in which it confirmed there is no video whatsoever in the model and images can only be inputted, not generated as previously thought.

Image credits: Header photo licensed via Depositphotos.