ChatGPT Can Now See Your Photos and Respond to Them

A text overlay asks what the clouds in the background are caused by.

ChatGPT is listening. ChatGPT is watching. In very cool and normal news, the generative artificial intelligence (AI) large language model will soon be able to see users' photos, hear what they say, and respond.

The voice and image recognition update will roll out to ChatGPT Plus and Enterprise users over the next two weeks. Voice will come to the iOS and Android apps, while image input will be available on all platforms, according to OpenAI.

“Voice and image give you more ways to use ChatGPT in your life. Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it. When you’re home, snap pictures of your fridge and pantry to figure out what’s for dinner (and ask follow up questions for a step by step recipe). After dinner, help your child with a math problem by taking a photo, circling the problem set, and having it share hints with both of you,” a release from OpenAI reads.

OpenAI says users can share one or more images with the AI language model and can even direct ChatGPT to a specific element using the drawing tool in the mobile app. The feature works with photographs, screenshots, and documents containing both text and images, and it's powered by GPT-3.5 and GPT-4, OpenAI's latest model.

The ChatGPT maker explains that the image input model was tested with "a diverse set of alpha testers" and red teamers, people who pose as bad actors or hackers to identify possible security issues or opportunities for misuse.

“This approach has been informed directly by our work with Be My Eyes, a free mobile app for blind and low-vision people, to understand uses and limitations. Users have told us they find it valuable to have general conversations about images that happen to contain people in the background, like if someone appears on TV while you’re trying to figure out your remote control settings,” the release reads. ChatGPT is also designed to avoid analyzing people and making “direct statements about people since ChatGPT is not always accurate and these systems should respect individuals’ privacy.”

Anyone hoping to try the new voice feature must go to Settings, then New Features, in the mobile app. They'll have to opt in before choosing one of five voices. OpenAI says the addition is “powered by a new text-to-speech model, capable of generating human-like audio from just text and a few seconds of sample speech.” The company says it collaborated with professional voice actors to create each of the voices, but it's unclear how much of the actors' own voices appears in the final product and how much is AI-generated.

OpenAI also says it discourages using the technology in high-risk cases without proper verification.

“The new voice technology — capable of crafting realistic synthetic voices from just a few seconds of real speech — opens doors to many creative and accessibility-focused applications. However, these capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud,” the release adds. This is why, the company says, it is only using this technology for voice chat and other specific use cases. OpenAI says Spotify is using the new feature to allow podcasters to translate their work into multiple languages.

In the other direction, ChatGPT will also use Whisper, the company's open-source speech recognition system, to transcribe spoken words into text, though many products already have speech-to-text features.

Image credits: OpenAI