Amazon’s recently announced Titan generative AI isn’t quite the same as consumer-facing image generators like DALL-E and Midjourney, but the Titan Image Generator may hold implications for professional photographers working with corporate clients.
Businesses that use it can take images of their own products and have the generative AI swap out settings, backgrounds, or complementary visual elements through prompts. It can also produce entirely original images.
For its part, Amazon says photographers have nothing to fear, and that the Titan Image Generator will supplement rather than replace the original content photographers and creative digital artists produce for their clients.
“The [Titan Image Generator] would really be meant to augment what photographers are always doing because, and based on my observations and what I’ve seen at the enterprise level, is most people want new and original content, especially when they’re looking at photographing new events or new people. They’re going to still want photographers,” says Sherry Marcus, Director of Applied Science in the Generative AI Services Organization at AWS, in an exclusive interview with PetaPixel.
That may also depend on the kind of photography brands or corporations are looking for. Marcus sees content creation for advertising as the biggest use case right now, with the industries that have the “largest appetite” for ads likely to be interested in how Titan’s multimodal AI can create images that supplement their own assets. A brand deciding where to place images could adapt them based on, for instance, what’s on the website, user preferences, or regional and seasonal considerations.
Teaching the AI the Ropes
Because the Titan Image Generator is a multimodal generative AI platform, Amazon is training it on large amounts of both image and text data so it can learn to recognize similar images and deliver more precise results. The “secret sauce,” a key part of how the model is trained, lies in predicting what the next words or images should be. Humans supervise the fine-tuning, which essentially consists of sets of questions and answers. For example, if the request is for a picture of a penguin on a surfboard in Hawaii, the variations the model returns should be very specific to that request.
“When you search for something in natural language, we encode the content both in text and in the image, which is very powerful in getting the right answer and generating all of those components,” she says. “In the prompt, you can get it more specific and say, ‘The sun shines from X angle,’ or ‘Give me a picture at dusk or 5 pm.’ It’ll look at the text and know that dusk and 5 pm may be the same things and then look at images associated with that.”
For advertisers, that kind of granularity could be a cost-effective way to adapt a campaign for different markets without setting up multiple photo shoots. That is partly because brands can use the multimodal image generator with their own assets, for example by blending an existing product photo with a new background, scene, character, or visual complement rather than organizing a separate shoot to get similar results.
Marcus also notes that safeguards against copyright infringement and inappropriate content are in place, including invisible watermarks that identify images as AI-generated. The image generator will also try to produce different images when two users enter the exact same prompt.
“In that sense, you still might need someone to take the photos but you just don’t need to necessarily put them on location,” she says.
“Brands want authenticity but they also really want to be able to do things quickly. I think generative AI is providing that capability to do the touch-ups much more quickly but I don’t see brands really wanting to get rid of that authenticity, that realism, that spirit of the moment that you get from doing live photography.”
An ‘Accelerant’ and Not a Replacement
In other words, it would be possible to set up photo shoots in a studio and then work on the visual elements within the image using tools like the Titan Image Generator. It’s not clear whether Amazon has received that kind of feedback from companies using its cloud and AI development services, but cost-cutting is often part of the corporate playbook when new technologies emerge.
Amazon believes the image generator will help enterprise users apply the capability more safely and securely to accelerate their business objectives without “taking away individuals’ jobs.” It’s more of an “accelerant” that makes people more productive and resourceful, Marcus adds.
The differences between product photography and event photography also show the limits of what the AI can do. It can’t generate images of a live event it knows nothing about, including the people attending, much less do so in real time the way an on-site photographer can. And while a brand can use the AI to complement its own images, Titan can’t pull from a brand’s image and product resources on its own to create something; copyrighted brands, logos, likenesses, and assets are off-limits in that sense. The brand first supplies the original image, and only then does the AI work with it.
“My experience is that brands will still want to be able to take live photos of their products and events in different ways, in a very realistic format,” says Marcus. “Many multimedia companies want to be able to have their own brand associated with content creation, so this is another tool in the toolbox to act as an augmentation.”
Titan Image Generator is currently available in preview in the United States and only in English, though Amazon has also said it is looking to expand availability to other markets and languages.