Alibaba Unveils An Open Source, Text-Based, AI Photo Editor

A central bear wearing a "Qwen" shirt is surrounded by smaller bears in various outfits and activities, such as painting, playing guitar, reading, baking, singing, and graduating, each expressing different emotions.

China-based e-commerce giant Alibaba’s AI research team has released an open-source, text-based AI photo editor that its developers claim delivers “state-of-the-art” image editing performance and supports both English and Chinese prompts.

Developed by Alibaba’s Qwen AI research team, Qwen-Image-Edit is designed to add, remove, and modify areas of an image while leaving the rest of it unchanged. It is also designed to perform higher-level edits, such as rotating an object, transferring a style from one image to another, and more. The tool works with text as well, letting users add, remove, or modify text in images while retaining the original font, size, and style.

The promises are lofty. Not only does Qwen-Image-Edit promise to handle these complex editing requests, but it supposedly does so from text prompts ranging from simple to complex and with various types of image inputs, from drawings to photos, all while maintaining the original intent of those images.

A slide titled "Image Editing in Novel View Synthesis" shows pairs of images: a man, a cow, a red car, and a person in a hat, each with their input image and a corresponding front view generated.

A collage shows input images of a baby, a dog, a crow, and a lion, each paired with an edited image displaying a different angle: left, right, or back views. Title: “Image Editing in Novel View Synthesis.”

The research team even asserts that the tool has the capability to remove fine hair strands or other small objects from an image without meaningfully changing it otherwise, which is exactly what tools like the Clone Stamp or Healing Brush do in Adobe Photoshop.

Two images side by side show a plate of pasta on a menu. The left image has a strand of hair on the plate, while the right image shows the plate with the hair removed.

The tool is available to try in Alibaba’s Qwen Chat, the company’s competitor to ChatGPT. When Image Edit is selected, users can upload their own images, but the tool also provides a list of example edits to run. The one below is one such option, and PetaPixel did confirm that the tool processes it identically to the published example. That said, it could simply be generating the same response for all users since it is one of the research examples.

Four images show a woman edited with virtual try-on: original in a black polka-dot blouse, in a light purple dress, in a light blue shirt, and with a beige beret plus a dark polka-dot blouse.

Anyone can try a small number of prompts for free, but additional attempts require a paid membership. As noted by VentureBeat, the model is open source under the Apache 2.0 license, so companies can set up the model on their own servers and run it internally. Otherwise, it is available through Alibaba Cloud Model Studio for $0.045 per image or through the aforementioned Qwen Chat.
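For anyone weighing the hosted option against running the model internally, the quoted per-image price scales linearly with volume. Below is a minimal sketch of that arithmetic, assuming the $0.045 rate applies flatly with no free tier or volume discount. The function name is hypothetical and is not part of any Alibaba API.

```python
from decimal import Decimal

# Per-image price quoted in the article for Alibaba Cloud Model Studio.
# Assumption: a flat rate with no free tier or volume discounts.
PRICE_PER_IMAGE_USD = Decimal("0.045")


def hosted_cost_usd(image_count: int) -> Decimal:
    """Estimate the hosted cost in USD for a given number of edited images."""
    if image_count < 0:
        raise ValueError("image_count must be non-negative")
    return PRICE_PER_IMAGE_USD * image_count


if __name__ == "__main__":
    # Print rough estimates for a few batch sizes.
    for count in (10, 1_000, 100_000):
        print(f"{count:>7} images: ${hosted_cost_usd(count):.2f}")
```

Under that assumption, a batch of 1,000 edits would come to $45, which is the kind of figure a team would compare against the cost of hosting the Apache 2.0 model themselves.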

PetaPixel’s Take

While Qwen-Image-Edit evidently works for many users, it is inconsistent. In PetaPixel’s testing, none of the original prompts provided to the platform resulted in worthwhile edits. In both cases, the program failed to make small edits to the image, instead redrawing it completely as a new, AI-generated image.

A bear stands in a forest clearing, a man kneels by a rocky stream holding a fish, and another man poses by the water’s edge with a fish, all surrounded by lush greenery.

A brown bear stands on mossy rocks by a flowing river, surrounded by lush green trees in a sunlit forest.

A smiling person in gray outdoor clothing kneels by a shallow, forested river, holding a large fish with both hands just above the water. Trees and rocks surround the calm stream.

These results are not usable.

Additionally, PetaPixel ran one of the built-in requests that asks Qwen-Image-Edit to take an old, black-and-white, degraded photo and update it. Below is the original image:

A black-and-white photo of an old, two-story industrial building with metal pipes, a fire escape, and a wrought-iron fence in front. Snow patches and bare trees are visible along a railway track.

And below is the Qwen-Image-Edit processed image:

A two-story, beige building with peeling paint stands behind black iron fencing near railroad tracks, under a clear sky. Metal pipes and structures are visible around the building, and leafless trees grow nearby.

At a glance, it appears very successful. On close inspection, however, the image is full of AI artifacts that don’t hold up to scrutiny. Edges and colors look muddy and imprecise, while whole sections of the recolored image have that “blotchy” AI-generated look that isn’t close to what a real image would look like.

So while it is possible to get some passable edits with this tool, it is also possible to get wholly unusable ones. For now, tools like Photoshop are still the best way to go, despite how VentureBeat describes it.
