This AI-Powered, Point-Based Photo Manipulation System is Wild
Researchers have developed a point-based image manipulation system that uses generative artificial intelligence (AI) technology to allow users to precisely control the pose, shape, expression, and layout of objects.
The research outlines how users can steer generative adversarial networks (GANs) through intuitive graphical controls. The technology is called DragGAN.
Similar to U Point technology in DxO software, which lets users drop a point on part of an image and adjust the look of the relevant pixels, DragGAN lets users drop a point on an image and change not just the brightness and color of individual pixels but their organization and very existence. DragGAN uses AI to generate brand-new pixels in response to user input.
“In this work, we study a powerful yet much less explored way of controlling GANs, that is, to ‘drag’ any points of the image to precisely reach target points in a user-interactive manner, as shown in Fig.1. To achieve this, we propose DragGAN, which consists of two main components including: 1) feature-based motion supervision that drives the handle point to move towards the target position, and 2) a new point tracking approach that leverages the discriminative GAN features to keep localizing the position of the handle points,” explain the researchers.
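The two components the researchers describe can be pictured with a toy sketch. The Python below is purely illustrative and is not the paper’s code: real DragGAN optimizes the GAN’s latent code and works on intermediate generator features, while this simplification just nudges a 2-D handle point toward its target (standing in for motion supervision) and relocates a point by nearest-neighbor feature matching (standing in for point tracking). All names, the synthetic feature map, and the radius-2 search window are assumptions made for clarity.

```python
import numpy as np

def drag_step(handle, target, step=1.0):
    """Motion supervision (toy): move the handle one small step toward the target."""
    handle = np.asarray(handle, float)
    target = np.asarray(target, float)
    direction = target - handle
    dist = np.linalg.norm(direction)
    if dist <= step:          # close enough: snap onto the target
        return target
    return handle + step * direction / dist

def track_point(features, patch, prev, r=2):
    """Point tracking (toy): within a small window around the previous position,
    find the location whose feature vector best matches the handle's original patch."""
    h, w, _ = features.shape
    best, best_d = prev, np.inf
    for y in range(max(0, prev[0] - r), min(h, prev[0] + r + 1)):
        for x in range(max(0, prev[1] - r), min(w, prev[1] + r + 1)):
            d = np.linalg.norm(features[y, x] - patch)
            if d < best_d:
                best, best_d = (y, x), d
    return best

# Drive a handle point at (2, 2) toward a target at (10, 14).
handle, target = np.array([2.0, 2.0]), np.array([10.0, 14.0])
for _ in range(50):
    handle = drag_step(handle, target)
    if np.allclose(handle, target):
        break
print(handle)  # → [10. 14.]

# Tracking demo: each location's "feature" is simply its own (y, x) coordinate,
# so matching the patch recovers the true position (5, 7).
feats = np.indices((16, 16)).transpose(1, 2, 0).astype(float)
patch = feats[5, 7]
print(track_point(feats, patch, (6, 6)))  # → (5, 7)
```

In the real system both steps operate on the GAN’s learned features rather than raw coordinates, which is what lets the edit stay on the manifold of realistic images.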
When editing images of diverse subjects, including animals, vehicles, landscapes, and even people, users can “deform an image with precise control over where pixels go, thus manipulating the pose, shape, expression, and layout,” the researchers explain.
The Verge describes DragGAN as “Like Photoshop’s Warp tool, but far more powerful. You’re not just smushing pixels around, but using AI to re-generate the underlying object.”
GANs are increasingly competent when it comes to generating realistic outputs. However, DragGAN introduces a distinct level of control over pixel location that typical GANs don’t offer.
Manipulating a two-dimensional image as though it sits in an AI-generated three-dimensional space is extremely powerful. Examples show a user changing the pose of a dog, adjusting the height of a mountain behind a lake along with its reflection, and making extensive changes to the appearance and behavior of a lion.
The team also highlights that DragGAN’s appeal goes beyond its power and capabilities. The user interface is noteworthy because it is straightforward: almost any user could take advantage of the tool without understanding the technology underneath. Many AI tools are opaque to new and inexperienced users, which significantly limits their commercial and practical appeal.
“As these manipulations are performed on the learned generative image manifold of a GAN, they tend to produce realistic outputs even for challenging scenarios such as hallucinating occluded content and deforming shapes that consistently follow the object’s rigidity. Both qualitative and quantitative comparisons demonstrate the advantage of DragGAN over prior approaches in the tasks of image manipulation and point tracking,” say the researchers.
The full paper includes a detailed description of DragGAN, including parts of its code and its mathematical underpinnings, along with results demonstrating the method’s efficacy. For example, DragGAN includes a masking function that lets users confine an edit to a selected region of pixels.
An example outlined in the paper shows what happens when a user drags a point to change the orientation of a dog’s face. Without a mask on the dog’s head, the dog’s entire body rotates. With a mask applied in DragGAN, the edit affects only the face, giving the user more granular control.
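The masking behavior can be illustrated with a tiny sketch. This is my own toy code, not DragGAN’s: a binary mask confines a hypothetical edit delta to the selected region, so pixels outside it are left untouched, much as masking the dog’s head keeps the drag from rotating the whole body.

```python
import numpy as np

image = np.zeros((8, 8))         # toy stand-in for the original image
delta = np.ones((8, 8))          # stand-in for the pixels a drag would change

mask = np.zeros((8, 8), dtype=bool)
mask[0:4, 0:4] = True            # user masks a region (here, rows/cols 0-3)

# Apply the edit only where the mask is set; everything else stays as-is.
edited = np.where(mask, image + delta, image)
print(edited.sum())  # → 16.0: only the 4x4 masked region changed
```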
Another example of DragGAN’s power involves a lion. In the example below, the original image has a lion with a closed mouth. However, users can drop points on the top and bottom of the lion’s muzzle and then move them to open the lion’s mouth. DragGAN generates new pixels for the inside of the lion’s mouth, including realistic teeth.
“We have presented DragGAN, an interactive approach for intuitive point-based image editing. Our method leverages a pre-trained GAN to synthesize images that not only precisely follow user input, but also stay on the manifold of realistic images. In contrast to many previous approaches, we present a general framework by not relying on domain-specific modeling or auxiliary networks,” the team concludes.
“This is achieved using two novel ingredients: An optimization of latent codes that incrementally moves multiple handle points towards their target locations, and a point tracking procedure to faithfully trace the trajectory of the handle points. Both components utilize the discriminative quality of intermediate feature maps of the GAN to yield pixel-precise image deformations and interactive performance. We have demonstrated that our approach outperforms the state of the art in GAN-based manipulation and opens new directions for powerful image editing using generative priors. As for future work, we plan to extend point-based editing to 3D generative models.”
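The latent-code optimization the team describes can be caricatured in a few lines. In this hedged sketch (my own stand-in, not the authors’ method), the “generator” is just a fixed linear map, and plain gradient descent moves a latent vector until the generated feature lands on its target; DragGAN performs an analogous optimization on a real GAN’s latent code, guided by its intermediate feature maps.

```python
import numpy as np

# Toy stand-in "generator": a fixed linear map from a 4-D latent code to a
# 2-D feature. The matrix, learning rate, and target are all illustrative
# assumptions, not values from the paper.
W = np.array([[1.0, 0.0, 0.5, 0.0],
              [0.0, 1.0, 0.0, 0.5]])
g = lambda z: W @ z

z = np.zeros(4)                      # latent code being optimized
target = np.array([1.0, -2.0])       # where we want the feature to land
lr = 0.3
for _ in range(100):
    grad = 2 * W.T @ (g(z) - target) # gradient of ||g(z) - target||^2
    z -= lr * grad

print(np.round(g(z), 3))  # ≈ [1, -2]: the generated feature reaches its target
```

Because the image is regenerated from the optimized latent code rather than warped directly, the result stays on the manifold of images the GAN has learned to produce.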
The research team comprises Xingang Pan, Thomas Leimkühler, and Christian Theobalt of the Max Planck Institute for Informatics; Ayush Tewari from MIT CSAIL; and Abhimitra Meka from Google AR/VR.
Image and demo video credits: Pan, Leimkühler, Theobalt, Tewari, and Meka / Max Planck Institute for Informatics, MIT CSAIL, and Google AR/VR.