For months now, generative AI models such as generative adversarial networks (GANs) have been showing us that computers can create astonishingly realistic images of the world around us. However, this is not rendering in the sense of a classic 3D pipeline.
Instead, the model is trained on billions of images and encodes what it learns in a so-called latent space, arranged so that similar images end up close to each other along its many dimensions. This extremely high-dimensional representation is hard for humans to imagine, although our brains may well organize knowledge with similar schemes. For example, all smiling faces lie close together along one such axis.
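To make the idea of such an axis more concrete, here is a minimal sketch, with purely illustrative names and values: a generated image corresponds to a latent vector, and shifting that vector along a learned "attribute direction" changes one property of the output.

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 512                        # a common latent size in GANs (illustrative)
z = rng.standard_normal(latent_dim)     # latent code of one generated image

# A hypothetical learned direction, e.g. the axis along which faces smile more.
smile_direction = rng.standard_normal(latent_dim)
smile_direction /= np.linalg.norm(smile_direction)

# Moving the latent code along that axis would make the decoded image smile more,
# while leaving other properties largely untouched.
strength = 1.5
z_smiling = z + strength * smile_direction
# image = generator(z_smiling)          # hypothetical decoder call, not shown here
```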
To generate an AI image, we "only" need to specify its coordinates in this space, and these coordinates correspond, roughly speaking, to the terms in the prompt. Even before now, prompting could be used to move along these axes in latent space and change only small details in the output. "Negative prompting" exploits the same idea.
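As a hedged illustration of that "coordinates from words" idea (the encoder and the guidance value below are stand-ins, not any real model's API): a prompt is mapped to a point in embedding space, and a negative prompt defines a direction to steer away from.

```python
import numpy as np

def encode(text: str, dim: int = 512) -> np.ndarray:
    """Stand-in for a real text encoder: a deterministic pseudo-embedding."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

prompt_vec = encode("portrait of a smiling person")
negative_vec = encode("blurry, distorted")

# Steer toward the prompt and away from the negative prompt
# (guidance strength chosen arbitrarily for illustration).
guidance = 7.5
target = negative_vec + guidance * (prompt_vec - negative_vec)
# image = generator(target)   # hypothetical decoder call
```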
What has not worked until now, however, is moving image regions directly with the mouse, for example simply pulling up the corner of a mouth by grabbing it. In contrast to simple morphing, "Drag your GAN" adjusts the entire object to match: the lips may open slightly, wrinkles may appear, and the eyes may close a little.
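The rough intuition behind such point-based dragging can be sketched as follows. This is not the authors' code, and the generator.features call is a hypothetical hook: the latent code is repeatedly nudged so that the image content at a "handle" point moves step by step toward a "target" point (the published method additionally re-tracks the handle point after each step, which is omitted here).

```python
import torch

def drag(generator, w, handle, target, steps=50, lr=2e-3):
    """Sketch: optimize latent code w so the feature at `handle` drifts toward `target`."""
    w = w.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        feats = generator.features(w)                 # hypothetical [1, C, H, W] feature map
        direction = (target - handle) / (target - handle).norm()
        shifted = (handle + direction).round().long() # one small step toward the target
        src = feats[..., shifted[1], shifted[0]]      # feature at the shifted position
        dst = feats[..., handle[1].long(), handle[0].long()].detach()  # frozen handle feature
        loss = torch.nn.functional.l1_loss(src, dst)  # pull the content one step along the drag
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()
```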
However, this does not work on a conventional photo. Because the method relies on the latent space, the manipulation can only be applied to an image the AI has generated from it. Afterwards, though, the editing possibilities are easier to use than ever before. A large number of animated examples are available on the project website.
"Drag your GAN" thus represents the next milestone in the rapid development of generative AI models. The corresponding code is to be made available as early as June.