New Nvidia AI generates images 30x faster than Stable Diffusion

[16:10 Wed,8.March 2023 by Thomas Richter]

A team of researchers from Nvidia and the University of Tübingen has presented a new (old) method for generating images via AI, which has two major advantages over the previously used method. The new method, christened StyleGAN-T, is based quite classically on GANs (Generative Adversarial Network), which has long been used by Nvidia in the form of StyleGAN for several interesting applications, such as already in 2018 for the generation of arbitrary human faces or the artificial aging of faces.

This method, thanks to the new research, can now also generate images by prompt, much like current image AIs, such as DALL-E2, Midjourney, or the open source and therefore most widely used Stable Diffusion. These all rely on the latent diffusion model to generate images from text. In contrast, the new StyleGAN-T method, which has been christened StyleGAN-T, has two particular positive features: it is significantly faster and can morph better between the results of different prompts.

How fast StyleGAN-T is (and how well it can switch from almost smoothly one image to another) is well demonstrated by the following video, in which each 512 x 512 pixel frame was generated in 0.1 seconds - i.e. almost in real time - on a

Nvidia A100 Tensor Core GPU:

The difference in the speed of image generation also becomes clear in the following overview, a comparison with other image AIs such as

Stable Diffusion and Google&s

Imagen. Here, the new method is 37x faster than Stable Diffusion and over 90x faster than Imagen.

On weaker GPUs than the highly specialized Nvidisa NVIDIA A100, image generation runs slower than 0.1 seconds per image, of course, but the relative speed compared to images generated by diffusion should remain about the same.

The second advantage of generating images via GANs is their typical ability to explore latent space, i.e., interpolate between different results and thus visually morph between different results at the same or even different text prompts to make them blend seamlessly.

This general capability of (image) GANs is demonstrated quite nicely in the following clip from 2 Minute Paper:

Bild zur Newsmeldung:

more infos at bei sites.google.com

deutsche Version dieser Seite: Neue Nvidia-KI generiert Bilder 30x schneller als Stable Diffusion