The images in the clip above were generated with Stable Diffusion, the open image-generating AI developed by the AI researchers Patrick Esser and Robin Rombach, which has just launched as a closed beta. The most exciting part is that Stable Diffusion will soon be released as software that runs on consumer PCs: a graphics card with at least 5 GB of VRAM is sufficient to generate 512x512-pixel images in a few seconds.
Image generated with Stable Diffusion
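Once the model is publicly released, running it locally will presumably look something like the following sketch using Hugging Face's diffusers library. Note that the model ID "CompVis/stable-diffusion-v1-4" and the exact pipeline API are assumptions based on the current diffusers release, not anything announced by Stability AI:

```python
# Hedged sketch: generating a 512x512 image locally with Stable Diffusion
# via Hugging Face's diffusers library. Model ID and API details are
# assumptions based on diffusers documentation, not official release notes.
import torch
from diffusers import StableDiffusionPipeline

# Load the pretrained weights; half precision roughly halves VRAM usage,
# which matters on cards with only ~5 GB of memory.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # move the model to the GPU

prompt = "Obi-Wan Kenobi eating only one cannoli"
image = pipe(prompt, height=512, width=512).images[0]
image.save("obi_wan_cannoli.png")
```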
This means that generating images from text, previously accessible only to a fairly small circle of people (as with DALL-E 2 and Midjourney), can be experienced by a much wider audience, who can experiment with it themselves without restrictions.
In this context, a statement by David Holz, CEO of the image AI Midjourney, in an interview about costs is also interesting. According to him, a training run over a pool of billions of images costs around $50,000 and usually has to be repeated 10 to 20 times to get a satisfactory result, i.e., roughly half a million to a million dollars in total before a usable model exists. Once this training is done and the model thus created, far less computing power is needed for the actual job, i.e., generating images.
Obi-Wan Kenobi eating only one cannoli
This asymmetry is typical of neural networks: a great deal of computing power goes into training, but the resulting model, i.e., the neural network with all the weights of its individual nodes, requires far less power to run (a toy sketch of this follows at the end of this section). No wonder: the public LAION-5B dataset used for Stable Diffusion, for example, is a 240 TB collection of 5 billion images with multilingual image descriptions (which can be searched here), all of which has to be read in and learned during a training run.

In any case, we are excited about our first self-generated images and look forward to experimenting with them. Until then, here is a nice comparison of some current image-generating AIs and their particular styles, namely DALL-E 2, Stable Diffusion, Craiyon, Midjourney, and DALL-E Flow. More info at stability.ai.
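To make the training/inference asymmetry concrete, here is a minimal toy sketch (the model and data are invented for this illustration and have nothing to do with Stable Diffusion itself): training means many repeated forward and backward passes with gradient computation, while inference is a single cheap forward pass with no gradients at all.

```python
# Toy illustration of the training/inference cost asymmetry in neural
# networks. Model and data are made up; the point is only that
# training = many gradient steps, inference = one forward pass.
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 10))
data = torch.randn(10_000, 512)
labels = torch.randint(0, 10, (10_000,))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Training: repeated forward + backward passes over the whole dataset.
start = time.perf_counter()
for epoch in range(20):
    optimizer.zero_grad()
    loss = loss_fn(model(data), labels)
    loss.backward()   # gradient computation roughly doubles the cost
    optimizer.step()
print(f"training:  {time.perf_counter() - start:.2f}s")

# Inference: one forward pass, no gradients, no optimizer state.
start = time.perf_counter()
with torch.no_grad():
    predictions = model(data).argmax(dim=1)
print(f"inference: {time.perf_counter() - start:.2f}s")
```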