The new episode of "Two Minute Papers" presents an intriguing new deep learning project with many uses for creative work: it specializes in generating high-resolution, photorealistic images.
OpenAI's GPT-3, which is based on the Transformer architecture and has attracted considerable attention (here are some interesting examples of its versatility), could already generate not only coherent texts but also images via Image GPT. Those images, however, were still quite poor in quality (unlike the texts), with a maximum resolution of 192x192 pixels.
The new technique, developed at the University of Heidelberg, radically improves the Transformer network's capabilities for images by combining it with a convolutional neural network (CNN): instead of treating images merely as sequences of pixels (as a pure Transformer network would), the CNN first abstracts them into visually meaningful (semantic) image components before the Transformer is trained on them, which makes images with much higher resolutions possible.
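The gain from this combination can be sketched numerically. The following toy example (all sizes and names are illustrative assumptions, not the project's actual values) shows the core idea: a CNN encoder compresses the image into a small grid of feature vectors, each of which is replaced by the index of its nearest entry in a learned codebook, so the Transformer only has to model a short sequence of semantic tokens instead of tens of thousands of raw pixels.

```python
import numpy as np

# Illustrative sizes (assumptions, not the paper's exact values):
# a CNN encoder downsamples a 256x256 image by a factor of 16,
# yielding a 16x16 grid of feature vectors.
image_size = 256
downsample = 16
grid = image_size // downsample          # 16
embed_dim = 8                            # toy feature dimension
codebook_size = 512                      # number of learned "visual words"

rng = np.random.default_rng(0)
# Stand-ins for the CNN encoder output and the learned codebook.
features = rng.normal(size=(grid * grid, embed_dim))
codebook = rng.normal(size=(codebook_size, embed_dim))

# Quantization: replace each feature vector with the index of its
# nearest codebook entry (squared Euclidean distance).
dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
tokens = dists.argmin(axis=1)            # one integer token per grid cell

# The Transformer now models 16*16 = 256 tokens
# instead of 256*256 = 65,536 raw pixels.
print(tokens.shape)
print(image_size * image_size)
```

The sequence the Transformer sees shrinks by a factor of 256 here, which is what makes high resolutions tractable for an architecture whose cost grows quickly with sequence length.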
Image generation from a depth map
Based on the learned image material, the new algorithm can generate arbitrary matching images and 3D objects from simple depth maps, create photorealistic landscape images from a schematic sketch (sky here, water and mountains there), conjure up detailed versions of blurred images via super-resolution, generate photos of people in a pose given by a sketch, or complete the cropped half of a photo.
Image generation of landscapes from a sketch
As always, Károly Zsolnai-Fehér also documents the astonishingly fast development of the various AI algorithms through comparisons: what was barely possible a year ago is now passable, and one or two projects and follow-up papers later it is already almost perfect.