AI development never sleeps: while we are still working through the various new tools presented at NAB for the news section, we would like to show you a few clips that demonstrate the current state of generative video AI in a very entertaining way. They were created with Runway Gen-2 and still show typical AI weaknesses such as phantom limbs, but were produced quickly and for little money:
The trailer for "The Great Catspy" consists of about 65 short individual clips, which were created with Runway Gen-2 via text prompts and show amazing consistency with one another. Far more clips were generated, however - over 500 - many of which were discarded, while some were reworked. The filmmaker - Visiblemaker - based the selection of shots and the editing in Premiere Pro on the music. The voices were synthesized with Eleven Labs, and ChatGPT helped with the script. The sound effects, it seems, were added by hand. The whole film is said to have taken only two days to create - without any camera or animation.
Except for the editing/transitions between the different sequences and the typography, everything in this commercial for the fictional pizza service "Pepperoni Hug Spot" was generated by AI tools. The content and text (including the catchy title) were penned, in effect, by the GPT-4 language model; the artwork was created from its prompts by Midjourney and then converted to motion in Runway Gen-2. Eleven Labs provided the synthetic voice for the voice-over, and the music comes from SOUNDRAW AI Music, a service for generating background music with AI.
The deformed faces of the pizza eaters leave no doubt about their AI-generated origin - as in The Great Catspy - and the wording is always slightly off, but this even lends the clip a special charm. And "It's like family, but with more cheese" is actually a damn good advertising claim - by popular demand, the maker of the clip is already offering T-shirts with the slogan.
Pizza Magic according to Runway AI.
So you can see very well what is already possible with the current tools - no perfect, photorealistic results, but quite "original" videos. Thanks to the trick of the deliberately bad VHS look, the errors are also less noticeable here, for example when a diner nibbles on the plate instead of the pizza, which itself sometimes looks like a giant bruschetta. Once the generative image and video AIs are natively integrated with GPT-4, it should no longer even be necessary to insert typography manually; for now, however, the image AIs are still very clumsy when it comes to type. Likewise, assembling the individual video snippets from a cut list automatically created by GPT-4 should be possible at the push of a button in the foreseeable future.
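To make the cut-list idea concrete, here is a minimal sketch of what such a push-button assembly step might look like. Everything in it is hypothetical: the JSON cut-list format, the clip file names, and the parse_cut_list helper are assumptions for illustration, not part of any actual Runway or GPT-4 workflow.

```python
import json

# Hypothetical cut list, as a language model might produce it:
# source clips in edit order, each with in/out points in seconds.
CUT_LIST_JSON = """
[
  {"file": "gen2_clip_012.mp4", "in": 0.5, "out": 3.0},
  {"file": "gen2_clip_047.mp4", "in": 0.0, "out": 2.2},
  {"file": "gen2_clip_003.mp4", "in": 1.0, "out": 4.5}
]
"""

def parse_cut_list(raw: str):
    """Validate the cut list and return (segments, total duration).

    Each segment is a (filename, in_point, out_point) tuple.
    """
    segments = []
    total = 0.0
    for entry in json.loads(raw):
        start, end = float(entry["in"]), float(entry["out"])
        if end <= start:
            raise ValueError(f"invalid in/out points for {entry['file']}")
        segments.append((entry["file"], start, end))
        total += end - start
    return segments, total

segments, duration = parse_cut_list(CUT_LIST_JSON)
print(f"{len(segments)} segments, {duration:.1f} s total")
```

From the validated segment list, a video library such as moviepy could then do the actual trimming and concatenation, roughly `concatenate_videoclips([VideoFileClip(f).subclip(a, b) for f, a, b in segments])` - the part a human editor does by hand in Premiere Pro today.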
How many different versions of the texts, images and videos were created and discarded along the way to the final result remains an open question, however - and will probably stay one for some time. Perhaps this is precisely the task we humans-in-the-loop will perform when producing together with AI systems: selecting among the various generated elements.
Not much is known about the creation of the third video (source: Uncanny Harry on Twitter), except that it was also made with the generative video AI Runway Gen-2 (still in closed beta). Whether only text prompts were used here, or images as well, is therefore unclear.
Now, if you think the clip merely looks like a video game, keep in mind that until recently very elaborate game engines were required for images like these. How realistically Runway (or other generative video AIs) will manage to depict humans in the future remains to be seen; we're bracing ourselves for anything.
By the way, the song in the background was also created with AI tools - and as for the singer's voice: what you hear is a fake Kanye West, his voice synthesized and swapped in for the original.