[11:34 Fri,24.March 2023 by Thomas Richter]
The following short experimental video by designer Nick St. Pierre nicely demonstrates how AI tools can be used for video work. He produced the clip with several different AI tools: ChatGPT, the image AI Midjourney, Runway Gen-1 (so far accessible only in closed beta), and the music AI Boomy.
Helpfully, he explains in a Twitter thread, which we reproduce here, how he made the different AIs work together. First, the "script" was developed with ChatGPT. The prompt: "Write a script for a 9-second video consisting of three 3-second clips. The story should feature a man in his living room and have a science fiction theme."
Then a reference image was generated with the image AI Midjourney to serve as a template for the intended visual look and feel, which would later be applied to the entire video via style transfer. The associated prompt: "sci-fi film still, medium shot, centered, side view, a man sitting in a chair, holding a glowing orb in his hands, living room, new york, 4k --ar 16:9."
The reference image from Midjourney
In the next step, St. Pierre roughly recreated the scenes described in the script, filming them with his iPhone. To facilitate the style transfer, he tried to mimic the pose from the Midjourney image. His own footage acts as a reference for generating the AI video, similar to how image generation with Stable Diffusion can use a sample image via ControlNet to roughly define the image composition, enabling a relatively exact reproduction of specific spatial arrangements or poses.
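Runway Gen-1 itself is a closed web tool, but the structure-conditioned generation described here can be sketched with the open-source diffusers library and a ControlNet pose model. The model IDs, file names, and function below are illustrative assumptions for a single frame, not what St. Pierre actually used:

```python
# Hypothetical sketch: structure-conditioned generation with ControlNet,
# analogous to how the iPhone footage constrains the Runway Gen-1 output.
# Model IDs and file names are assumptions for illustration only.

# The Midjourney prompt from the article (without the --ar flag, which is
# Midjourney-specific).
PROMPT = ("sci-fi film still, medium shot, centered, side view, "
          "a man sitting in a chair, holding a glowing orb in his hands, "
          "living room, new york, 4k")

def stylize_frame(pose_image_path: str, output_path: str) -> None:
    """Generate one styled frame whose composition follows a pose image."""
    # Heavy dependencies are imported lazily so the sketch can be read
    # (and imported) without torch/diffusers installed.
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    from diffusers.utils import load_image

    # A ControlNet trained on OpenPose skeletons constrains the person's pose.
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
    )
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    ).to("cuda")

    # The pose image (e.g. an OpenPose skeleton extracted from one iPhone
    # frame) fixes the composition; the text prompt supplies the style.
    pose = load_image(pose_image_path)
    result = pipe(PROMPT, image=pose).images[0]
    result.save(output_path)
```

Gen-1 applies this kind of conditioning consistently across all frames of a clip; running the sketch above per frame would produce flicker, which is exactly the temporal-consistency problem the video AIs solve.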
Before and After
This self-shot clip, along with the Midjourney image, then served as the basis for transferring the image style to the entire video using the video AI Runway Gen-1 from the co-creators of Stable Diffusion. Alternatively, the new image style could be described by a text prompt, but a reference image gives more fine-grained control - at least if you have enough Midjourney experience to know which prompt will achieve the desired result.
The self-shot reference video:
The next step was simply editing the three individual clips together in iMovie. All that was then missing was the soundtrack: St. Pierre created it with the music AI Boomy, which generates music based on a style and mood set by the user. Finally, he combined music and video and exported the finished clip. The whole process took about 3 hours.
Of course, the small clip is just a proof of concept showing how different AI tools could be used in a real film production, provided the quality of the AI-generated results is good enough. In a real production, more time would be invested in fine-tuning the results to achieve a consistent look.
Depending on the AI tool used, production-ready results still require more or less additional work. But how fast AI tools are currently developing can be seen nicely in this comparison of the output of different Midjourney versions over just one year - the video AIs will probably improve similarly fast:
Video generated entirely by AI.
Very soon it will also be possible, for example with Runway Gen-2, to generate such scenes from a text description alone, just as the image AIs already do.
More info at twitter.com
German version of this page: Wie man einen Videoclip mit KI-Tools produziert - in nur 3 Stunden