A milestone in AI video consistency? MagicVideo-V2 from TikTok parent ByteDance
[14:53 Sat,13.January 2024 by Rudi Schmidts]
ByteDance, the parent company of TikTok and Douyin, has unveiled a second version of MagicVideo roughly a year after the first. The AI model for video generation is claimed to outperform competitors such as Pika 1.0, Runway and Stable Video Diffusion-XT.
Generation runs through a cascade of models: a text-to-image model; an image-to-video model that uses the text prompt and the previously generated image to produce keyframes; a video-to-video model that refines the keyframes and upscales them via super-resolution; and finally a video frame interpolation model that smooths the motion by generating intermediate frames.
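To make the cascade concrete, here is a minimal sketch of how the four stages chain together. All function names, data shapes and parameters are hypothetical placeholders for illustration only, not ByteDance's actual API, which the paper does not publish as code.

```python
# Hypothetical sketch of a MagicVideo-V2-style four-stage pipeline.
# Frames are represented as plain dicts; a real system would pass tensors.

def text_to_image(prompt):
    # Stage 1: generate a reference image from the text prompt.
    return {"prompt": prompt, "resolution": (1024, 1024)}

def image_to_video(prompt, image, num_keyframes=16):
    # Stage 2: produce low-framerate keyframes conditioned on prompt + image.
    return [{"frame": float(i), "source": image} for i in range(num_keyframes)]

def video_to_video(keyframes, scale=2):
    # Stage 3: refine keyframes and upscale them via super-resolution.
    w, h = keyframes[0]["source"]["resolution"]
    return [dict(kf, resolution=(w * scale, h * scale)) for kf in keyframes]

def interpolate(frames, factor=2):
    # Stage 4: insert intermediate frames to smooth the motion.
    out = []
    for a, b in zip(frames, frames[1:]):
        out.append(a)
        for k in range(1, factor):
            t = k / factor
            out.append({"frame": (1 - t) * a["frame"] + t * b["frame"]})
    out.append(frames[-1])
    return out

def magicvideo_pipeline(prompt):
    image = text_to_image(prompt)
    keyframes = image_to_video(prompt, image)
    refined = video_to_video(keyframes)
    return interpolate(refined)

video = magicvideo_pipeline("a cat surfing a wave")
print(len(video))  # 16 keyframes -> 31 frames after 2x interpolation
```

The point of the structure is that each stage only has to solve one sub-problem, which is why the final interpolation stage can focus entirely on temporal smoothness.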
The latter seems particularly interesting, as ByteDance emphasizes MagicVideo-V2's outstanding ability to produce high-resolution videos with improved fidelity and smoothness. In short: improved temporal consistency.
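The simplest baseline for this kind of motion smoothing is linear pixel blending between neighboring frames; learned interpolation models instead predict the intermediate frame, but the blending version shows the basic idea. This is a generic illustration, not ByteDance's method.

```python
# Minimal frame interpolation by linear blending (an assumed baseline,
# not MagicVideo-V2's actual interpolation model).
import numpy as np

def blend_interpolate(frame_a, frame_b, t=0.5):
    # Linearly blend two frames; t=0.5 yields the midpoint frame.
    return (1 - t) * frame_a + t * frame_b

a = np.zeros((2, 2))       # dark frame
b = np.full((2, 2), 10.0)  # bright frame
mid = blend_interpolate(a, b)
print(mid[0, 0])  # 5.0
```

Blending produces ghosting on fast motion, which is exactly why generative interpolation models that hallucinate plausible in-between frames are the harder and more valuable problem.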
According to the rather brief published paper, human evaluators rated MagicVideo-V2's results above those of leading text-to-video systems such as Runway, Pika 1.0, Morph, Moon Valley and Stable Video Diffusion in direct visual comparisons.
One can't help but wonder whether ByteDance was able to draw on its vast pool of existing TikTok videos for training. It can be strongly assumed that it did.
That in turn suggests ByteDance can and will be among the leaders in AI video generation models going forward. Until now that position has mostly belonged to Google, which, with YouTube, commands an at least equally large pool of moving-image data.
There is no doubt that AI video generators will become an enormous market, and ultimately change the way we produce videos and films forever.