[13:41 Mon,20.March 2023 by Thomas Richter]
A Chinese research team has published a new text-to-video AI with 1.7 billion parameters, i.e. it generates video purely from text input. Similar algorithms have already been presented by Meta with Make-a-Video and by Google with Imagen Video and Phenaki, but the special thing about VideoFusion (here a demonstration on a Chinese website, machine-translated by Google) is that the source code including the corresponding models is freely available for download. Like the well-known image-generation AIs such as Stable Diffusion, the new method generates its videos using the diffusion technique.
Generating videos on your own PC
Those with some experience in setting up AI algorithms and models can try VideoFusion on their own PC - however, a high-performance GPU with at least 16 GB of VRAM (or 8 GB at half precision) as well as 16 GB of RAM is a prerequisite. On an Nvidia RTX 3090 graphics card, generating a xxx-second clip takes about 23 seconds. A guide to installing it on your own PC can be found here. The generated video is output as an .mp4 file; for playback, the free open-source VLC player is recommended.
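For orientation, here is a minimal sketch of what a local run could look like via the Hugging Face diffusers library - the model ID, prompt and parameters shown are illustrative assumptions, not part of the official installation guide linked above:

```python
# Minimal sketch: text-to-video with diffusers (model ID and settings are assumptions)
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# Load the 1.7B-parameter text-to-video model in half precision,
# which is what makes generation feasible on ~8 GB of VRAM
pipe = DiffusionPipeline.from_pretrained(
    "damo-vilab/text-to-video-ms-1.7b",  # assumed Hugging Face model ID
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # park idle submodules in CPU RAM to save VRAM

# Generate a short clip from a text prompt (English only, see below)
frames = pipe("a monkey learning to play the piano", num_inference_steps=25).frames
video_path = export_to_video(frames)  # writes the frames out as an .mp4 file
print(video_path)
```

The half-precision load plus CPU offloading is one way to meet the reduced 8 GB VRAM figure mentioned above; with a full 16 GB card the offloading step can be skipped.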
Alternatively, there is already an implementation on the well-known AI portal Hugging Face, which you can use yourself for free - however, generation there takes a relatively long time due to the shared GPU power. If you need faster video generation, you can rent an Nvidia A10G GPU with 24 GB VRAM (or a more powerful A100 GPU) by the hour.
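If you would rather drive the hosted demo from a script than from the web UI, something like the following could work - note that the Space name and API route here are assumptions and would have to match what the Space actually exposes:

```python
# Hypothetical sketch: calling the Hugging Face Space remotely via gradio_client
from gradio_client import Client

client = Client("damo-vilab/modelscope-text-to-video-synthesis")  # assumed Space name
result = client.predict(
    "a robot dancing in Times Square",  # text prompt
    api_name="/predict",                # assumed API route of the Space
)
print(result)  # local path to the downloaded clip
```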
Monkey learning to play the piano:
The researchers themselves caution that VideoFusion cannot produce movie or TV quality, nor can it render text within the video. Only English is currently supported as an input language. It is also obvious that the image quality and resolution (128 × 128) are still relatively low, reminiscent of the early days of AI image generation - but experience in the AI field shows how fast progress can be, especially when the program and model are available as open source, as was already the case with the image AI Stable Diffusion, and can thus be further developed and improved by anyone.
Robot dancing in Times Square:
At the moment, VideoFusion also uses only 1.7 billion parameters - for comparison: OpenAI's original DALL-E, which can only generate images, was trained with more than 10 billion parameters - so there is still plenty of room for quality improvements simply by scaling up the parameter count.
Shutterstock logo in generated videos
Copyright by Shutterstock?
It is noticeable that in a large part of the demonstrated VideoFusion videos the logo of Shutterstock, the largest stock photo portal, is clearly visible - indicating that Shutterstock material was included in the image and video training data used (LAION-5B, ImageNet and WebVid).
It will be interesting to see how Shutterstock reacts to such videos, which are potentially legally contestable - on the one hand for unauthorized use of Shutterstock images as training material and on the other hand for unauthorized and potentially business-damaging use of the logo.
More info at paperswithcode.com
German version of this page: VideoFusion: Erste Open Source Video-KI ist da - und läuft auch auf dem Heim-PC