[15:37 Sun, 13 October 2024 by Thomas Richter]
The range of AIs that generate high-quality video is currently growing at an incredible pace. However, the new Pyramid Flow model by Chinese researchers stands out because it has been released under the Open Source MIT license, meaning it can be used for free, even commercially. Commercial video AIs, on the other hand, can become quite expensive to use when generating a large number of clips.
This puts it in competition with other Open Source models like OpenSora and CogVideoX, but it offers several advantages over them. For example, Pyramid Flow's video resolution of up to 1,280 x 768 pixels, its clip length of 10 seconds, and especially its frame rate of 24 fps are significantly better. Moreover, the best CogVideoX model, CogVideoX-5B, is only available under a restrictive special license.

The most important aspect, however, is of course the image quality of the generated videos. You can get a rough idea of this from the demo clips on pyramid-flow.github.io/, which feature some of the now-classic themes from OpenAI's Sora, such as the astronaut with a wool hat or the waves crashing against a cliff with a lighthouse. Subjectively, the examples shown look very good, but of course the researchers selected them from multiple attempts.

The developers have also published a comparison of Pyramid Flow's image quality with other current models, both as test scores and as a 1-on-1 shootout in which Pyramid Flow competes directly against another video AI. In the latter, it clearly outperforms OpenSora in terms of aesthetics, movement, and prompt interpretation, and it beats CogVideoX-5B and Kling (though likely the now-outdated version 1.0) in at least two areas. However, we are eagerly awaiting more thorough independent tests comparing Pyramid Flow with CogVideoX as well as the best current commercial video AIs such as Meta's Movie Gen, Kling, MiniMax, Runway Gen3, and Sora.

Interestingly, Pyramid Flow was developed by researchers from Peking University as well as from Kuaishou Technology, the creators of Kling. The new Pyramidal Flow Matching algorithm could be important for the future development of (Open Source) video AIs, as it is very efficient in terms of the computing power required for both training and generation. The model operates in multiple steps, with full video resolution only being applied in the final step.
This new approach reduces the number of tokens required by a quarter compared to previous models. Pyramid Flow was trained on open-source datasets of annotated videos for 20,700 A100 GPU hours.
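
To make the multi-stage idea more concrete, here is a rough Python sketch of how the token count could grow from stage to stage. It is not the researchers' implementation: the function name `stage_token_counts`, the three stages, the halving of width and height per coarser stage, and the 16-pixel patch size are all assumptions chosen for illustration, and the sketch ignores any compression a real model applies before tokenizing. Only the 1,280 x 768 resolution, the 10-second length, and the 24 fps come from the article.

```python
def stage_token_counts(width, height, frames, stages=3, patch=16):
    """Return the token count for each pyramid stage, coarsest first.

    Assumptions (illustrative only, not taken from the Pyramid Flow code):
    - every coarser stage halves both width and height,
    - one token covers a patch x patch pixel block of one frame,
    - every stage sees the same number of frames.
    """
    counts = []
    for k in range(stages - 1, -1, -1):  # k = stages-1 is the coarsest stage
        scale = 2 ** k
        tokens_per_frame = (width // scale // patch) * (height // scale // patch)
        counts.append(tokens_per_frame * frames)
    return counts


if __name__ == "__main__":
    frames = 10 * 24  # 10 seconds at 24 fps, as quoted in the article
    for i, n in enumerate(stage_token_counts(1280, 768, frames), start=1):
        print(f"stage {i}: {n} tokens")
```

Under these assumed values, each stage has four times as many tokens as the one before it, so only the final stage pays the full-resolution cost.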