We have already reported on the AI-supported image generator DALL-E 2 and on the fact that it is only a matter of time until something similar becomes available for video. Now it's almost time, sooner than expected: Heidelberg AI researcher Patrick Esser has posted a demonstration on Twitter of the kind of text-based video editing that will soon be possible in the online video editor Runway, where Esser works. A short clip of a tennis player shows how the background of the tennis court is changed dynamically using nothing but different text prompts.
Tennis court original and on the moon
For example, the background of the tennis court becomes a beach, or the match takes place on the moon or in a fantasy landscape. In the demo clip, the tennis player, including shadow and ball, appears to be cut out dynamically in real time; the motion also appears to be tracked so that the background moves correctly, and the new background is generated on the fly. Because the clip is of fairly low quality, it is not yet possible to judge how good the image quality will ultimately be. Once the feature is implemented, however, various aspects should be fairly easy to improve by optimizing the algorithm and adding more computing power.
This demo shows machine-learning-based features at a still quite early stage, since only the background seems to be changeable. Even so, it already gives an impressive idea of the powerful functions that object-based editing and generation of video will soon make possible, whether through text commands as shown here or through sketches.