Powerful Object Editing: Netflix VOID – New Free AI Tool for Removing Objects from Videos

[16:13, Tue, 7 April 2026, by Thomas Richter]

Researchers at Netflix have released an AI algorithm called VOID (Video Object and Interaction Deletion), which not only removes objects from videos without a trace but—and this is the clever part—also removes all consequences of their physical interactions. This enables simpler yet more powerful object editing than previously possible, as the new algorithm "understands" how a sequence of events changes when an object is removed. If an object is removed, its previous interactions with other items are also removed from all frames.

In this regard, VOID goes a step further than previous methods for object-based editing: classic methods convincingly reconstruct the background behind a removed object and correct visual side effects like shadows or reflections, but they fall short at a higher level, namely when the removed object causally affects the action in the scene.

For example, if a ball is removed from a scene where it previously knocked into another object, it is not enough to simply make the ball disappear. If the editing is to be truly object-oriented, the movement of the impacted object must also change. This is exactly where many existing algorithms reach their limits: they create a visually cleaned-up scene but leave behind physically caused movements or secondary effects, which makes the scene look strange.

VOID, on the other hand, can effectively regenerate the affected scenes in a version that shows how the scene would have developed if the removed object had never been present and the interactions seen in the original video had not taken place. The focus thus shifts from pure inpainting to a reconstruction of altered dynamics throughout the entire scene. To achieve this, VOID must regenerate the entire scene based on an understanding of the physical interactions of the objects in the video. For instance, if a bowling ball is removed from a video of bowling pins (as seen in the demo on the project page under "Results"), the pins are no longer knocked over by it.

Combination of language-image understanding and video diffusion

This high level of complexity is achieved by combining several model components—first, a Vision-Language Model (VLM) analyzes the scene and identifies the regions that are causally affected by the object to be removed. This includes, for example, objects that would not have fallen, would not have experienced a collision, or would have taken a different trajectory without the removed item.

These affected areas are then encoded in the form of a "quadmask." A quadmask is a grayscale mask with four pixel values (0/63/127/255) that signals to the inpainting model how much it should intervene per region, from "completely remove" to "do not touch." This mask serves as control information for a video diffusion model, which then generates the new, physically consistent video after the object removal. More information can be found in the associated paper.
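To make the quadmask idea more concrete, here is a minimal sketch in plain Python of what a four-level grayscale mask could look like. Only the four pixel values (0/63/127/255) are taken from the description above. The helper names, the rectangular regions, and the assumption that 0 stands for "completely remove" and 255 for "do not touch" are illustrative and are not taken from the VOID code or paper.

```python
# Sketch only: the four levels come from the article; what each level means
# (0 = remove ... 255 = keep) is an assumption made for illustration.
QUAD_LEVELS = (0, 63, 127, 255)


def blank_mask(height, width, fill=255):
    """Create a mask as a list of rows, filled with one level (assumed 'do not touch')."""
    return [[fill] * width for _ in range(height)]


def paint_box(mask, top, left, bottom, right, level):
    """Set a rectangular region (bottom/right exclusive) to one of the four levels."""
    if level not in QUAD_LEVELS:
        raise ValueError(f"level must be one of {QUAD_LEVELS}, got {level}")
    for y in range(max(top, 0), min(bottom, len(mask))):
        row = mask[y]
        for x in range(max(left, 0), min(right, len(row))):
            row[x] = level


def snap_to_levels(mask):
    """Map arbitrary grayscale values (0..255) to the nearest quadmask level."""
    return [
        [min(QUAD_LEVELS, key=lambda lv: abs(lv - px)) for px in row]
        for row in mask
    ]


def level_counts(mask):
    """Count pixels per level; the mask must already contain only the four levels."""
    counts = {lv: 0 for lv in QUAD_LEVELS}
    for row in mask:
        for px in row:
            counts[px] += 1
    return counts


if __name__ == "__main__":
    mask = blank_mask(4, 6)             # everything starts as "do not touch"
    paint_box(mask, 1, 1, 3, 3, 0)      # hypothetical: the object to delete
    paint_box(mask, 1, 3, 3, 5, 127)    # hypothetical: a region it interacts with
    for row in mask:
        print(row)
    print(level_counts(mask))
```

In the real system the regions would come from the VLM analysis rather than hand-drawn boxes, and there would be one such mask per video frame; the sketch only illustrates the data format.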

Operation is very simple: objects to be removed from a video are marked with a click. Netflix has released VOID freely, meaning it can be used by anyone, for example via Hugging Face. However, it requires a GPU with at least 40 GB of VRAM, such as an Nvidia A100, which can be rented on Hugging Face. It can also be tried out via a demo.

Comparison with other tools

There are other tools that can be used for object removal, such as Runway, Generative Omnimatte, MiniMax-Remover, and ProPainter. According to the Netflix developers, VOID is significantly superior to these alternatives. In a comparison in which 25 people rated the results across several scenarios, VOID was preferred in 64.8 percent of cases, while Runway landed in a distant second place at 18.4 percent.

VOID as a powerful new tool

VOID is not yet suitable for professional video editing; it only works with video clips of a few seconds, and the resolution is also too low. But as we know from the development of such AI algorithms so far, it won't be long before the prototype becomes a functional tool that also consumes less memory.

In the future, it will likely be integrated into editing or compositing programs, enabling even more powerful object editing than before and drastically reducing the manual effort required for the post-production of such scenes, especially when the removed objects have influenced other elements of the scene.

More information at void-model.github.io
