Nvidia Perfusion - Easily add people and objects to AI models

[11:44 Tue,8.August 2023 by Rudi Schmidts]

Nvidia is set to officially present yet another major advance in generative AI research at Siggraph 2023: "Perfusion" is the name of a new text-to-image (T2I) personalization method that is meant to make it particularly easy to "train" your own people and objects into an AI image generator.

Until now, extending AI models such as Stable Diffusion with your own content required specialist knowledge. With so-called LoRA fine-tuning, for example, you can introduce yourself into the model and then appear in generated images via a text prompt. To do this, the network has to learn the additional person from example photos and a matching prompt, and so far that has not been possible with a simple mouse click. In addition, the fine-tuned weight changes have to be applied to the model as a kind of patch, which in cloud applications can mean transferring very large amounts of data for the changed weights.

According to Nvidia's Perfusion paper, all of this is supposed to become easier in several respects. Adding your own objects is said to take only 4 minutes on a single A100 GPU (with about 27GB of memory consumption). At the same time, the modification file with the changed weights is supposed to be just 100 kilobytes in size (yes, you read that correctly). Personalizing a diffusion model for your own use cases should thus become easy for everyone in the near future.
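To put the 100 kilobyte figure in perspective, a quick back-of-the-envelope calculation shows how few weights such a file can hold. The sketch below assumes 1 KB = 1,000 bytes and weights stored as 16-bit or 32-bit numbers. Neither assumption comes from the paper, and the function name is made up for illustration.

```python
def params_that_fit(file_size_bytes: int, bytes_per_param: int) -> int:
    """Return how many weights of a given width fit into a file of the given size."""
    return file_size_bytes // bytes_per_param


# The roughly 100 KB quoted in the article (assumption: 1 KB = 1000 bytes).
PATCH_SIZE_BYTES = 100 * 1000

# Assumed storage widths: 2 bytes (float16) and 4 bytes (float32) per weight.
for label, width in (("float16", 2), ("float32", 4)):
    count = params_that_fit(PATCH_SIZE_BYTES, width)
    print(f"{label}: about {count:,} weights")
```

Under these assumptions the patch holds only a few tens of thousands of values, which is tiny next to a model such as Stable Diffusion, whose weights number in the hundreds of millions.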

Using it is also very simple: you show the network a few photos and supply a text prompt that describes who or what can be seen in them, directly followed by an asterisk (*).

The starred term can then be used in the diffusion model alongside the other prompt words to describe the image. It should even be possible to "train" multiple objects in this way.
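The naming convention described above can be sketched in a few lines. This is only an illustration of the "word plus asterisk" idea. The helper names and the `{concept}` placeholder are hypothetical and are not part of any Nvidia tool.

```python
def concept_token(description: str) -> str:
    """Append the asterisk marker directly to the describing word, e.g. 'cat' -> 'cat*'."""
    return description.strip() + "*"


def build_prompt(template: str, description: str) -> str:
    """Fill a '{concept}' placeholder in a prompt template with the starred term."""
    return template.format(concept=concept_token(description))


# Hypothetical example: a personalized teddy bear placed in a new scene.
print(build_prompt("a photo of a {concept} sitting on a beach", "teddy"))
```

The starred term behaves like any other prompt word, so the same template could be reused for several trained objects.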

The key innovation in Perfusion is called "key-locking." In this approach, new concepts desired by the user, such as a particular cat or chair, are linked to a broader category during image generation. For example, the cat is linked to the general idea of a "cat." This technique allows for more precise matching, taking into account the specificity of the added, trained objects in the representation of the general category. Presumably, as a consequence, all cats will strongly resemble the added, trained cat, which could complicate training several different cats or people.
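One way to picture key-locking is in terms of cross-attention, where every prompt word produces a "key" (roughly, where it influences the image) and a "value" (what appears there). The paper's abstract speaks of locking a new concept's cross-attention keys to its superordinate category. The toy sketch below illustrates only that idea and is not Nvidia's implementation: the embeddings, matrices, and function names are all invented.

```python
def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Toy 2-dimensional text embeddings (made-up numbers).
embeddings = {
    "cat": [1.0, 0.0],   # the general category
    "cat*": [0.6, 0.8],  # the personalized concept learned from the user's photos
}

W_KEY = [[2.0, 0.0], [0.0, 1.0]]    # toy key projection
W_VALUE = [[1.0, 1.0], [0.0, 3.0]]  # toy value projection


def key_for(token, locked_to=None):
    """Compute the attention key; if locked, reuse the key of the broader category."""
    source = locked_to if locked_to is not None else token
    return matvec(W_KEY, embeddings[source])


def value_for(token):
    """The value always comes from the token's own embedding."""
    return matvec(W_VALUE, embeddings[token])


print("key(cat)          =", key_for("cat"))
print("key(cat*) locked  =", key_for("cat*", locked_to="cat"))
print("value(cat)        =", value_for("cat"))
print("value(cat*)       =", value_for("cat*"))
```

In this toy setup the locked key of "cat*" is identical to the key of "cat", while the two values differ, so the personalized term is attended to like a generic cat but contributes its own appearance.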

However, despite a timely release of the code, the required 27GB of GPU memory will stand in the way of broad, local use. Nvidia's largest consumer GPUs currently ship with a maximum of 24GB, which is just too little to try out Perfusion.
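The memory gap is easy to check with a small script. The sketch below simply compares the 27GB quoted for Perfusion with a few card sizes. It ignores any overhead or memory-saving tricks, the list of cards is only illustrative (the A100 is sold in 40GB and 80GB variants), and the function name is an assumption.

```python
REQUIRED_GB = 27  # memory consumption quoted for Perfusion in the article

# Illustrative memory sizes in GB.
gpus = {
    "largest consumer card (24 GB)": 24,
    "A100 (40 GB)": 40,
    "A100 (80 GB)": 80,
}


def can_train(memory_gb: int, required_gb: int = REQUIRED_GB) -> bool:
    """Return True if the card has at least the required amount of memory."""
    return memory_gb >= required_gb


for name, memory_gb in gpus.items():
    shortfall = max(0, REQUIRED_GB - memory_gb)
    status = "fits" if can_train(memory_gb) else f"short by {shortfall} GB"
    print(f"{name}: {status}")
```

By this simple comparison, a 24GB card misses the requirement by 3GB, which matches the "just too small" verdict above.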

We last pointed out exactly such looming problems in a special SlashCAM article in April 2023. Unfortunately, it is more relevant than ever, the only difference being that AMD is now catching up considerably in software support. Perfusion in particular, however, will certainly only run on Nvidia's cards for the time being...

More info at research.nvidia.com
