ConsiStory in Stable Diffusion - Finally consistent AI characters without fine-tuning?

[09:12 Wed,14.February 2024 by Rudi Schmidts]

Although the project website still says "anonymous authors", the arxiv.org/pdf/2402.03286.pdf (linked PDF paper) makes it clear that "ConsiStory" originates from Nvidia&s research facilities. This project addresses the problem that it is often difficult to use one or more characters consistently over several image generations. For example, an "old man with a hat" usually looks significantly different with each generation attempt, depending on the other prompt tokens. This is known as the current consistency problem of generative AI.

Until now, this problem has been tackled with so-called fine-tuning - in other words, an already trained AI model is "personalized" with additional images of one or more specific people. However, this is computationally complex and also requires a certain amount of expertise.

With ConsiStory, on the other hand, it should now be possible to generate consistent motifs across a series of images within Stable Diffusion XL (SDXL) without additional training. The researchers at Nvidia use a new feature for this, which they call "correspondence-based feature injection". ConsiStory should even be able to be extended to multi-subject scenarios and enable training-free personalization for common objects.

ConsiStory allows the use of consistent characters without fine-tuning

The lack of training means that such images can be created on a single Nvidia H100 in just ten seconds - which, according to the paper, is around twenty times faster than previous state-of-the-art methods.

Judging by the quality of the results published so far, Nvidia is likely to have achieved a small milestone in generative AI research here - because consistency in characters is one of the major problems currently "hanging" over many practical application scenarios for generative AI. And, of course, some rather unwanted AI projects, such as fully automated, virtual AI influencers.

Self-usable code to try out ConsiStory will be made available "shortly" on the

Github project page as a link for interested parties.

more infos at bei consistory-paper.github.io

deutsche Version dieser Seite: ConsiStory in Stable Diffusion - Endlich konsistente KI-Charaktere ohne Finetuning?