[13:32 Mon, 17 March 2025 by Thomas Richter]
Wouldn't it be practical to edit images simply with text commands and thus carry out complex edits very easily? This is exactly what Google's latest AI model, Gemini 2.0 Flash (Image Generation) Experimental, makes possible: it can not only generate images in a few seconds, but also accepts uploaded images and edits them object by object via text prompt. This works because it is a multimodal model, i.e. it "understands" both images and text and can therefore correctly interpret complex prompts and relate them to the image content.
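For readers who would rather script such an edit than use the web interface, the following is a minimal sketch of how an image and a text instruction could be packaged into a single request body. It only builds the JSON payload and sends nothing. The function name `build_edit_request`, the model identifier, and the field names (`inline_data`, `responseModalities`) are assumptions based on the publicly documented Gemini REST format and are not taken from this article; check them against the current API documentation before use.

```python
import base64
import json

# Assumed model identifier (not stated in the article); verify before use.
ASSUMED_MODEL = "gemini-2.0-flash-exp-image-generation"


def build_edit_request(image_bytes: bytes, instruction: str,
                       mime_type: str = "image/png") -> dict:
    """Pair one image with one text instruction in a JSON-serializable body.

    Hypothetical helper: the field names follow the assumed Gemini REST
    layout and may need adjusting.
    """
    if not instruction.strip():
        raise ValueError("instruction must not be empty")
    return {
        "contents": [{
            "parts": [
                # The edit is described in plain language, no mask needed.
                {"text": instruction},
                # The image travels inline as base64-encoded bytes.
                {"inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
            ]
        }],
        # Ask for both text and image output (assumed config key).
        "generationConfig": {"responseModalities": ["Text", "Image"]},
    }


if __name__ == "__main__":
    body = build_edit_request(b"placeholder image bytes",
                              "Replace the cat with a dog.")
    print(json.dumps(body, indent=2))
```

The point of the sketch is that the object to be edited is named only in the instruction string; no mask or selection is part of the request.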
[Image: Gemini UI]

Many possibilities of editing with AI

The possibilities are huge, thanks to the generative power of Gemini in the background. For example, you can not only selectively remove objects from the image without a trace, but also add them. And since the editing works simply by description, you don't have to laboriously mark an object by hand with a mask; it is enough to mention the desired object in the prompt, be it a concrete object or an image element such as the background.

There are many other possibilities. An object in the image can be changed specifically via prompt, for example its color or its appearance, such as hair color, hairstyle, clothing, posture or facial expression.

[Image: Cat replaced by dog]

Two images can also be combined in a way defined by the user. For example, a person in a photo can be given an object from a second photo to hold, or an item of clothing can be replaced exactly. Old black-and-white photos can easily be colorized:

[Image: Black and white to color]

It is also possible to use image creation in a chat with Gemini 2.0 and thus have the AI's answers to your own questions supplemented with suitable images as illustration. Also interesting is the possibility of displaying objects, including faces, from a different perspective via prompt. An old problem of other generative models also seems to be solved: Gemini 2.0 can correctly render longer texts in images.

Consistency

Another special feature of the new model is the ability to represent objects consistently across multiple images, each in a different context, from a different perspective or in a different state. This makes it easy to create image series or stories. However, the consistency of an object suffers if it is edited repeatedly over an entire chain of commands, as in the following example:

[Image: example of a command chain]

The new Gemini model is still experimental, though: it can do impressive things, but the resolution is quite low at 1,024 x 673 pixels, image errors sometimes occur, and prompts are not always interpreted correctly. As always in matters of AI, "it's a work in (fast) progress".

[Images: further examples]