AI generates frighteningly accurate portraits - based on voice alone


[09:57 Sat, 9 April 2022]

We became aware of the extremely interesting AI project "Speech2Face" through an article on Petapixel that was already published at the end of 2019. From a voice recording only about 4 to 6 seconds long, the algorithm can reconstruct the speaker's face, often with surprisingly high similarity.


The deep neural network was trained on millions of videos showing people talking to each other. The faces appearing in the videos were detected, and the corresponding voices were analyzed as spectrograms. Based on these spectrograms, a face matching a particular voice is then selected during the search. In most cases, the longer the voice sample, the greater the similarity of the face (6-second samples yield significantly better results than 3-second samples).

Thus, the deep learning algorithm independently learned correlations between the sound of voices and the appearance of the speakers. Based on this, the algorithm then estimates the speaker's age, gender, and other features and generates a matching face.
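To make the spectrogram step a little more tangible, here is a minimal sketch in Python using only NumPy. It is not the researchers' actual preprocessing: the function name `spectrogram`, the 16 kHz sample rate, the 25 ms window and the 10 ms hop are all assumptions chosen for illustration, and a synthetic 440 Hz tone stands in for a real voice recording. It only shows how a few seconds of audio become a time-frequency image that a network could take as input, and why a 6-second sample offers roughly twice as many frames as a 3-second one.

```python
import numpy as np

def spectrogram(signal, sample_rate, window_ms=25.0, hop_ms=10.0):
    """Magnitude spectrogram (frames x frequency bins) of a mono signal.

    Window and hop lengths are illustrative assumptions, not values
    taken from the Speech2Face project.
    """
    win = int(sample_rate * window_ms / 1000)   # samples per analysis window
    hop = int(sample_rate * hop_ms / 1000)      # samples between window starts
    window = np.hanning(win)                    # taper to reduce spectral leakage
    n_frames = 1 + (len(signal) - win) // hop
    frames = np.stack([
        signal[i * hop : i * hop + win] * window
        for i in range(n_frames)
    ])
    # Real-input FFT per frame; keep magnitudes only
    return np.abs(np.fft.rfft(frames, axis=1))

# Hypothetical stand-in for a voice recording: a pure 440 Hz tone
sample_rate = 16000
t6 = np.arange(6 * sample_rate) / sample_rate
t3 = np.arange(3 * sample_rate) / sample_rate
spec_6s = spectrogram(np.sin(2 * np.pi * 440.0 * t6), sample_rate)
spec_3s = spectrogram(np.sin(2 * np.pi * 440.0 * t3), sample_rate)

print(spec_6s.shape, spec_3s.shape)
```

Each row of the result is one short time slice, each column one frequency band. A real system would typically add further steps (for example, compressing the magnitudes) before feeding the result to a neural network.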

Speech2Face algorithm

To further assess the AI's performance and to compare the real face with the generated one, a standardized frontal image of each speaker's face with identical lighting was also synthesized from the videos. Here, too, the faces generated via Speech2Face often show an astonishing similarity to the real faces, far beyond matching age and gender.


However, there were also a number of cases in which the generated face differed greatly from the speaker's real face in terms of age, gender or ethnicity. Mismatches in ethnicity occurred especially when a person did not speak in one of the languages of their (apparent) ethnicity.

Speech2Face problems

The researchers themselves therefore caution that their model reveals statistical correlations between the facial features and voices of speakers in the training data, but does not represent the entire world population. The training data used (mainly a collection of educational videos from YouTube) is unevenly distributed, and the model is influenced by this. They therefore recommend that any practical application of the method use training data representative of the intended user population.

One possible use case is the automatic generation of avatars matching a voice (even stylized as a cartoon) for online conversations where only the sound is available. Likewise, computer-generated voices of virtual assistants could be given a face via Speech2Face. Just as well, however, police investigators could create a composite sketch of an extortionist of whom only a voice recording exists.

Speech2Face cartoon faces

We are not aware of any further development of the Speech2Face algorithm. If a successor is published, however, it will probably be much better than the now more than 2 years "old" method, given the enormous progress in deep learning.

As is often the case with deep learning algorithms, there is a danger that the algorithm's "estimate" based on a lot of training data - as good as it usually is - will be accepted as true without question. The situation is similar with AI super-resolution methods, which significantly increase the resolution of blurred images and then reveal details that are not "true" per se, but simply very probable.

More information at speech2face.github.io

Last update: 24 May 2022, 18:02 - slashCAM is a project by channelunit GmbH