AI generates frighteningly accurate portraits - based on voice alone

[09:57 Sat,9.April 2022   by Thomas Richter]    

An article on Petapixel drew our attention to the extremely interesting AI project "Speech2Face", which was published back at the end of 2019. The algorithm can reconstruct a speaker's face, often with surprisingly high similarity, from a voice recording that is only about 4-6 seconds long.

Figure: Speech2Face




The deep neural network was trained on millions of videos showing people talking to each other. The faces appearing in the videos were detected, and the corresponding voices were analyzed as spectrograms. Based on these spectrograms, the network then searches for a face that matches a particular voice. In most cases, the longer the voice sample, the greater the similarity of the face (6-second samples yield significantly better results than 3-second samples).
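To make the term "spectrogram" more concrete, here is a minimal, dependency-free sketch of how a short audio signal can be turned into one: the signal is cut into overlapping frames, each frame is windowed, and the magnitude of its Fourier transform is stored. This is not the Speech2Face preprocessing; the frame size, hop length, sample rate and test tone are assumptions chosen purely for illustration, and the naive DFT would be replaced by an FFT library in practice.

```python
import cmath
import math


def hann_window(n):
    """Return a Hann window of length n (tapers frame edges to reduce leakage)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative frequency bins of a naive O(n^2) DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectrogram(samples, frame_size=64, hop=32):
    """Split samples into overlapping windowed frames; one magnitude spectrum per frame."""
    window = hann_window(frame_size)
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_size)]
        frames.append(dft_magnitudes(frame))
    return frames


# Hypothetical test signal (assumption): a 1 kHz tone sampled at 8 kHz for 0.1 s
SAMPLE_RATE = 8000
FRAME_SIZE = 64
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(800)]

spec = spectrogram(tone, frame_size=FRAME_SIZE, hop=32)

# Locate the strongest frequency bin in the first frame and convert it to Hz
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_SIZE
print(len(spec), "frames,", len(spec[0]), "bins, peak at", peak_hz, "Hz")
```

Each row of `spec` describes which frequencies are present during one short time slice. A network like the one described above would receive such a time-frequency representation as input rather than the raw waveform.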

In this way, the deep learning algorithm independently learned correlations between the sound of voices and the appearance of the speakers. Based on these correlations, the algorithm then estimates the speaker's age, gender, and other features and generates a matching face.

Figure: Speech2Face algorithm


To further assess the performance of the AI and compare the real face with the generated one, a standardized frontal image of each speaker's face with identical lighting was also synthesized from the videos. Here, too, the faces generated via Speech2Face often show an astonishing similarity to the real faces, far beyond matching age and gender.

Figure: Speech2Face


However, there were also a number of cases where the generated face differed greatly from the speaker's actual face in terms of age, gender or ethnicity. Ethnicity was misjudged especially when a person did not speak in one of the languages of their (apparent) ethnicity.

Figure: Speech2Face problems


The researchers themselves therefore caution that their model reveals statistical correlations between the facial features and voices of the speakers in the training data, but that this data (mainly a collection of educational videos from YouTube) does not represent the entire world population, and the model is influenced by this uneven distribution. They therefore recommend that any practical application of the method use training data representative of the intended user population.

Possible use cases include the automatic generation of avatars matching a voice (even stylized as a cartoon) for online conversations where only audio is available. Likewise, the computer-generated voices of virtual assistants could be given a face via Speech2Face. Equally, however, police investigators could create a facial composite of, for example, an extortionist of whom only a voice recording exists.

Figure: Speech2Face cartoon faces


We are not aware of any further development of the Speech2Face algorithm, but if one is published, it will probably be much better than the now more than 2-year-"old" method, given the enormous progress in deep learning.

As is often the case with deep learning algorithms, there is a danger that the algorithm's "estimate" based on a lot of training data, as good as it usually is, will be accepted as true without question. The same applies to AI super-resolution methods, which significantly increase the resolution of blurred images and in doing so reveal details that are not "true" per se, but simply very probable.

More information at speech2face.github.io

last update: 8 May 2025 - 18:02 - slashCAM is a project by channelunit GmbH