[14:44 Wed, 23 September 2020 by Thomas Richter]
The new deep learning algorithm "Wav2Lip" from an Indian research team can match a speaker's lip movements to the words of any audio recording. It neatly demonstrates the continuous progress of machine learning technology, as the new method delivers significantly better results than older projects. Not only does it work in real time, but - and this is the real advance - it is also more universal: it can handle any face, any language and any voice.
Last but not least, this technology could in principle also make it easier to replace the original sound with overdubbed voices in post-production for scenic productions. Even minor speech errors (which would otherwise render a take unusable) could easily be corrected by briefly "re-tracking" the lips automatically. Using deep learning algorithms, it would also be conceivable to automatically offer different language versions of any clip, for example on YouTube. YouTube already provides automatic transcription, and the next steps are already possible using algorithms: translating the transcribed text into another language, synthesizing speech in the original speaker's voice, and then lip-syncing the video to the new audio. Of course, the technology can also be misused to generate clips in which people appear to say things they never said - the new audio can likewise be generated by a neural network that mimics the real voice. Anyone can try out how good the Wav2Lip algorithm is on the project's website.
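As a rough illustration of how such an automatic dubbing pipeline could be wired together: the stage names in the comments below are hypothetical placeholders for external services, and only the final command mirrors the inference call documented in the Wav2Lip repository (checkpoint and file names are example values).

```python
# Sketch of an automatic dubbing pipeline ending in a Wav2Lip call.
# The transcription / translation / speech-synthesis stages are only
# described in comments here, since they would rely on external services.
# The command assembled below follows the invocation documented in the
# Wav2Lip repository README (run from the repository root); the
# checkpoint and file names are illustrative placeholders.

def build_wav2lip_command(face_video, dubbed_audio,
                          checkpoint="checkpoints/wav2lip_gan.pth"):
    """Assemble the Wav2Lip inference command as an argument list.

    Pipeline steps that would precede this call:
      1. transcribe the original audio track (automatic speech recognition),
      2. translate the transcript into the target language,
      3. synthesize the translated text in the original speaker's voice,
      4. lip-sync the video to the new audio -- this step.
    """
    return [
        "python", "inference.py",
        "--checkpoint_path", checkpoint,
        "--face", face_video,    # video (or image) of the speaker
        "--audio", dubbed_audio, # the newly synthesized audio track
    ]

cmd = build_wav2lip_command("clip.mp4", "dubbed_audio.wav")
print(" ".join(cmd))
```

The command list could then be handed to `subprocess.run(cmd)` once the repository and its model checkpoint are in place.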