Microsoft VALL-E: New AI mimics any voice - using only a 3s voice sample

[16:42 Mon, 9 January 2023 by Thomas Richter]

Various deep learning algorithms have long been able to imitate a wide range of voices deceptively well, but until now they always needed a fairly long recording of the original voice. Microsoft researchers have now introduced VALL-E, an AI for generating voice recordings whose name recalls the image-generating AI DALL-E 2 from OpenAI. The great innovation is that VALL-E needs only a 3-second recording of the target voice as a prompt and can then output arbitrary text that sounds as if it were spoken by that voice.


[Figure: VALL-E overview]


This is possible thanks to the large amount of voice recordings VALL-E was trained on: about 60,000 hours of English speech from about 7,000 different speakers. Because voices vary only within a certain spectrum, VALL-E can draw on what it has learned about similar voices (and their characteristics) when it is asked to simulate a new one, and synthesize the new voice from that. Interestingly, VALL-E uses a neural audio codec to compress the voices.
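To make the idea of compressing a voice into codes more concrete, here is a toy sketch of plain vector quantization: short frames of a waveform are replaced by the index of the closest entry in a codebook. This is only an illustration under stated assumptions. The article does not describe the internals of VALL-E's codec, a real neural codec learns its codebooks from data, and the names `CODEBOOK`, `encode` and `decode` are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[float]

# Hypothetical hand-written codebook of 2-sample frames.
# A real neural audio codec would learn its codebooks during training.
CODEBOOK: List[Frame] = [(0.0, 0.0), (0.5, 0.5), (-0.5, -0.5), (1.0, -1.0)]


def nearest_code(frame: Frame, codebook: List[Frame]) -> int:
    """Return the index of the codebook entry closest to `frame` (squared error)."""
    def dist(a: Frame, b: Frame) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist(frame, codebook[i]))


def encode(samples: Sequence[float], codebook: List[Frame], frame_len: int) -> List[int]:
    """Split `samples` into full frames and replace each frame by a codebook index."""
    last_start = len(samples) - frame_len
    frames = [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]
    return [nearest_code(f, codebook) for f in frames]


def decode(codes: Sequence[int], codebook: List[Frame]) -> List[float]:
    """Rebuild an approximate waveform by concatenating the chosen codebook frames."""
    out: List[float] = []
    for c in codes:
        out.extend(codebook[c])
    return out


signal = [0.1, -0.1, 0.4, 0.6, -0.6, -0.4, 0.9, -0.8]
codes = encode(signal, CODEBOOK, frame_len=2)
print(codes)                    # one small integer per frame
print(decode(codes, CODEBOOK))  # lossy reconstruction of the signal
```

The point of the sketch is that eight samples shrink to four small integers, and a sequence of such integers is something a model can learn to predict much as a language model predicts text tokens.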

According to the researchers, the experimental results show that VALL-E significantly outperforms comparable TTS (text-to-speech) systems in terms of speech naturalness and speaker similarity. VALL-E can also largely preserve the speaker's emotion and the acoustic environment of the acoustic prompt in its synthesis. In addition, VALL-E's speech output can vary for the same input text, so it can synthesize a variety of slightly different personalized speech samples.

[Figure: sample synthesis]


There are many more examples on the VALL-E website.


Many possible applications for voice synthesis

The opportunities of the new technology are as enormous as its risks. Because VALL-E needs only very short voice samples, the field of application for voice synthesis widens considerably once again. When dubbing films into another language, for example, speech synthesis already makes it possible to have the original actor's voice speak the translated text.

Personal assistants such as Siri or Alexa could also speak to the user in any person's voice, and text messages (whether SMS or WhatsApp) could be read aloud in the voice of the sender. A very practical use is for people who have lost their voice to an illness such as ALS: they could type text and have it spoken to others in their own voice, provided of course that old recordings of their voice exist.

[Figure: VALL-E neural audio codec]



The danger of manipulation using fake voice

The potential for misuse of voice simulation from very short samples is of course also great. Voice recordings could be faked at will to discredit someone, whether a well-known politician or a private individual, or to spread false information. Likewise, automated advertising calls could use the voice of one's own mother or a friend, and an even more convincing version of the infamous "grandchild trick" shock call could use the voice of the actual grandchild, deceptively simulated from nothing more than a short decoy call.

More information at valle-demo.github.io

German version of this page: OpenAI VALL-E: Neue KI macht jede Stimme nach - nur anhand von 3s Stimmsample

  




Last update: 2 July 2025, 18:02 - slashCAM is a project by channelunit GmbH