Microsoft VALL-E: New AI mimics any voice - using only a 3-second voice sample

[16:42 Mon, 9 January 2023]

Deep learning algorithms that can convincingly imitate a wide variety of voices have existed for some time, but until now they always required a fairly long recording of the original voice. Microsoft researchers have now introduced VALL-E, a text-to-speech AI whose name recalls OpenAI's image-generating AI DALL-E 2. The key innovation is that VALL-E needs only a 3-second recording of the voice to be imitated as a prompt, and can then output arbitrary text that sounds as if it were spoken by that voice.


This is made possible by the large amount of voice recordings VALL-E was trained on: about 60,000 hours of English speech from about 7,000 different speakers. Because voices vary only within a certain spectrum, VALL-E can draw on what it has learned about similar voices (and their different characteristics) when a new voice is to be simulated, and synthesize the new voice from that. Interestingly, VALL-E uses a neural audio codec to compress the voices.
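
To give a rough sense of the scale involved, the following sketch works through two numbers: the average amount of training audio per voice implied by the figures above, and how many discrete tokens a 3-second prompt might occupy once it has been compressed by a neural audio codec. The frame rate of 75 frames per second and the 8 codebooks are assumptions chosen for illustration (they are not stated in this article), and the function names are hypothetical.

```python
# Back-of-the-envelope sketch. The codec frame rate and codebook count
# used below are illustrative assumptions, not figures from the article.


def hours_per_speaker(total_hours: float, speakers: int) -> float:
    """Average amount of training audio per voice, in hours."""
    return total_hours / speakers


def prompt_tokens(seconds: float, frames_per_second: int, codebooks: int) -> int:
    """Number of discrete codec tokens for a clip of the given length.

    Each codec frame is assumed to be described by one token per codebook.
    """
    frames = round(seconds * frames_per_second)
    return frames * codebooks


if __name__ == "__main__":
    # Figures from the article: about 60,000 hours, about 7,000 voices.
    avg = hours_per_speaker(60_000, 7_000)
    print(f"average training audio per voice: {avg:.1f} h")

    # Assumed codec settings: 75 frames per second, 8 codebooks.
    tokens = prompt_tokens(3, frames_per_second=75, codebooks=8)
    print(f"3 s prompt: {tokens} codec tokens")
```

Under these assumptions, the 3-second prompt is a very short token sequence compared with the hours of audio available per training voice, which is why the model has to rely on what it has learned from similar voices.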

According to the developers, the experimental results show that VALL-E significantly outperforms comparable TTS (text-to-speech) systems in terms of speech naturalness and speaker similarity. In addition, VALL-E can largely preserve the speaker's emotion and the acoustic environment of the acoustic prompt in the synthesized speech. VALL-E's output can also vary for the same input text, so it can synthesize a variety of slightly different personalized speech samples.


There are many more examples on VALL-E's website.

Many possible applications for voice synthesis

The opportunities of the new technology are as enormous as the risks: because VALL-E needs only very short voice samples, the range of applications for voice synthesis expands significantly once again. When dubbing movies into another language, for example, it is already possible to use speech synthesis to have the translated text spoken in the original voice of the respective actor.

Personal assistants such as Siri or Alexa could also speak to the user in the voice of any other person, and text messages (whether SMS or WhatsApp) could be read out in the voice of the respective sender. A very practical use is for people who have lost their voice due to a disease such as ALS. They could then talk to others in their own voice by typing text, provided of course that old recordings of their voice exist.

[Image: neural audio codec]

The danger of manipulation using fake voice

The potential for misuse of voice simulation with VALL-E based on very short samples is of course also great. Voice recordings could be faked at will to discredit someone, be it a well-known politician or a private person, or to spread false information. Likewise, automated advertising calls could be made in the voice of one's own mother or friend, or an even more convincing version of the infamous "grandchild trick" shock call could use the voice of the actual grandchild, which could be deceptively simulated after only a short decoy call.

More information at valle-demo.github.io


Related news:
New Samsung Galaxy S23 Ultra smartphone: 8K, 200MP sensor, AI night mode and improved autofocus (3 February 2023)
New audio AI generates any sound effects in addition to music (2 February 2023)
First pictures, then sounds: New Google AI generates arbitrary music according to text description (30 January 2023)
New NVIDIA Eye Contact AI Effect: Bye bye Teleprompter? (25 January 2023)
The total remix: New AI Tune-A-Video makes new videos from old clips (24 January 2023)
Big comparison test: upscaling via AI - which tool is best? Free or Paid? (23 January 2023)
New NVIDIA Video Super Resolution AI scales video from 1080p to 4K (6 January 2023)
Last update: 4 February 2023, 18:02 - slashCAM is a project by channelunit GmbH