[10:07 Thu, 18 July 2024 by Thomas Richter]
About a year and a half ago, Microsoft released VALL-E, a model that can imitate a voice from a 3-second voice sample.

The new model, VALL-E 2, improves on its predecessor through two important changes to the system architecture. First, VALL-E 2 selects speech tokens more skillfully during generation, which avoids repetitions. Second, it processes speech data more efficiently by grouping it. How similar and natural the imitated voice sounds still depends on several factors, including the length and quality of the voice sample and its background noise.

[Audio examples: the 3-second sample of the original voice; VALL-E; VALL-E 2; VALL-E 2 with a 10-second voice sample]

Although commercial services like […]

[Figure: Naturalness and similarity of the simulated voice in comparison]

Fear of Misuse

VALL-E 2 is purely a research project. Because they fear misuse, the developers have no plans to integrate VALL-E 2 into a product or to make the algorithm publicly available.

A system that can perfectly imitate speakers would have many potential applications. Beyond entertainment, it could be used in interactive voice dialogue systems, translation, and chatbots. It could also help people who have difficulty speaking, such as those with aphasia or ALS. However, a tool for fast and perfect voice cloning also carries a risk of misuse, for example to fool voice authentication systems or to impersonate a specific person.

If VALL-E 2 is released in the future, the researchers propose two safeguards: a procedure that ensures the speaker consents to the use of their voice, and a model that detects synthesized speech. ElevenLabs, for example, uses a text captcha that the user must read aloud within 10 seconds.

German version of this page: Microsoft VALL-E 2: KI ahmt jede Stimme perfekt nach - nur per 3s Stimmsample
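The VALL-E 2 paper refers to the two architectural changes described in the article as Repetition Aware Sampling and Grouped Code Modeling. The Python sketch below illustrates the general idea under stated assumptions. It is not Microsoft's code, which has not been released. The function names and the values for `top_p`, `window` and `threshold` are hypothetical. The neural model is replaced by a fixed probability list over codec token ids.

```python
import random
from typing import Sequence


def nucleus_sample(probs: Sequence[float], top_p: float, rng: random.Random) -> int:
    """Sample a token id from the smallest set of top tokens whose mass reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, mass = [], 0.0
    for i in ranked:
        nucleus.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]


def repetition_aware_sample(
    probs: Sequence[float],
    history: Sequence[int],
    rng: random.Random,
    top_p: float = 0.8,      # hypothetical nucleus threshold
    window: int = 10,        # hypothetical: how many recent tokens to inspect
    threshold: float = 0.1,  # hypothetical repetition ratio that triggers the fallback
) -> int:
    """Nucleus sampling with a fallback to plain random sampling when the
    chosen token already repeats too often in the recent history."""
    token = nucleus_sample(probs, top_p, rng)
    recent = list(history[-window:])
    ratio = recent.count(token) / window  # share of the window occupied by this token
    if ratio > threshold:
        # Too repetitive: resample from the full distribution to break the loop.
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return token


def group_codes(codes: Sequence[int], group_size: int, pad_id: int = 0) -> list[tuple[int, ...]]:
    """Partition a codec code sequence into fixed-size groups, so an autoregressive
    model predicts one group per step instead of one code per step.
    pad_id is a hypothetical filler for the last, incomplete group."""
    padded = list(codes) + [pad_id] * (-len(codes) % group_size)
    return [tuple(padded[i:i + group_size]) for i in range(0, len(padded), group_size)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(group_codes([1, 2, 3, 4, 5], 2))
    print(repetition_aware_sample([0.9, 0.1], [0] * 10, rng, top_p=0.5))
```

With `group_size=2`, the sketch halves the number of autoregressive steps needed for the same audio. This corresponds to the efficiency gain the article describes. The fallback in `repetition_aware_sample` only applies when the nucleus-sampled token already dominates the recent history. In all other cases, it behaves like ordinary nucleus sampling.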