Generative AIs can now write texts that sound as if a human had written them, conjure photorealistic images out of thin air and, last but not least - as has been a recurring topic in this article - credibly synthesize human voices. They can reproduce voices that exist in real life as well as generate entirely new artificial ones - or rather, they can create recordings featuring such voices.
Unlike the robot voices that tried to sell us dubious services over the phone years ago, the algorithms now doing the talking sound deceptively human. Not only is their pronunciation virtually flawless, they can even simulate emotions. This is already being exploited for fraud (for example, in grandparent scams or blackmail calls), but many other uses are obvious - practically everywhere that sound recordings of human speakers have been used until now. We would guess that we already encounter AI voices more often than we realize.
For example, according to a recent report at www.digitaljournal.com/life/audio-book-narrators-say-ai-is-already-taking-away-business/article, audiobook narrators are complaining about dwindling orders - in some cases, revenues are said to have halved compared with the previous year. The main culprit is said to be competition from AI-based narration services. Several services on the Internet offer to produce audiobooks at a fraction of the usual price, "spoken" by artificial voices with a trained emotional register. In some cases the voices of actual narrators have been cloned, and those narrators receive royalties when their voices are used for commissions - but this is not always the case.
While traditional narrators are in danger of losing their livelihoods to AI, the new services are being touted as democratizing the audiobook industry: even the smallest publishers can now afford audiobook versions, the argument goes. However, only a handful of services are likely to profit from the largely automated production of audiobooks. A labeling requirement for AI-generated audiobook productions, as some narrators are calling for, would at least let the audience decide for themselves who reads their stories to them.
The situation is likely to be similar in the dubbing business. The forerunner here appears to be Latin America: an article that made the rounds in February described how voice actors there are increasingly coming under pressure from automatic dubbing services - services that have these very actors record voice samples for AI voice training, at rock-bottom rates and without any further involvement.
All these automatically generated AI voices may sound human and may be able to adapt their intonation to the content being spoken, but that does not mean they can compete with real (voice) actors. A good dubbed version was traditionally recorded by trained voice actors under proper direction and could in some cases match the original in quality - sometimes even surpass it, or at least be funnier, as in the legendary case of The Two.
Today's dubbed versions often seem far less lovingly made (especially on TV) and are usually produced much more cheaply and quickly. In such cases it may not make that much of a difference if synthetic voices are used. It is also well known that AI can even adapt the lip movements in the picture to the newly spoken text, so it may soon be possible, for example, to have Ryan Gosling deliver his lines in Blade Runner 2049 in German - in his "own" voice.
Not uncool at all, if it really does work out well. The prospect of cheap, AI-generated dubbing mush everywhere, however, is far less pleasing.