[15:28 Mon, 26 September 2022 by Thomas Richter]
OpenAI, the creators of the text AI GPT3 and the image generation AI
The open-source code of Whisper is available in five model sizes that trade accuracy against working speed.

[Figure: Whisper models]

Whisper was trained on 680,000 hours of audio material (including transcriptions) from the Internet, two-thirds of which was in English and the rest in a number of other languages. The Whisper architecture is an encoder-decoder transformer: the input signal is split into 30-second segments, each segment is converted into a log-mel spectrogram, and the spectrogram is passed to an encoder. A decoder is trained to predict the corresponding text, intermixed with special tokens that instruct the single model to perform tasks such as language identification, phrase-level timestamping, multilingual speech transcription, and translation into English.

The speech recognition works surprisingly well, even with unclear speech or distracting background noise. The first applications and tools already use Whisper. Operation via the command line is quite simple.

[Figure: Whisper architecture]

To use Whisper, you don't even have to install a program on your own PC; Whisper can also be used via web services. For example, the AI community Hugging Face hosts a simple tool.

[Figure: YouTube Whisperer]

The future: AI tools for everyone?

For users, Whisper is another interesting and practical AI feature that can be used (for free!) for all sorts of tasks. Audio transcription is thus no longer privileged knowledge that is usable only in special paid apps (or at the OS level, as in Android or via Siri). We are excited about upcoming apps that will use Whisper for interesting new video functions, such as automatically indexing home or even professional film archives by spoken words, so that dialogue passages become searchable by text, or automatically creating text transcripts of phone calls and other audio recordings.
Of particular interest to filmmakers or video podcasters, of course, is the ability to automatically create subtitles in multiple languages and offer them depending on the origin of the target audience.
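
To make the subtitle idea a little more concrete, here is a minimal Python sketch of how timestamped transcript segments could be turned into a subtitle file in the common SubRip (.srt) format. It does not call Whisper itself. The segment shape (a list of dictionaries with `start`, `end` and `text` keys, times in seconds), the function names and the example data are assumptions made purely for illustration.

```python
from typing import Dict, List, Union

# Assumed shape of one transcript segment (hypothetical, for illustration):
# {"start": seconds, "end": seconds, "text": spoken words}
Segment = Dict[str, Union[float, str]]


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Segment]) -> str:
    """Build SRT text: numbered blocks separated by one blank line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(float(seg["start"]))
        end = format_timestamp(float(seg["end"]))
        text = str(seg["text"]).strip()
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


# Hypothetical example data, not real transcription output.
example_segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 5.04, "text": " Today we look at Whisper."},
]

if __name__ == "__main__":
    print(segments_to_srt(example_segments))
```

Running the same conversion on a translated transcript would give a second subtitle track, which is one way the multi-language offering described above could be assembled.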