Whisper: New free AI turns speech into text and automatically translates into all languages


[15:28 Mon, 26 September 2022]

OpenAI, the creator of the text AI GPT-3 and the image generation AI DALL-E 2, among others, has presented the speech recognition system "Whisper", which can not only transcribe spoken words into text but also translate speech from many other languages into English text. Fortunately, OpenAI has taken a cue from Stability.ai's approach with its text-to-image AI Stable Diffusion and released the program, including the trained models, freely and at no cost.




The open-source code of Whisper is available on GitHub together with models in five sizes that differ in accuracy and speed, all of which run on home PCs equipped with a graphics card. Depending on the model, GPUs with roughly 1 to 10 GB of VRAM are required. The four smaller sizes are also offered as English-only variants; the multilingual models, including the largest, have been trained on many other languages and can therefore also translate spoken words from those languages and output them as English text.
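As a rough illustration of the size and VRAM trade-off, the following sketch picks the largest model that fits a given graphics card. The size names and the approximate VRAM figures are assumptions taken from the Whisper project README at the time of release, not from this article, and the helper `pick_model` is purely hypothetical.

```python
# Approximate VRAM needs per model size in GB.
# Assumption: figures as listed in the Whisper README; treat them as rough guides.
APPROX_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}


def pick_model(available_vram_gb):
    """Return the largest model size that fits into the given VRAM, or None."""
    fitting = [
        name
        for name, needed in APPROX_VRAM_GB.items()  # ordered small to large
        if needed <= available_vram_gb
    ]
    return fitting[-1] if fitting else None


if __name__ == "__main__":
    for vram in (0.5, 2, 6, 24):
        print(f"{vram} GB VRAM -> {pick_model(vram)}")
```

Larger models are generally more accurate but slower, so on a fast card it can still make sense to choose a smaller size than the one this helper returns.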

[Figure: Whisper models]


Whisper was trained on 680,000 hours of audio material (including transcriptions) from the Internet, two-thirds of which was in English and the rest in a number of other languages. The Whisper architecture is an encoder-decoder transformer: the input signal is split into 30-second segments, each segment is converted into a log-mel spectrogram, and the result is passed to the encoder. The decoder is trained to predict the corresponding text, intermixed with special tokens that instruct the single model to perform tasks such as language identification, phrase-level timestamping, multilingual speech transcription, and translation into English. The speech recognition works surprisingly well, even with unclear speech or distracting background noise.
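The preprocessing and the task tokens described above can be sketched in a few lines of plain Python. This is a simplified illustration, not Whisper's actual code: the 16 kHz sample rate and the 10 ms frame stride are assumptions taken from the Whisper paper, the token spellings follow the published model's conventions, and the function names are hypothetical. The spectrogram computation itself is left out; only the segmenting and the resulting frame count are shown.

```python
SAMPLE_RATE = 16_000   # Hz; assumption from the Whisper paper, not this article
CHUNK_SECONDS = 30     # segment length named in the article
HOP_SAMPLES = 160      # 10 ms stride at 16 kHz; assumption from the paper


def split_into_chunks(samples, sample_rate=SAMPLE_RATE, seconds=CHUNK_SECONDS):
    """Split audio samples into fixed-length chunks, zero-padding the last one."""
    size = sample_rate * seconds
    chunks = []
    for start in range(0, len(samples), size):
        chunk = list(samples[start:start + size])
        chunk.extend([0.0] * (size - len(chunk)))  # pad a short final chunk
        chunks.append(chunk)
    return chunks


def frames_per_chunk(sample_rate=SAMPLE_RATE, seconds=CHUNK_SECONDS,
                     hop=HOP_SAMPLES):
    """Number of spectrogram frames that one chunk yields at the given stride."""
    return (sample_rate * seconds) // hop


def decoder_prompt(language, task, timestamps=True):
    """Build the special-token prefix that tells the decoder what to do."""
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        tokens.append("<|notimestamps|>")
    return tokens


if __name__ == "__main__":
    audio = [0.0] * (SAMPLE_RATE * 45)          # 45 seconds of silence
    print(len(split_into_chunks(audio)), "chunks")
    print(frames_per_chunk(), "frames per chunk")
    print(decoder_prompt("de", "translate", timestamps=False))
```

Because the task is selected only by this token prefix, the same model can transcribe German speech as German text or translate it into English without any change to its weights.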



First applications and tools use Whisper

Operation via the command line is quite simple. As with Stable Diffusion, however, the openly accessible source code also means that a large number of tools are now being built on Whisper, which either use its capabilities for special tasks or simply make it easier to handle through a graphical user interface (GUI).
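To give an idea of what such a command-line call looks like, here is a minimal sketch that assembles the arguments for the `whisper` command installed by the project's Python package. The options `--model`, `--task` and `--language` are used as documented in the project README; the file name `interview.mp3` and the helper `build_whisper_command` are hypothetical, and the sketch only prints the command rather than running it.

```python
import shlex


def build_whisper_command(audio_path, model="base", task="transcribe",
                          language=None):
    """Assemble the argument list for a call to the `whisper` command."""
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    command = ["whisper", audio_path, "--model", model, "--task", task]
    if language:
        # Optional hint; without it Whisper tries to detect the language itself.
        command += ["--language", language]
    return command


if __name__ == "__main__":
    # Hypothetical input file; translate German speech into English text.
    command = build_whisper_command(
        "interview.mp3", model="medium", task="translate", language="German"
    )
    print(shlex.join(command))
    # To actually run it (requires Whisper to be installed):
    #   import subprocess; subprocess.run(command, check=True)
```

A GUI tool of the kind mentioned above would typically do little more than collect these choices from the user and issue an equivalent call.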

[Figure: Whisper architecture]


To use Whisper, you don't even have to install a program on your own PC; it can also be used via web services. For example, the AI community Hugging Face hosts a simple tool, YouTube Whisperer, that automatically transcribes the spoken words of a YouTube video into text. Another, still very simple tool converts live microphone input to text. There is also a more playful Google Colab project that combines Whisper with Stable Diffusion to automatically generate images from English-language MP3 files.

[Figure: YouTube Whisperer]



The future: AI tools for everyone?


For users, Whisper is another interesting and practical AI capability that can be used in the future (for free!) for all sorts of tasks. Audio transcription is thus no longer privileged know-how that is only available in special paid apps (or at OS level, as in Android or via Siri). We are looking forward to upcoming apps that use Whisper for interesting new functions in video, such as automatically indexing home or even professional film archives by their spoken words, so that dialogue passages can be found by text search, or automatically creating text transcripts of phone calls or other audio recordings. Of particular interest to filmmakers and video podcasters, of course, is the ability to automatically create subtitles in multiple languages and offer them depending on the origin of the target audience.
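Two of these ideas, subtitle creation and text search over dialogue, need very little code once timestamped segments are available. The sketch below assumes that each segment is a dictionary with `start` and `end` times in seconds and a `text` field, which is the shape Whisper's Python interface returns; the sample segments and the function names are made up for illustration.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Turn timestamped segments into the text of an SRT subtitle file."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def find_passages(segments, query):
    """Return (start time, text) for every segment containing the query."""
    needle = query.lower()
    return [
        (seg["start"], seg["text"].strip())
        for seg in segments
        if needle in seg["text"].lower()
    ]


if __name__ == "__main__":
    # Made-up segments standing in for a real transcription result.
    segments = [
        {"start": 0.0, "end": 2.5, "text": " Hello there."},
        {"start": 2.5, "end": 5.0, "text": " Welcome back."},
    ]
    print(segments_to_srt(segments))
    print(find_passages(segments, "welcome"))
```

An archive index along these lines would store the segments of every clip and run the search across all of them, returning the clip and the time at which a phrase was spoken.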


More information at openai.com

Last update: 4 December 2022, 18:02. slashCAM is a project by channelunit GmbH.