
WhisperX: Free audio transcription with speaker recognition

[11:28 Wed, 1 February 2023, by Thomas Richter]

In September 2022, OpenAI, the developer of the text AI ChatGPT and the image generation AI DALL-E 2, among others, presented the speech recognition system Whisper, which transcribes spoken words into text. Because OpenAI published the associated programme and model for free, a large number of open-source projects based on it soon emerged. One of these is WhisperX, which was started by the computer scientist Max Bain and has just been published. It is of particular interest to filmmakers because it fixes some specific weaknesses of Whisper that previously prevented its use as an automatic subtitle generator.



WhisperX Model



For one thing, WhisperX, unlike the original Whisper, recognises different speakers and labels them in the transcribed text. For another, it corrects the timestamps, which in Whisper can be off by several seconds. Among other things, the audio is pre-filtered with voice activity detection, which significantly improves the quality of the alignment and prevents catastrophic timestamp errors by Whisper (such as negative timestamp durations). In WhisperX, the timestamps that indicate when a speaker starts and stops talking are now accurate down to the level of individual sounds.
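The principle of pre-filtering by speech activity can be illustrated with a deliberately simple sketch. It is not the detector WhisperX uses: a plain energy threshold is swapped in to show the idea that only regions flagged as speech are passed on, and that every region has a start before its end by construction. The function name `detect_speech_regions`, the frame length and the threshold are assumptions chosen for illustration.

```python
from typing import List, Optional, Tuple


def detect_speech_regions(
    samples: List[float],
    sample_rate: int,
    frame_ms: int = 20,
    threshold: float = 0.01,
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) regions whose mean frame energy exceeds threshold.

    Hypothetical helper: an energy threshold stands in for a real
    voice activity detector.
    """
    frame_len = max(1, sample_rate * frame_ms // 1000)
    regions: List[Tuple[float, float]] = []
    region_start: Optional[int] = None  # sample index where the open region began

    for begin in range(0, len(samples), frame_len):
        frame = samples[begin:begin + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold:
            if region_start is None:
                region_start = begin  # speech starts here
        elif region_start is not None:
            # Speech ended at the start of this quiet frame.
            regions.append((region_start / sample_rate, begin / sample_rate))
            region_start = None

    # Close a region that runs to the end of the audio.
    if region_start is not None:
        regions.append((region_start / sample_rate, len(samples) / sample_rate))
    return regions


# Synthetic input at 1 kHz: 0.5 s silence, 1.0 s of signal, 0.5 s silence.
rate = 1000
audio = [0.0] * 500 + [0.5] * 1000 + [0.0] * 500
speech_regions = detect_speech_regions(audio, rate)
print(speech_regions)
```

In a real pipeline, only the detected regions would be handed to the transcription step, so that silence or noise cannot produce stray or inverted timestamps.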




These improvements considerably simplify the use of Whisper for creating subtitles, for example, because thanks to WhisperX much less manual editing is required. Not only is the timing now exactly right, so that the respective subtitle appears in sync when an actor begins to speak (word for word if desired), but the identification of who is speaking, which is important for subtitling for the hearing impaired, is also done automatically.
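How timed, speaker-labelled text turns into a subtitle file can be sketched in a few lines. The example below writes the common SRT format; the input structure (a list of dictionaries with `start`, `end`, `speaker` and `text`) and the function names are assumptions made for illustration and do not claim to match the output format of WhisperX.

```python
from typing import Dict, List


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp of the form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Render segments as SRT blocks, prefixing each line with its speaker."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        timing = f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}"
        line = f"[{seg['speaker']}] {seg['text'].strip()}"
        blocks.append(f"{index}\n{timing}\n{line}")
    return "\n\n".join(blocks) + "\n"


# Hypothetical input: two segments with start/end in seconds and a speaker label.
example_segments = [
    {"start": 0.5, "end": 2.25, "speaker": "SPEAKER_00", "text": "Where were you last night?"},
    {"start": 2.75, "end": 4.0, "speaker": "SPEAKER_01", "text": "At home."},
]
srt_text = segments_to_srt(example_segments)
print(srt_text)
```

For word-for-word subtitles, the same function could be fed one entry per word instead of one entry per sentence.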

Currently, standard models are provided for English, French, German, Spanish, Italian, Japanese, Dutch and Polish, among others. To produce robust word-level segmentation with speaker labels, WhisperX combines several free tools: in addition to OpenAI's Whisper, these are Meta AI's wav2vec2.0 (responsible for phoneme-level sound detection) and a further tool for voice activity detection.
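How the outputs of separate tools might be merged into speaker-labelled words can be sketched as follows. This is an illustration under assumptions, not WhisperX's actual code: it assumes one tool delivers word timings and another delivers speaker turns, and it labels each word with the turn it overlaps most in time. All names and data shapes here are hypothetical.

```python
from typing import Dict, List


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_words(words: List[Dict], turns: List[Dict]) -> List[Dict]:
    """Attach to each word the speaker whose turn overlaps it the most.

    Words that overlap no turn at all get the speaker None.
    """
    labelled = []
    for word in words:
        best_speaker = None
        best_overlap = 0.0
        for turn in turns:
            shared = overlap(word["start"], word["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_overlap = shared
                best_speaker = turn["speaker"]
        labelled.append({**word, "speaker": best_speaker})
    return labelled


# Hypothetical inputs: word timings from an aligner, turns from a speaker detector.
words = [
    {"word": "hello", "start": 0.0, "end": 0.4},
    {"word": "there", "start": 0.5, "end": 0.9},
    {"word": "well", "start": 0.9, "end": 1.2},  # straddles the turn boundary
    {"word": "hi", "start": 1.3, "end": 1.6},
]
turns = [
    {"speaker": "A", "start": 0.0, "end": 1.0},
    {"speaker": "B", "start": 1.0, "end": 2.0},
]
labelled_words = label_words(words, turns)
print([(w["word"], w["speaker"]) for w in labelled_words])
```

The word that straddles the boundary goes to the speaker with the larger share of its duration, which is one simple way to resolve such conflicts.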

WhisperX, like Whisper itself, is free of charge and freely available on GitHub, including its source code. WhisperX is written in Python and can be run from the command line, provided you have the necessary knowledge. However, we expect that WhisperX will soon be integrated in a more user-friendly way into the first (online) subtitling tools or plugins, offering users simple automatic subtitling.

More information at github.com

German version of this page: WhisperX: Kostenlose lautgenaue Audiotranskription mit Sprechererkennung

  



Last update: 26 July 2024, 18:02 - slashCAM is a project by channelunit GmbH