WhisperX: Free audio transcription with speaker recognition


[11:28 Wed, 1 February 2023]

In September 2022, OpenAI, the developer of the text AI ChatGPT and the image generation AI DALL-E 2, among others, presented the speech recognition system Whisper, which transcribes spoken words into text. Because OpenAI published the associated programme and model for free, a large number of open source projects based on it soon emerged. One of these is WhisperX, which was started by the computer scientist Max Bain and has just been published. It is of particular interest to filmmakers because it fixes some specific weaknesses of Whisper that previously prevented its use as an automatic subtitle generator.



WhisperX Model



For one thing, WhisperX recognises different speakers (unlike the original Whisper) and labels them in the transcribed text. For another, it fixes Whisper's timing. In Whisper, the timestamps can be wrong by several seconds. To prevent this, WhisperX pre-filters the audio with voice activity detection, among other things, which significantly improves the quality of the alignment and prevents catastrophic timestamp errors by Whisper (such as negative timestamp durations). As a result, the timestamps in WhisperX that indicate when a speaker starts and stops talking are accurate down to the individual sound.
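To make the idea of pre-filtering by speech activity more concrete, here is a minimal sketch in Python. It is not WhisperX's implementation: it swaps in a very simple energy threshold where a real system would use a trained voice activity model, and the function names, the threshold and the segment format (dictionaries with `start`, `end` and `text`) are assumptions made purely for illustration.

```python
import math


def detect_speech_regions(samples, sample_rate, frame_ms=20, threshold=0.02):
    """Return (start, end) times in seconds where frame energy exceeds a threshold.

    A simple energy-based stand-in for voice activity detection:
    the audio is cut into short frames and a frame counts as speech
    when its RMS level is at or above `threshold`.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    regions = []
    region_start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        t = i / sample_rate
        if rms >= threshold and region_start is None:
            region_start = t                      # speech begins
        elif rms < threshold and region_start is not None:
            regions.append((region_start, t))     # speech ends
            region_start = None
    if region_start is not None:                  # audio ended during speech
        regions.append((region_start, len(samples) / sample_rate))
    return regions


def drop_implausible_segments(segments, regions):
    """Discard segments with non-positive duration or outside all speech regions."""
    kept = []
    for seg in segments:
        if seg["end"] <= seg["start"]:
            continue  # e.g. a negative timestamp duration
        if any(seg["start"] < r_end and seg["end"] > r_start
               for r_start, r_end in regions):
            kept.append(seg)
    return kept
```

Restricting transcription to regions that actually contain speech means timestamps can no longer drift into long stretches of silence, which is the intuition behind the improvement described above.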




These improvements considerably simplify the use of Whisper for creating subtitles, for example, because thanks to WhisperX much less manual editing is required. Not only is the timing now exact, so that the respective subtitle appears in sync when an actor begins to speak (word for word if desired), but the identification of who is speaking, which is important for subtitling for the hearing impaired, is also done automatically.
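As an illustration of how accurately timed, speaker-labelled segments translate into subtitles, the following Python sketch writes them out in the common SRT format. The input structure (a list of dictionaries with `start`, `end`, `text` and an optional `speaker` key) and the function names are hypothetical; they are not taken from WhisperX and would need to be adapted to whatever a transcription tool actually returns.

```python
def format_timestamp(seconds):
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    ms_total = int(round(seconds * 1000))
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build an SRT document from timed segments.

    Each segment is a dict with 'start' and 'end' (seconds), 'text',
    and optionally 'speaker'. The speaker label, if present, is put
    in square brackets in front of the subtitle text.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        label = f"[{seg['speaker']}] " if seg.get("speaker") else ""
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{label}{seg['text'].strip()}\n"
        )
    # SRT entries are separated by a blank line.
    return "\n".join(blocks)


if __name__ == "__main__":
    example = [
        {"start": 0.5, "end": 2.25, "text": " Hello there. ", "speaker": "SPEAKER_00"},
        {"start": 2.5, "end": 4.0, "text": "Good morning."},
    ]
    print(to_srt(example))
```

For word-by-word subtitles, the same function could be fed one segment per word instead of one per sentence.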

Currently, standard models are provided for English, French, German, Spanish, Italian, Japanese, Dutch and Polish, among others. WhisperX combines several free tools to produce robust word-level segmentation with speaker labels: in addition to OpenAI's Whisper, these are Meta AI's wav2vec 2.0 (responsible for phoneme-level sound detection) and a further tool for voice activity detection.

WhisperX, like Whisper itself, is free of charge and freely available on GitHub, including the source code. WhisperX is written in Python and can be run from the command line, provided you have the necessary knowledge. However, we expect that WhisperX will soon be integrated into the first (online) subtitling tools or plugins in a more user-friendly way and thus offer users simple automatic subtitling.

More information at github.com




last update: 20 March 2023, 20:00 - slashCAM is a project by channelunit GmbH