We recently reported on Whisper, OpenAI's open-source AI transcription and translation program. Now German filmmaker Octavian Mot has developed StoryToolkitAI (macOS/Windows), a free plugin that brings these capabilities to DaVinci Resolve (Studio) 18, Blackmagic's program for grading, editing, compositing and audio mastering. StoryToolkitAI renders an audio-only file directly from Resolve's timeline and passes it to a locally installed copy of OpenAI Whisper, which then transcribes it into text. One advantage over transcription solutions that rely on online services is that the audio never leaves your own computer.
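The hand-off to a local Whisper installation can be pictured roughly as follows. This is only a minimal sketch, assuming the open-source `openai-whisper` Python package is installed and that an audio file has already been rendered. The function names `whisper_options` and `transcribe_locally` are illustrative and are not taken from StoryToolkitAI's own code.

```python
def whisper_options(translate_to_english=False, language=None):
    """Build the keyword options passed to Whisper's transcribe() call."""
    # Whisper distinguishes two tasks: plain transcription in the spoken
    # language, or translation of the speech into English.
    options = {"task": "translate" if translate_to_english else "transcribe"}
    if language is not None:
        # Optional hint for the spoken language, e.g. "de" for German.
        options["language"] = language
    return options


def transcribe_locally(audio_path, model_name="base", **kwargs):
    """Run Whisper on a local audio file; nothing is sent to an online service."""
    # Imported lazily so the helper above also works without Whisper installed.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **whisper_options(**kwargs))
    # "text" is the full transcript, "segments" carries start/end times.
    return result["text"], result["segments"]
```

A call such as `transcribe_locally("timeline_audio.wav", translate_to_english=True)` would then return English text plus timed segments, where the file name is a placeholder for whatever audio was rendered from the timeline.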
Thanks to Whisper, the tool can not only transcribe speech recorded in different languages into text (for free!), but also translate it into English. Useful extras have already been implemented as well, such as a text search and navigation in the timeline by clicking on passages of the transcription. Further functions are in progress, including partial transcription of sections defined by markers, more setting options and automatic speaker recognition (important for transcribing dialog).
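Text-based search and navigation of this kind can be sketched in a few lines. The example below assumes transcript segments shaped like Whisper's output (dictionaries with `start`, `end` and `text`). The helper names, the frame rate of 25 fps and the conversion to a timecode are assumptions for illustration, not the plugin's actual implementation.

```python
def find_phrase(segments, phrase):
    """Return (start_seconds, text) for every segment containing the phrase."""
    needle = phrase.casefold()  # case-insensitive comparison
    return [
        (seg["start"], seg["text"].strip())
        for seg in segments
        if needle in seg["text"].casefold()
    ]


def to_timecode(seconds, fps=25):
    """Convert seconds to a non-drop-frame HH:MM:SS:FF timecode.

    The timeline's start timecode offset is ignored in this sketch.
    """
    total_frames = round(seconds * fps)
    frames = total_frames % fps
    total_seconds = total_frames // fps
    hours, rem = divmod(total_seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}:{frames:02d}"
```

Each hit from `find_phrase` yields a start time that `to_timecode` turns into a position an editing program could jump to.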
The quality of Whisper's results is very high - transcription works even with poor audio quality (e.g., due to a low bitrate). According to OpenAI, the Whisper models were trained on data from 98 different languages (about 65% of the data is English) and show good automatic speech recognition results in ~10 languages. Of particular interest to filmmakers and video podcasters is, among other things, the ability to create subtitles in multiple languages through transcription and translation, and to offer them according to where the target audience comes from.
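How timed transcript segments turn into a subtitle file can be shown with a short sketch. It assumes segments shaped like Whisper's output (`start` and `end` in seconds plus `text`) and writes the widely used SRT layout. The function names are hypothetical and this is not StoryToolkitAI's own exporter.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Turn a list of timed segments into the text of an SRT file."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        # Each SRT cue: running number, time range, subtitle text.
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)
```

Running the same function once on the original-language segments and once on the English translation would give two subtitle files for the same timeline.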
StoryToolkitAI is still at an early stage of development (the GUI is still very simple), but it is already fully functional. It does, however, require several components to be installed via the command line (the project provides instructions).
Speed of transcription.
Fast transcription requires a reasonably up-to-date computer - ideally one with a powerful GPU. According to rough tests, an Apple M1 MacBook Pro with 16GB RAM transcribes a 30-second timeline in about 45 seconds (1.5 times the length of the audio), and a Windows workstation with an Nvidia GTX 1070 transcribes a 60-second timeline in about 20 seconds (quoted as 0.25 times the length of the audio, although 20 of 60 seconds works out to roughly 0.33). Newer RTX GPUs are significantly faster again (0.05-0.10 times the audio duration).
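The speed factors quoted here are simply processing time divided by audio length, which also makes it easy to estimate how long a longer timeline might take. A minimal sketch, with function names chosen purely for illustration:

```python
def realtime_factor(processing_seconds, audio_seconds):
    """Processing time as a multiple of the audio length (lower is faster)."""
    return processing_seconds / audio_seconds


def estimated_processing_time(audio_seconds, factor):
    """Rough processing time in seconds for a given audio length and factor."""
    return audio_seconds * factor


# The M1 figure from the text: 45 seconds for 30 seconds of audio.
m1_factor = realtime_factor(45, 30)  # 1.5

# A 10-minute timeline at the quoted RTX range of 0.05-0.10:
fastest = estimated_processing_time(600, 0.05)  # about 30 seconds
slowest = estimated_processing_time(600, 0.10)  # about 60 seconds
```

These are back-of-the-envelope numbers only, since real throughput also depends on the Whisper model size and the audio content.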
Main features of StoryToolkitAI.
- Free automatic transcription in many languages on a local computer directly from Resolve
- Free automatic translation from many languages into English on a local machine directly from Resolve
- Export transcripts to multiple formats, including SRT
- Import an SRT transcription file directly into Resolve
- Transcription queue for lining up several transcription jobs
- Navigation in the transcription timeline - clicking on a phrase moves the Resolve playhead to the appropriate location in the timeline
- Transcript word search: allows you to find specific words or phrases in a transcript
- Copying markers between Resolve timelines and timeline source clip
- Render Resolve markers into still images or clips
- Transcribing audio files, even if Resolve is not installed on the computer
- Mark In / Mark Out directly from the tool in Resolve
- Advanced transcriptions with more user input, such as source language and selection
- Global search to find words or phrases in project transcripts
- Transcript editing from the tool
- Trimmed transcriptions based on Resolve duration markers to transcribe only portions of the timeline
- Speaker recognition
- Integration with other AI/ML tools