
From text to audio: Stable Audio generates music and sound effects with AI

[13:08 Mon, 18 September 2023 by blip]

With Stable Diffusion, Stability AI already has a good text-to-image generator in its lineup. Stable Audio is now available online as well: a new diffusion model that, as the name suggests, creates audio and music from text prompts.



For this purpose, the Stable Audio model was trained on audio rather than images. More than 800,000 licensed files from the audio library AudioSparx, including their metadata, were used. Thanks to this context-rich training, the model follows prompted specifications for content and form quite well and can also time the output to an exact length. To condition the model on the relationship between text and audio, a technique called Contrastive Language Audio Pretraining (CLAP) was used in training; see this blog post, which also embeds good audio examples, for more details.
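To make the idea of contrastive text-audio pretraining more concrete, here is a minimal sketch of a symmetric contrastive loss over paired embeddings. It is not Stability AI's training code: the function names, the toy two-dimensional embeddings and the temperature value of 0.07 are all assumptions chosen for illustration. The only point is that matching text/audio pairs should score higher than mismatched ones.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equally long vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the maximum for numerical stability
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def contrastive_loss(text_embs, audio_embs, temperature=0.07):
    """Symmetric contrastive loss: text i is assumed to belong to audio i.

    The temperature is a hypothetical value, not taken from the article.
    """
    logits = [
        [cosine(t, a) / temperature for a in audio_embs] for t in text_embs
    ]
    transposed = [list(col) for col in zip(*logits)]
    text_to_audio = cross_entropy_diagonal(logits)
    audio_to_text = cross_entropy_diagonal(transposed)
    return 0.5 * (text_to_audio + audio_to_text)


# Toy embeddings (made up): two captions and two audio clips.
text = [[1.0, 0.0], [0.0, 1.0]]
audio_matched = [[0.9, 0.1], [0.1, 0.9]]   # clip i resembles caption i
audio_swapped = [[0.1, 0.9], [0.9, 0.1]]   # clips paired with the wrong caption

loss_matched = contrastive_loss(text, audio_matched)
loss_swapped = contrastive_loss(text, audio_swapped)
print(loss_matched < loss_swapped)
```

Training pushes the loss down, which pulls the embedding of a caption toward the embedding of its own audio clip and away from the others in the batch.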


[Figure: Stable Audio, latent diffusion model]


Pieces of music up to 90 seconds long can be generated, as well as individual instrument tracks or sound effects. You can specify genre, style, mood, instrumentation, tempo in BPM and more: essentially everything that is usually defined in the metadata of audio libraries. In a user guide, Stability AI has collected some example prompts, ranging from short and crisp to multi-line.
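Since prompts are built from the same kind of attributes found in audio-library metadata, a prompt can be assembled from structured fields. The helper below is a hypothetical sketch: the function name, its parameters and the comma-separated output format are assumptions for illustration, not an official Stable Audio prompt syntax.

```python
def build_prompt(genre=None, style=None, mood=None, instruments=(), bpm=None):
    """Join the given attributes into one comma-separated text prompt.

    Hypothetical helper: the field order and separator are assumptions.
    """
    parts = [value for value in (genre, style, mood) if value]
    parts.extend(instruments)
    if bpm is not None:
        parts.append(f"{bpm} BPM")
    return ", ".join(parts)


prompt = build_prompt(
    genre="ambient",
    mood="calm",
    instruments=["soft piano", "pads"],
    bpm=70,
)
print(prompt)  # ambient, calm, soft piano, pads, 70 BPM
```

Keeping the attributes as separate fields makes it easy to vary one of them, for example the tempo, while leaving the rest of the prompt unchanged.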

The resulting pieces do not sound like hits, and some are composed quite erratically. Much depends on the type of music and the length, though: quiet, ambient-style tracks are hard to distinguish from typical GEMA-free background music. The shorter sound snippets, which can serve as background effects, and perhaps minimalist instrument outputs strike us as more usable.

Stable Audio is available in a free version that allows 20 tracks of up to 45 seconds each per month. The Pro subscription, at 12 dollars per month, allows 500 generations of up to 90 seconds each, which may also be used in commercial projects. Downloads are in 44.1 kHz stereo.
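The plan limits translate into a few simple numbers. The sketch below uses only the figures quoted above (tracks per month, maximum length, price, 44.1 kHz stereo); the dictionary layout and function names are made up for this example.

```python
# Plan figures as quoted in the article; the data layout is an assumption.
FREE = {"tracks": 20, "max_seconds": 45, "price_usd": 0}
PRO = {"tracks": 500, "max_seconds": 90, "price_usd": 12}

SAMPLE_RATE_HZ = 44_100  # 44.1 kHz
CHANNELS = 2             # stereo


def max_minutes_per_month(plan):
    """Upper bound on generated audio per month, in minutes."""
    return plan["tracks"] * plan["max_seconds"] / 60


def samples_per_track(plan):
    """Total samples (all channels) in one track of maximum length."""
    return plan["max_seconds"] * SAMPLE_RATE_HZ * CHANNELS


def cost_per_generation(plan):
    """Monthly price divided by the number of allowed generations."""
    return plan["price_usd"] / plan["tracks"]


print(max_minutes_per_month(FREE))  # 15.0
print(max_minutes_per_month(PRO))   # 750.0
print(samples_per_track(PRO))       # 7938000
print(cost_per_generation(PRO))     # 0.024
```

If every generation is used at full length, the free tier yields at most 15 minutes of audio per month and the Pro tier at most 750 minutes, at about 2.4 cents per generation.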

An open-source version of Stable Audio is also expected to be released soon, though it will have been trained on a different data set, presumably for licensing reasons.

More information at www.stableaudio.com

German version of this page: Aus Text wird nun auch Audio: Stable Audio generiert Musik und Soundeffekte per KI

  






Last update: 26 July 2024, 18:02. slashCAM is a project by channelunit GmbH.