
First pictures, then sounds: New Google AI generates arbitrary music according to text description

[10:57 Mon, 30 January 2023, by Thomas Richter]

Researchers from Google have presented a new AI that generates music (instead of images) from a text prompt, following the same pattern as the currently very popular text-to-image AIs such as DALL-E 2, Midjourney or Stable Diffusion.


Robot Musician - imagined by Stable Diffusion


The new text-to-music AI, called "MusicLM", can generate music at 24 kHz from text descriptions, and the result remains consistent over several minutes. MusicLM was trained on a dataset of 280,000 hours of music to learn to create pieces matching complex descriptions such as "A fusion of reggaeton and electronic dance music, with a spacey, otherworldly sound. The music should evoke a sense of wonder and awe while being danceable".
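To put the 24 kHz figure in perspective, here is a small arithmetic sketch. It only uses the sample rate quoted above; the function names are hypothetical, and mono audio is an assumption made for simplicity, not something the article states.

```python
# Sample rate quoted in the article for MusicLM output.
SAMPLE_RATE_HZ = 24_000


def samples_for(duration_seconds: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples in a single-channel clip of the given length (mono is an assumption)."""
    return round(duration_seconds * sample_rate_hz)


def nyquist_hz(sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Highest frequency a signal sampled at this rate can represent."""
    return sample_rate_hz / 2


print(samples_for(15))      # 15-second clip -> 360000 samples
print(samples_for(5 * 60))  # five-minute piece -> 7200000 samples
print(nyquist_hz())         # 12000.0 Hz
```

A piece lasting several minutes therefore means millions of audio samples, and 24 kHz audio cannot contain frequencies above 12 kHz, which is below the roughly 20 kHz upper limit of human hearing.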

The spectrum of music generated by MusicLM is astonishing - it ranges from folk and classical music through jazz, pop, rap and reggae to techno, 8-bit computer music and death metal. As was already the case with the image and text AIs, it becomes apparent that style, whether of an image, a text or a piece of music, is just one parameter for an AI - as is instrumentation. Thus, wild crossover mixes can also be generated with the music AI, such as metal with accordions, rapping string quartets and all kinds of other combinations.






Another interesting feature is that the AI can be given a whistled or hummed melody, for example, which then serves as a template for music in a style defined by a text description.


Here is a hummed "Bella Ciao" as input:


via MusicLM it becomes an electronic synth version:


or jazz with saxophone:


or a piano solo:


Text prompts for MusicLM can specify other instrumentations as well as abstract descriptions of specific locations, moods, musicians' skills, musical styles or combinations of these. For each description, any number of variations can be generated - as with the image or text AIs, the program probably has a number of parameters that influence how much the results vary. The length of the generated audio ranges from short jingles to pieces lasting several minutes. The resulting tracks are often surprisingly coherent and the instrumentation sounds realistic, but sometimes the generated melodies and tones are a bit weird. As always, however, given the rapid development in the field of AI, the next generation, and even more so the one after that, is likely to be much better.

Electro Swing dancers - imagined by Midjourney



A rather unsuccessful attempt by MusicLM at swing:


Story Mode is ideal for film music, for example: a dynamic soundtrack is generated from a series of successive text descriptions, and the sounds defined this way merge seamlessly into one another. In the following piece, the prompts are "time to meditate", "time to wake up", "time to run" and "time to give 100%", changing at 15-second intervals:
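The idea behind Story Mode, a sequence of prompts each covering a fixed time window, can be sketched as a simple schedule. This is only an illustration of the concept: the article does not describe any MusicLM programming interface, so `PromptSegment`, `build_schedule` and `prompt_at` are hypothetical names, and no music is generated here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PromptSegment:
    """One text description and the time window (in seconds) it covers."""
    start_s: int
    end_s: int
    prompt: str


def build_schedule(prompts: List[str], segment_seconds: int = 15) -> List[PromptSegment]:
    """Lay the prompts out back to back, each lasting segment_seconds."""
    return [
        PromptSegment(i * segment_seconds, (i + 1) * segment_seconds, text)
        for i, text in enumerate(prompts)
    ]


def prompt_at(schedule: List[PromptSegment], t_seconds: float) -> Optional[str]:
    """Return the prompt active at time t_seconds, or None past the end."""
    for segment in schedule:
        if segment.start_s <= t_seconds < segment.end_s:
            return segment.prompt
    return None


# The four prompts from the example piece, switching every 15 seconds.
schedule = build_schedule(
    ["time to meditate", "time to wake up", "time to run", "time to give 100%"]
)
for segment in schedule:
    print(f"{segment.start_s:>2}-{segment.end_s:>2} s: {segment.prompt}")
```

With four prompts of 15 seconds each, the schedule covers one minute in total. A real system would additionally have to blend the audio across each boundary so that the sections merge seamlessly, which this sketch does not attempt.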

More information at google-research.github.io

last update: 12 May 2025, 23:24 - slashCAM is a project by channelunit GmbH