Researchers at American tech giant Google have created an AI that can generate minutes-long musical pieces from text prompts and can even transform a whistled or hummed melody into other instruments, much as systems like DALL-E generate images from written prompts, The Verge, an American technology news website, reported via TechCrunch.
According to the outlet, the model is called MusicLM, and while you can’t play around with it for yourself, the company has uploaded a bunch of samples that it produced using the model.
The examples are impressive. There are 30-second snippets of what sound like actual songs created from paragraph-long descriptions that specify a genre, a vibe, and even particular instruments, as well as five-minute-long pieces generated from just one or two words like “melodic techno.”
Also featured on the demo site are examples of what the model produces when asked to generate 10-second clips of instruments like the cello or maracas, eight-second clips of a certain genre, music that would fit a prison escape, and even what a beginner piano player would sound like versus an advanced one.
It also includes interpretations of phrases like “futuristic club” and “accordion death metal,” reported The Verge.
MusicLM can even simulate human vocals, and while it seems to get the tone and overall sound of voices right, there’s a quality to them that’s unmistakably off.
As per The Verge, AI-generated music has a long history dating back decades; systems have been credited with composing pop songs, copying Bach more convincingly than a human could in the ’90s, and accompanying live performances.