Google AI tool uses written descriptions to create music

Published February 3, 2023

Google researchers published a paper this week describing the results of a music-making artificial intelligence (AI) tool.

MusicLM is not the first AI music tool to be released. The examples Google provides, however, show it creating music from only a limited set of descriptive words.

AI describes complex computer systems that have been trained to behave in human-like ways.

Tools like ChatGPT can quickly produce written documents comparable to human work. ChatGPT and similar systems require powerful computers to run complex machine-learning models. OpenAI, a company based in San Francisco, launched ChatGPT in November 2022.

These systems, including AI voice generators, are trained on vast amounts of data to learn and replicate various types of content. Computer-generated content can include written material, design elements, art, or music.

ChatGPT has recently drawn a lot of attention for its ability to generate complex writing and other content from a simple natural language description.

MusicLM from Google

Google engineers explain the MusicLM system as follows:

A user first thinks of a few words that best describe the kind of music they want the tool to make.

A user could, for instance, enter the following short phrase into the system: “a continuous calming violin backed by a soft guitar sound.” The descriptions may include various musical genres, instruments, or other existing sounds.

MusicLM produced a number of distinct music examples that were made available online. Some were based on just one or two words, like “jazz,” “rock,” or “techno.” Others were generated from more detailed descriptions that ran to entire sentences.

In one example, Google researchers gave MusicLM the following instructions: “The main soundtrack of an arcade game. It is fast-paced and upbeat, with a catchy electric guitar riff. The music is repetitive and easy to remember, but with unexpected sounds…”

In the resulting recording, the music seems to stay very close to what was described. According to the team, the more detailed the description, the better the system can attempt to produce it.

The MusicLM model operates in a way similar to the machine-learning systems behind ChatGPT. Because they are trained on huge amounts of data, these tools can produce human-like results. The systems are fed a wide variety of materials so that they can acquire the complex skills needed to create realistic works.

In addition to creating new music from written descriptions, the team said, the system can also create examples based on a person’s own singing, humming, whistling, or instrument playing.

The tool “produces high-quality music…over several minutes, while being faithful to the text conditioning signal,” according to the researchers.

The Google team has not yet made the MusicLM models available to the general public. This is in contrast to ChatGPT, which was made accessible online in November for users to try out.

However, Google did release MusicCaps, a “high-quality dataset” of over 5,500 music-text pairs prepared by professional musicians. The researchers took this step to aid the creation of additional AI music generators.

The MusicLM researchers said they are confident they have developed a new tool that will enable anyone to quickly and easily produce high-quality music selections. However, the team said it also recognizes some risks linked to machine learning.

One of the most significant issues the researchers identified was “biases present in the training data.” Bias can mean including too much of one side and not enough of another. The researchers stated that this raises a question “about appropriateness for music generation for cultures underrepresented in the training data.”

The team stated that it intends to continue studying any system results that could be regarded as cultural appropriation. The goal would be to reduce biases through additional development and testing.

In addition, the researchers said they plan to keep improving the system, including voice and music quality, text conditioning, and the generation of lyrics.