Google celebrates Bach's birthday: the technology behind the music AI Doodle


On March 21st, Google published its first ever AI-powered music Doodle, celebrating the birthday of the world-famous German composer and musician Johann Sebastian Bach!

The Doodle is a collaboration between Google Magenta and the Google PAIR team. It is an interactive experience in which players compose a two-measure melody of their choice. At the press of a button, the Doodle uses machine learning to harmonize that melody in Bach's musical style (and if you happen to find a very special Easter egg hidden in the Doodle, you might hear an '80s rock version of Bach instead).

Bach, the great German musician

On March 21, 1685, Bach was born in the small town of Eisenach, Germany. He grew up in a musical family: his father could play a variety of instruments and served as the director of the town's musicians, and his eldest brother was also a musician. When Bach was 10 years old his father died, and he was raised by his brother. Bach himself was an outstanding organist who also knew how to build and repair complex instruments.

Bach was a prolific composer, at times writing a cantata every week. He was also very humble, attributing his success to inspiration and a strict work ethic. Only a handful of his works were published during his lifetime, but more than 1,000 of his manuscripts survive today, scattered around the world.

With the "Bach revival" of the 19th century, his reputation soared, and the musical world came to recognize and celebrate his four-part harmony. Perhaps the best measure of a musician is the influence he has on other artists, and Bach's influence has now lasted for centuries.

Musicians are not the only ones moved by Bach's music, however. When the Voyager space probes were launched, scientist and writer Lewis Thomas suggested sending Bach's music to the outer reaches of the solar system. "I would vote for Bach, all of Bach," he wrote.

The story behind Doodle

Watch the video below to learn how the Doodle came to be.

What was the first step in developing the Doodle? Building a machine learning model. In traditional computer programming, a programmer sets out rules for the computer to follow to produce answers; in machine learning, the computer is fed a large amount of data and learns to find the answers itself. The model used by the Doodle was developed by Anna Huang of the Magenta team. It is called Coconet, a versatile model that can be used for a variety of musical tasks, such as harmonizing melodies or composing from scratch (see the Magenta blog for more technical details).

Specifically, Coconet was trained on 306 of Bach's chorale harmonizations. Bach's chorales always have four voices, each carrying its own melodic line, which together create rich and pleasing harmony. This compact structure makes them good training material for a machine learning model.

The PAIR team used TensorFlow.js to run the machine learning entirely in the web browser, without the large fleets of servers that traditional machine learning often requires. For anyone whose computer or device is not fast enough to run the Doodle with TensorFlow.js, the Doodle can instead fall back to Google's new Tensor Processing Units (TPUs), which handle the machine learning quickly in a data center.

These tools, combined with the Doodle team's engineering, created the Doodle you see today.

How the model works

Coconet takes an incomplete score and fills in the missing parts. To train it, the team took an excerpt from one of Bach's four-part chorales, randomly removed some notes, and asked the model to reconstruct the removed notes. The difference between Bach's original and Coconet's output provides a learning signal through which the model can be trained.
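The masking step above can be sketched in a few lines. This is an illustrative toy, not Coconet's actual code: the array shapes, the pitch range, and the use of -1 as an "unknown" marker are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(score, keep_prob=0.5):
    """Return a copy of `score` with some notes hidden, plus the mask.

    `score` holds pitch indices with shape (voices, time); the mask is 1
    where a pitch is still known and 0 where it was erased.
    """
    mask = (rng.random(score.shape) < keep_prob).astype(np.int64)
    masked = np.where(mask == 1, score, -1)  # -1 marks "unknown"
    return masked, mask

# Four voices, eight time steps, pitches drawn from a toy range 0..45.
score = rng.integers(0, 46, size=(4, 8))
masked, mask = random_mask(score)

# During training, the model sees `masked` (plus `mask`) and must
# reconstruct `score` wherever mask == 0.
```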

By removing notes at random, the team obtained a model that can handle any pattern of incomplete input. It is equivalent to training many models at once, each adapted to a different scenario.


In the team's view, the music is three-dimensional. Bach's chorales are written for four voices: soprano (S), alto (A), tenor (T) and bass (B). Each voice's part is represented by a piano roll: a 2D array with (discretized) time as the rows and pitch as the columns. We assume that each voice sings exactly one pitch at any given time. For each voice at each time step there is therefore a one-hot pitch vector: the element corresponding to the pitch being sung is 1, and all other elements are zero. Where there is uncertainty (for example, when the model is generating output), this pitch vector instead holds a categorical probability distribution over pitches.
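The one-hot piano-roll encoding can be sketched as follows. The time and pitch dimensions here are made-up toy values, not Coconet's actual configuration.

```python
import numpy as np

NUM_PITCH = 46  # toy pitch range; an assumption, not Coconet's value

def one_hot_roll(pitches):
    """Turn a sequence of pitch indices into a (time, pitch) piano roll."""
    roll = np.zeros((len(pitches), NUM_PITCH))
    roll[np.arange(len(pitches)), pitches] = 1.0
    return roll

# One voice singing eight time steps: exactly one pitch active per step.
soprano = one_hot_roll([30, 30, 32, 34, 34, 32, 30, 29])

# Each row sums to 1. An uncertain model output would instead hold a
# probability distribution over pitches that still sums to 1 per row.
```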

The team treats the stack of piano rolls as a convolutional feature map, with time and pitch forming the 2D convolution space and one channel per voice. Because the scores fed to the model are incomplete, an additional mask channel is provided for each voice: a binary value indicating, at each time step, whether that voice's pitch is known. The model therefore receives an eight-channel feature map.
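Assembling the eight-channel input might look like this sketch, with four piano-roll channels and four mask channels stacked together (shapes are illustrative assumptions).

```python
import numpy as np

NUM_VOICES, NUM_TIME, NUM_PITCH = 4, 8, 46  # toy sizes

# One piano roll per voice (S, A, T, B), and one binary mask per voice
# marking where that voice's pitch is known (1) or erased (0).
rolls = np.zeros((NUM_VOICES, NUM_TIME, NUM_PITCH))
masks = np.ones((NUM_VOICES, NUM_TIME, NUM_PITCH))

# Stack along the channel axis: channels 0-3 are the voices,
# channels 4-7 are the corresponding masks.
features = np.concatenate([rolls, masks], axis=0)
```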


The model is a fairly simple convolutional neural network with batch normalization and residual connections. For the Doodle, which uses TensorFlow.js to run the model in the browser, the computation was sped up by switching to depthwise-separable convolutions.

The team trains the model to maximize the probability it assigns to the true pitches, prompting the model to understand the musical implications of the incomplete score it receives: what key is the piece in, what note comes next, and what note came before?
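This training objective amounts to a cross-entropy loss evaluated only at the erased positions. The sketch below uses random stand-in logits and toy sizes; it illustrates the idea rather than Coconet's actual loss code.

```python
import numpy as np

rng = np.random.default_rng(1)
NUM_TIME, NUM_PITCH = 8, 46  # toy sizes

logits = rng.normal(size=(NUM_TIME, NUM_PITCH))         # stand-in model outputs
true_pitch = rng.integers(0, NUM_PITCH, size=NUM_TIME)  # stand-in "Bach" notes
erased = np.array([1, 0, 1, 1, 0, 1, 0, 1], dtype=bool) # which notes were hidden

# Softmax over the pitch axis, then negative log-likelihood of the truth.
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
nll = -np.log(probs[np.arange(NUM_TIME), true_pitch])

# Only the erased positions contribute to the training loss; the model
# is rewarded for assigning high probability to the notes Bach wrote.
loss = nll[erased].mean()
```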

Once the model is trained, music can be extracted from the probability distributions it produces. We could simply sample each note from its own distribution. However, this ignores the interactions between the sampled notes: fixing any one note usually changes the distributions of the others.

One way to account for these interactions is to sample one of the pitches, add it to the incomplete score, pass the result through the model again, and recompute the distributions of the remaining pitches. Repeating this process until every note is determined completes the score while taking all of the interactions into account.
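The one-note-at-a-time loop can be sketched as follows. The `model` here is a uniform stand-in (an assumption for the sake of a runnable example); the real model's distributions depend on the notes already fixed in the score.

```python
import numpy as np

rng = np.random.default_rng(2)
NUM_TIME, NUM_PITCH = 8, 46  # toy sizes

def model(score):
    """Stand-in for the trained model: one pitch distribution per position.

    Uniform here; the real distributions condition on the filled-in notes.
    """
    return np.full((NUM_TIME, NUM_PITCH), 1.0 / NUM_PITCH)

melody = np.full(NUM_TIME, -1)  # -1 = not yet determined
while (melody == -1).any():
    probs = model(melody)                          # re-run after every choice
    t = rng.choice(np.flatnonzero(melody == -1))   # pick an open position
    melody[t] = rng.choice(NUM_PITCH, p=probs[t])  # sample its pitch
```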

In fact, the method the team uses is more powerful still: the model writes a rough draft, then rewrites and gradually refines it. Specifically, all of the notes are sampled simultaneously to obtain a complete (but usually nonsensical) score; some of the notes are then erased, the score is passed to the model again, and the process repeats. As it goes on, fewer and fewer notes are erased and rewritten.

Although the model only ever predicts one variable at a time, the team combines this orderless modeling with Gibbs sampling to generate complete scores, which keeps the whole procedure sound.
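The rewrite-and-refine procedure can be sketched as a Gibbs-style loop with a shrinking erasure fraction. The uniform stand-in sampler and the particular erasure schedule are assumptions for the sake of a runnable example, not the Doodle's actual settings.

```python
import numpy as np

rng = np.random.default_rng(3)
NUM_TIME, NUM_PITCH = 8, 46  # toy sizes

def resample_unknowns(score):
    """Stand-in resampling step: fill every erased (-1) position.

    Uniform here; the real model samples from distributions that
    condition on the notes that survived.
    """
    out = score.copy()
    for t in np.flatnonzero(out == -1):
        out[t] = rng.integers(NUM_PITCH)
    return out

# Rough draft: sample every note at once (usually nonsense).
score = resample_unknowns(np.full(NUM_TIME, -1))

# Refinement: erase a shrinking fraction of notes and rewrite them.
for erase_frac in (0.75, 0.5, 0.25, 0.1):
    erase = rng.random(NUM_TIME) < erase_frac
    score[erase] = -1
    score = resample_unknowns(score)
```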

Post time: Apr-03-2019