
Interactive musical improvisation with Magenta
Adam Roberts · Jesse Engel · Curtis Hawthorne · Ian Simon · Elliot Waite · Sageev Oore · Natasha Jaques · Cinjon Resnick · Douglas Eck

Wed Dec 07 09:00 AM -- 12:30 PM (PST) @ Area 5 + 6 + 7 + 8

We combine LSTM-based recurrent neural networks and Deep Q-learning for generation of musical sequences in real time. The role of the LSTM is to learn the general structure of music scores (encoded as MIDI, not audio). Deep Q-learning is used to improve and focus the generated sequences based on rewards such as desired genre, compositional correctness and ability to predict aspects of what the human collaborator is playing. This combination of RNN model-based generation with reinforcement learning is, to our knowledge, novel in the domain of music generation. This approach also yields more stable, musically-relevant sequences than LSTM alone. The networks are trained for two tasks: the generation of responses to short melodic inputs, and the generation of an accompaniment to melodic input in real time, requiring continuous prediction of future output.
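As a rough illustration of how several reward signals could feed a Q-learning update, the sketch below combines three scores into one scalar reward and computes the standard one-step Q-learning target, r + gamma * max Q(s', a'). The function names, the weighted-sum form, the weights, and the discount factor are all assumptions made for illustration; the abstract does not specify how the rewards are actually combined.

```python
def combined_reward(genre_score, correctness_score, prediction_score,
                    weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three reward signals named in the abstract.

    The weighted-sum form and the weights are hypothetical.
    """
    w_genre, w_correct, w_predict = weights
    return (w_genre * genre_score
            + w_correct * correctness_score
            + w_predict * prediction_score)


def q_learning_target(reward, next_q_values, gamma=0.9, terminal=False):
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if terminal or not next_q_values:
        return reward
    return reward + gamma * max(next_q_values)


# Example: one generated note, scored by three hypothetical reward terms.
r = combined_reward(1.0, 0.5, 0.0, weights=(0.5, 1.0, 2.0))
target = q_learning_target(r, next_q_values=[0.2, 2.0, -1.0], gamma=0.5)
print(r, target)  # 1.0 2.0
```

In a setup like this, the Q-values for the next state would come from the network being trained, with each candidate next note treated as an action.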

The addition of a novel MIDI interface on top of TensorFlow enables improvisational experiences, allowing one to interact with the neural networks in real time.

Our main goal is to have attendees know what it’s like to collaborate creatively with a machine learning model. We’ll have professional music equipment configured such that multiple attendees can play with Magenta using MIDI keyboards. Others can listen in on a performance using multiple headphones. We’re working to make the experience fast and responsive, and to provide lots of demos. Python coders can also modify the code on-the-fly using Jupyter notebooks. We have several parallel demos in mind:

1. Accompaniment: A user walks up to a digital piano keyboard and plays a bass line to seed the system, which will continue the line when the user stops. The user can then play melodic lines and the bass line will adapt accordingly. The user can also select a certain genre or style for the system to base its accompaniment on.
2. Melody morphing: A user plays a few notes in the melody, and the system responds both with variations on this melody and a bass accompaniment. The user can also select a certain genre or style for the system to base its accompaniment on.
3. Call and response improvisation: A user plays a few bars, and the system will respond immediately with a follow-up, after which the user will again play a few bars, and so on. The user can also select a certain genre or style for the system to base its response on.
4. Real-time DJ coding demos: Users can hack Magenta music sequence models in Jupyter notebooks on laptops running the Ableton Live music sequencer. This provides a way for attendees who do not play piano keyboards to also interact with the sequence generators.

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

Example Links

Basic Interface
https://youtu.be/OUbnR4IWkF8

This video shows an example of a more basic version of the “call and response” interaction (UX Scenario 3) we will enable. In this prototype, the user enters a melody and then turns a switch to initiate the system’s response. The system begins by replaying the user’s melody, then creates variations on the user’s theme in real time. The final version will have an improved interface allowing for easier signalling to switch between phases, and will also enable automatic switching based on time. In addition to this call and response interaction, we will also have real-time accompaniment (UX Scenarios 1 and 2).

Accompaniment Generation
https://clyp.it/jdtpgjso
https://clyp.it/xcacjnsf

The two links above are to audio clips of accompaniments generated by our LSTM model. Both are conditioned on the same melody, with the bass line generated by our model. To allow the model to be used in real-time improvisation, we have trained it to generate the accompaniment without knowing the full history of the melody. Each quarter note is represented by 4 time steps. The output for the next time step t + 1 is generated conditioned on the melody up to a quarter note before the time step (t − 4) along with the generated bass line up to the previous step (t). While both examples are conditioned on the same melody, they are primed differently for the first 4 time steps of the bass line, producing unique outputs.

Acknowledgments

Thanks to Natasha Jaques and Elliot Waite for help with model development and Hans Bernhard for additional work on the MIDI interface.
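The conditioning scheme described under Accompaniment Generation can be read as a pair of history windows: when producing the bass output for step t + 1, the model sees the melody only up to step t − 4 and its own bass line up to step t. The sketch below shows just that indexing on plain Python lists. The function name, the list representation, and the choice to treat "up to" as inclusive are assumptions for illustration, not the authors' implementation.

```python
STEPS_PER_QUARTER = 4  # from the text: each quarter note spans 4 time steps


def conditioning_context(melody, bass, t):
    """Return the (melody, bass) history visible when generating bass step t + 1.

    melody and bass are lists indexed by time step. The melody is visible
    up to and including step t - 4 (assumed inclusive), the bass up to and
    including step t.
    """
    melody_cutoff = t - STEPS_PER_QUARTER
    visible_melody = melody[: max(melody_cutoff + 1, 0)]
    visible_bass = bass[: t + 1]
    return visible_melody, visible_bass


# Example with MIDI pitch numbers; bass steps 0..8 have been generated so far.
melody = [60, 60, 62, 62, 64, 64, 65, 65, 67, 67, 69, 69]
bass = [36, 36, 36, 36, 43, 43, 43, 43, 36]
visible_melody, visible_bass = conditioning_context(melody, bass, t=8)
print(len(visible_melody), len(visible_bass))  # 5 9
```

For the first quarter note (t < 4) no melody is visible at all under this reading, which is consistent with the bass line being primed for its first 4 time steps.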

Author Information

Adam Roberts (Google Brain)
Jesse Engel (Google Brain)
Curtis Hawthorne (Google Brain)
Ian Simon (Google)
Elliot Waite (Google)
Sageev Oore (Dalhousie University, Vector Institute)
Natasha Jaques (Google Brain, UC Berkeley)
Cinjon Resnick (Google Brain)
Douglas Eck (Google Brain)

I’m a research scientist working on Magenta, an effort to generate music, video, images and text using machine intelligence. Magenta is part of the Google Brain team and is using TensorFlow (www.tensorflow.org), an open-source library for machine learning. The question Magenta asks is, “Can machines make music and art? If so, how? If not, why not?” The goal of Magenta is to produce open-source tools and models that help creative people be even more creative. I’m primarily looking at how to use so-called “generative” machine learning models to create engaging media. Additionally, I’m working on how to bring other aspects of the creative process into play. For example, art and music are not just about generating new pieces. They are also about drawing one’s attention, being surprising, telling an interesting story, knowing what’s interesting in a scene, and so on. Before starting the Magenta project, I worked on music search and recommendation for Google Play Music. My research goal in this area was to use machine learning and audio signal processing to help listeners find the music they want when they want it. This involves both learning from audio and learning from how users consume music. In the audio domain, the main goal is to transform the ones and zeros in a digital audio file into something where musically-similar songs are also numerically similar, making it easier to do music recommendation. This is (a) user-dependent: my idea of similar is not the same as yours and (b) context-dependent: my idea of similarity changes when I make a playlist for jogging versus making a playlist for a dinner party. I might choose the same song (say "Taxman" by the Beatles) but perhaps it would be the tempo for jogging that drove the selection of that specific song versus "I like the album Revolver and want to add it to the dinner party mix" for a dinner party playlist. I joined Google in 2003.
Before then, I was an Associate Professor in Computer Science at University of Montreal. I helped found the BRAMS research center (Brain Music and Sound; www.brams.org) and was involved at the McGill CIRMMT center (Centre for Interdisciplinary Research in Music Media and Technology; www.cirmmt.org). Aside from audio signal processing and machine learning, I worked on music performance modeling. What exactly does a good music performer add to what is already in the score? I treated this as a machine learning question: Hypothetically, if we showed a piano-playing robot a huge collection of Chopin performances, from the best in the world all the way down to that of a struggling teenage pianist, could it learn to play well by analyzing all of these examples? If so, what’s the right way to perform that analysis? In the end I learned a lot about the complexity and beauty of human music performance, and how performance relates to and extends composition.

More from the Same Authors