We combine LSTM-based recurrent neural networks and Deep Q-learning to generate musical sequences in real time. The role of the LSTM is to learn the general structure of music scores (encoded as MIDI, not audio). Deep Q-learning is used to improve and focus the generated sequences based on rewards such as desired genre, compositional correctness and ability to predict aspects of what the human collaborator is playing. This combination of RNN model-based generation with reinforcement learning is, to our knowledge, novel in the domain of music generation. This approach also yields more stable, musically relevant sequences than LSTM alone. The networks are trained for two tasks: the generation of responses to short melodic inputs, and the generation of an accompaniment to melodic input in real time, requiring continuous prediction of future output.
The addition of a novel MIDI interface on top of TensorFlow enables improvisational experiences, allowing one to interact with the neural networks in real time.
Our main goal is to have attendees know what it’s like to collaborate creatively with a machine learning model. We’ll have professional music equipment configured such that multiple attendees can play with Magenta using MIDI keyboards. Others can listen in on a performance using multiple headphones. We’re working to make the experience fast and responsive, and to provide lots of demos. Python coders can also modify the code on-the-fly using Jupyter notebooks. We have several parallel demos in mind:
1. Accompaniment: A user walks up to a digital piano keyboard and plays a bass line to seed the system, which will continue the line when the user stops. The user can then play melodic lines and the bass line will adapt accordingly. The user can also select a certain genre or style for the system to base its accompaniment on.
2. Melody morphing: A user plays a few notes in the melody, and the system responds both with variations on this melody and a bass accompaniment. The user can also select a certain genre or style for the system to base its accompaniment on.
3. Call and response improvisation: A user plays a few bars, and the system will respond immediately with a follow-up, after which the user will again play a few bars, and so on. The user can also select a certain genre or style for the system to base its response on.
4. Real-time DJ coding demos: Users can hack Magenta music sequence models in Jupyter notebooks on laptops running the Ableton Live music sequencer. This provides a way for attendees who do not play piano keyboards to also interact with the sequence generators.
30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.
Example Links
Basic Interface
https://youtu.be/OUbnR4IWkF8
This video shows an example of a more basic version of the "call and response" (UX Scenario 3) interaction we will enable. In this prototype, we see the user enter a melody and then turn a switch to initiate the system’s response. The system begins by replaying the user’s melody, then creates (in real time) variations on the user’s theme. The final version will have an improved interface allowing for easier signalling to switch between phases, and will also enable automatic switching based on time. In addition to this call and response interaction, we will also have real-time accompaniment (UX Scenarios 1 and 2).
Accompaniment Generation
https://clyp.it/jdtpgjso
https://clyp.it/xcacjnsf
The two links above are to audio clips of accompaniments generated by our LSTM model. Both are conditioned on the same melody, with the bass line generated by our model.
To allow the model to be used in real-time improvisation, we have trained it to generate the accompaniment without knowing the full history of the melody. The output for the next time step t + 1, where each quarter note is represented by 4 time steps, is conditioned on the melody up to a quarter note earlier (time step t − 4) and on the generated bass line up to the previous step (t). While both examples are conditioned on the same melody, they are primed differently for the first 4 time steps of the bass line, producing unique outputs.
Acknowledgments
Thanks to Natasha Jaques and Elliot Waite for help with model development and Hans Bernhard for additional work on the MIDI interface.
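As an illustration of the real-time conditioning described under Accompaniment Generation, the following is a minimal Python sketch of which inputs are visible when the bass note for step t + 1 is produced. It is not the authors' implementation. The function names, the integer pitch encoding, and the `toy_next_note` stand-in for the trained LSTM are all hypothetical, and the sketch assumes that "up to" means inclusive for both the melody (through step t − 4) and the bass line (through step t).

```python
from typing import Callable, List, Sequence, Tuple

STEPS_PER_QUARTER = 4  # from the text: each quarter note spans 4 time steps


def conditioning_context(
    melody: Sequence[int], bass: Sequence[int], t: int
) -> Tuple[List[int], List[int]]:
    """Return the inputs visible when generating the bass note for step t + 1.

    Assumption: the melody is visible through step t - 4 (inclusive) and the
    generated bass line through step t (inclusive).
    """
    melody_end = t - STEPS_PER_QUARTER
    visible_melody = list(melody[: max(melody_end + 1, 0)])
    visible_bass = list(bass[: t + 1])
    return visible_melody, visible_bass


def generate_bass(
    melody: Sequence[int],
    primer: Sequence[int],
    next_note: Callable[[List[int], List[int]], int],
) -> List[int]:
    """Extend a primed bass line step by step until it is as long as the melody.

    `next_note` is a hypothetical stand-in for the trained model: it receives
    the visible melody and bass history and returns the next bass pitch.
    """
    bass = list(primer)
    for t in range(len(bass) - 1, len(melody) - 1):
        visible_melody, visible_bass = conditioning_context(melody, bass, t)
        bass.append(next_note(visible_melody, visible_bass))
    return bass


def toy_next_note(visible_melody: List[int], visible_bass: List[int]) -> int:
    """Toy rule (not a model): double the latest visible melody note an octave
    lower, or hold the last bass note while no melody is visible yet."""
    if visible_melody:
        return visible_melody[-1] - 12
    return visible_bass[-1]


melody = [60, 62, 64, 65, 67, 69, 71, 72]  # MIDI pitches, one per time step
primer = [36, 36, 36, 36]                  # first 4 bass steps, as in the text
print(generate_bass(melody, primer, toy_next_note))
```

Changing `primer` while keeping `melody` fixed changes every later call to `next_note`, because the primer is part of the visible bass history. This mirrors how the two audio examples diverge from the same melody.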