How can we design generative models whose structure both improves model efficiency and gives users greater control? In this talk, I'll give two examples of how we can achieve this goal by taking inspiration from the nonlinear and hierarchical structure underlying the human process of creating music.
Generative models of music composition typically assume music is written in a single pass from beginning to end, constraining users to follow the same unnatural chronological process. To enable a more nonlinear creative workflow, we introduced Coconet (Huang et al., 2017), an Orderless-NADE-like (Uria et al., 2014) generative model (similar to masked language and vision models) that trains over all orderings in which a composition can be broken down. This lets the model learn more efficiently from data by traversing each sequence in every direction, and lets users put down notes in any order and have the model complete any partial score.
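To make the any-order idea concrete, here is a minimal sketch of the two pieces involved: sampling an Orderless-NADE-style training mask (a random context of revealed positions, with the rest as prediction targets), and completing a partial score by filling in missing notes in an arbitrary order. The `predict_fn` argument is a hypothetical stand-in for a trained model; none of these names come from the Coconet codebase.

```python
import random

def sample_orderless_mask(num_steps, rng=random):
    """Sample a training mask in the Orderless-NADE style: pick a random
    context size, then a random subset of positions to reveal. Positions
    outside the context become the prediction targets."""
    context_size = rng.randrange(num_steps + 1)
    revealed = set(rng.sample(range(num_steps), context_size))
    return [i in revealed for i in range(num_steps)]

def complete_score(partial, predict_fn, rng=random):
    """Fill the missing entries (None) of a partial score one at a time,
    in a random order, conditioning each prediction on everything filled
    in so far -- the nonlinear completion workflow described above.
    `predict_fn(score, i)` is a placeholder for a trained model that
    predicts the note at position i given the current score."""
    score = list(partial)
    missing = [i for i, v in enumerate(score) if v is None]
    rng.shuffle(missing)
    for i in missing:
        score[i] = predict_fn(score, i)
    return score
```

For example, `complete_score([60, None, 64, None], lambda s, i: 62)` fills both gaps with MIDI pitch 62; a real model would instead sample each missing note from its learned conditional distribution.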
Neural audio synthesizers typically synthesize musical performance audio from MIDI end to end, resulting in a black box that offers few mechanisms for control. To enable detailed user control, we introduced MIDI-DDSP (Wu et al., 2022), a hierarchical model of musical performance synthesis that breaks audio synthesis into a three-level hierarchy of notes, performance, and synthesis, analogous to how a creative process involves composers, performers, and instruments. Not only does this interpretable hierarchy allow users to intervene at each level or rely on trained priors (performance given notes, synthesis given performance) for creative assistance, it also provides inductive biases that let models learn more efficiently from data, making it possible to train high-fidelity performance synthesis models from only a few hours of recordings.
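The three-level hierarchy can be sketched as a pipeline of two conditional stages, each of which a user can edit or replace independently. The function names, the control attributes, and the constant placeholder values below are all illustrative assumptions, not the MIDI-DDSP API; the real model predicts expressive controls and synthesis parameters with learned networks.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Note:
    pitch: int       # MIDI pitch number
    duration: float  # duration in seconds

def performance_prior(notes: List[Note]) -> List[dict]:
    """Performance level (hypothetical): expand each note into expressive
    controls. A trained prior would predict these; here we use constants."""
    return [{"pitch": n.pitch, "duration": n.duration,
             "loudness": 0.8, "vibrato": 0.1} for n in notes]

def synthesis_prior(performance: List[dict]) -> List[float]:
    """Synthesis level (hypothetical): map expressive controls to audio
    parameters. Placeholder: one amplitude value per note."""
    return [p["loudness"] for p in performance]

def render(notes: List[Note]) -> List[float]:
    # Notes -> performance -> synthesis. A user can intervene between
    # any two stages, e.g. by editing the performance dicts directly.
    return synthesis_prior(performance_prior(notes))
```

The point of the factorization is that each arrow in `notes -> performance -> synthesis` is a separately trained, separately editable module, which is what makes level-by-level user intervention possible.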
We hope these examples encourage researchers to partner with creative practitioners to innovate in modeling, interaction, and human-AI co-creativity. We could see the goal as not only designing generative models that model and generate creative artifacts well, but also working towards generative agents that we can coordinate and collaborate with in creative settings.
Cheng-Zhi Anna Huang (Google Brain)