Musical Speech: A Transformer-based Composition Tool
Jason d'Eon · Sri Harsha Dumpala · Chandramouli Shama Sastry · Daniel Oore · Mengyu Yang · Sageev Oore

Tue Dec 08 08:20 PM -- 08:40 PM & Wed Dec 09 08:20 PM -- 08:40 PM (PST)
Event URL: https://jasondeon.github.io/musicalSpeech/

In this demo we propose a compositional tool that generates musical sequences based on the prosody of speech recorded by the user. The tool allows any user, regardless of musical training, to use their own speech to generate musical melodies while hearing the direct connection between their recorded speech and the resulting music. This is achieved with a pipeline combining speech-based signal processing [1,2], musical heuristics, and a set of transformer models [3,4] trained for new musical tasks. Importantly, the pipeline is designed to work with any kind of speech input and does not require a paired dataset to train the transformer models.

Our approach consists of the following steps:

  1. Estimate the F0 values and loudness envelope of the speech signal.
  2. Convert these features into a sequence of musical note constraints.
  3. Apply one or more transformer models—each trained on different musical tasks or datasets—to this constraint sequence to produce musical sequences that follow or accompany the speech patterns in a variety of ways.
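Steps 1 and 2 can be sketched with a minimal, numpy-only pipeline. The demo's actual F0 estimator, frame sizes, and voicing thresholds are not specified in the abstract, so everything below (including the autocorrelation estimator and the RMS gate) is illustrative:

```python
import numpy as np

def f0_autocorr(frame, sr, fmin=60.0, fmax=500.0):
    """Crude autocorrelation F0 estimate for one frame (hypothetical
    helper; the demo's actual pitch tracker is not specified)."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    if hi >= len(ac) or ac[0] == 0:
        return 0.0
    lag = lo + np.argmax(ac[lo:hi])     # strongest periodicity in range
    return sr / lag

def hz_to_midi(f0):
    """Quantize a frequency in Hz to the nearest MIDI note number."""
    return int(round(69 + 12 * np.log2(f0 / 440.0)))

def speech_to_constraints(y, sr, frame_len=2048, hop=512, rms_gate=0.01):
    """Frame the signal, estimate F0 and loudness (RMS) per frame,
    and map voiced frames to (MIDI note, loudness) constraints."""
    notes = []
    for start in range(0, len(y) - frame_len, hop):
        frame = y[start:start + frame_len]
        rms = np.sqrt(np.mean(frame ** 2))
        if rms < rms_gate:              # treat quiet frames as unvoiced
            continue
        f0 = f0_autocorr(frame, sr)
        if f0 > 0:
            notes.append((hz_to_midi(f0), rms))
    return notes

# Synthetic check: a steady 220 Hz tone should quantize to MIDI 57 (A3).
sr = 16000
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 220.0 * t)
notes = speech_to_constraints(y, sr)
```

The resulting (pitch, loudness) sequence is what a constrained transformer model would then consume in step 3.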

The demo is self-explanatory: the audience can interact with the system either by providing a live recording through a web-based recording interface or by uploading a pre-recorded speech sample. The system then provides a visualization of the formant contours extracted from the provided speech sample, the set of note constraints obtained from the speech, and the sequence of musical notes generated by the transformers. The audience can also listen to—and interactively mix the levels (volume) of—the input speech sample, the initial note sequences, and the musical sequences generated by the transformer models.
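The level-mixing interaction amounts to a per-stem gain sum over aligned audio tracks. The demo's actual web mixer is not described, so this is only a sketch of the idea:

```python
import numpy as np

def mix(tracks, gains):
    """Mix same-length mono stems with per-stem gains, as a listener
    might balance speech / note-constraint / transformer outputs."""
    assert len(tracks) == len(gains)
    out = sum(g * t for g, t in zip(gains, tracks))
    peak = np.max(np.abs(out))
    return out / peak if peak > 1.0 else out    # avoid clipping

# Example: blend a speech stem with a generated melody stem.
n, sr = 1000, 8000
speech = 0.3 * np.sin(2 * np.pi * 220 * np.arange(n) / sr)
melody = 0.3 * np.sin(2 * np.pi * 440 * np.arange(n) / sr)
blend = mix([speech, melody], [0.8, 0.5])
```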


[1] Rabiner & Juang. Fundamentals of Speech Recognition.
[2] Dumpala et al. Sine-wave speech as pre-processing for downstream tasks. Symp. FRSM 2020.
[3] Vaswani et al. Attention is all you need. NeurIPS 2017.
[4] Huang et al. Music Transformer. ICLR 2019.

Author Information

Jason d'Eon (Dalhousie University)
Sri Harsha Dumpala (Dalhousie University and Vector Institute)

Currently a PhD student in the Computer Science Department at Dalhousie University and the Vector Institute. Areas of interest: emotional speech synthesis, generative dialogue systems, multimodal sentiment and emotion analysis, deep learning, and speech and natural language processing.

Chandramouli Shama Sastry (Vector Institute/Dalhousie University)
Daniel Oore (Memorial University of Newfoundland)
Mengyu Yang (University of Toronto)
Sageev Oore (Dalhousie University, Vector Institute)

More from the Same Authors

  • 2020 : Mask-Guided Discovery of Semantic Manifolds in Generative Models
    Mengyu Yang
  • 2020 : A Speech-Based Music Composition Tool with Transformer
    Jason d'Eon
  • 2022 : Sequence Modeling Motion-Captured Dance
    Emily Napier · Gavia Gray · Sageev Oore
  • 2022 : Predicting Individual Depression Symptoms from Acoustic Features During Speech
    Sebastian Rodriguez · Sri Harsha Dumpala · Katerina Dikaios · Sheri Rempel · Rudolf Uher · Sageev Oore
  • 2022 Poster: Logical Activation Functions: Logit-space equivalents of Probabilistic Boolean Operators
    Scott Lowe · Robert Earle · Jason d'Eon · Thomas Trappenberg · Sageev Oore
  • 2021 Poster: TriBERT: Human-centric Audio-visual Representation Learning
    Tanzila Rahman · Mengyu Yang · Leonid Sigal
  • 2016 Demonstration: Interactive musical improvisation with Magenta
    Adam Roberts · Jesse Engel · Curtis Hawthorne · Ian Simon · Elliot Waite · Sageev Oore · Natasha Jaques · Cinjon Resnick · Douglas Eck