Workshop
Machine Learning for Audio
Brian Kulis · Sadie Allen · Sander Dieleman · Shrikanth Narayanan · Rachel Manzelli · Alice Baird · Alan Cowen
Room 228 - 230
The Machine Learning for Audio Workshop at NeurIPS 2023 will bring together audio practitioners and machine learning researchers in a venue focused on a range of problems in audio, including music information retrieval, acoustic event detection, computational paralinguistics, speech transcription, multimodal modeling, and generative modeling of speech and other sounds. Our team has previously held multiple audio-related workshops at top machine learning venues, and both the organizing team and the invited speakers represent broad diversity in terms of gender identity, affiliation, seniority, and geography. We also plan to solicit workshop papers on these topics.
Schedule
Sat 6:30 a.m. - 6:40 a.m.
Opening remarks
Brian Kulis

Sat 6:40 a.m. - 7:00 a.m.
Computer Audition Disrupted 2.0: The Foundation Models Era (Invited talk)
Computer Audition is changing. Since the advent of Large Audio, Language, and Multimodal Models, or, more generally, Foundation Models, a new age has begun. The emergence of abilities in such large models through zero- or few-shot learning renders it partially unnecessary to collect task-specific data and train a corresponding model. After the last major disruption – learning representations and model architectures directly from data – this can be judged as the second major disruption in a field that was once shaped by highly specialized features, approaches, and datasets, and is now shifting towards being absorbed by the sheer size of models and of the data used for their training. In this talk, I will first argue that Computer Audition will be massively influenced by this “plate displacement” in Artificial Intelligence as a whole. I will then turn to some “informed tea-leaf reading” on how present and tomorrow’s Computer Audition will change in more detail. This includes prompt optimisation, fine-tuning, and the synergistic combination of different foundation models and traditional approaches. Finally, I will turn towards the dangers to this new glittery era – among many, the “nightshades” of audio may soon start to poison audio data. A new time has begun – it will empower Computer Audition at a whole new level while challenging us in whole new ways – so let’s get ready.
Bjoern Schuller

Sat 7:00 a.m. - 7:20 a.m.
Explainable AI for Audio via Virtual Inspection Layers (Oral)
The field of eXplainable Artificial Intelligence (XAI) has made significant advancements in recent years. However, most progress has focused on computer vision and natural language processing. There has been limited research on XAI specifically for audio or other time series data, where the input itself is often hard to interpret. In this study, we introduce a virtual inspection layer that transforms time series data into an interpretable representation and enables the use of local XAI methods to attribute relevance to this representation.
Johanna Vielhaben · Sebastian Lapuschkin · Grégoire Montavon · Wojciech Samek

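To make the idea of a virtual inspection layer concrete, here is a minimal sketch (not the authors' implementation): an STFT/iSTFT pair is inserted in front of an arbitrary waveform classifier, leaving its prediction essentially unchanged while exposing a time-frequency representation on which a simple local attribution (gradient × input, standing in for the paper's relevance method) can be computed. The `model` argument, shapes, and attribution rule are assumptions for illustration.

```python
import torch

def attribute_in_time_frequency(model, waveform, n_fft=512, hop=128):
    """Attribute a waveform model's prediction onto an STFT 'virtual inspection layer'.

    model:    any differentiable module mapping (batch, samples) -> (batch, classes)
    waveform: (batch, samples) audio tensor
    Returns a per-bin relevance map of shape (batch, freq, frames).
    """
    window = torch.hann_window(n_fft)
    spec = torch.stft(waveform, n_fft, hop_length=hop, window=window, return_complex=True)
    spec = spec.detach().requires_grad_(True)                 # inspection point
    recon = torch.istft(spec, n_fft, hop_length=hop, window=window,
                        length=waveform.shape[-1])            # iSTFT(STFT(x)) ~= x
    score = model(recon).max(dim=-1).values.sum()             # explain the top class score
    score.backward()
    # gradient x input as a simple stand-in for a local XAI method
    return (spec.grad.conj() * spec).real
```

Because the inserted transform pair is (nearly) the identity, the wrapped model behaves like the original one, but any local explanation method can now be applied at the marked inspection point.
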
Sat 7:20 a.m. - 7:40 a.m.
Self-Supervised Speech Enhancement using Multi-Modal Data (Oral)
We consider the problem of speech enhancement in earphones. While microphones are classical speech sensors, motion sensors embedded in modern earphones also pick up faint components of the user’s speech. While this faint motion data has generally been ignored, we show that it can serve as a pathway for self-supervised speech enhancement. Our proposed model is an iterative framework in which the motion data offers a hint to the microphone (in the form of an estimated posterior); the microphone SNR improves from the hint, which then helps the motion data to refine its next hint. Results show that this alternating self-supervision converges even in the presence of strong ambient noise, and the performance is comparable to supervised denoisers. When only a small amount of training data is available, our model outperforms the same denoisers.
Yu-Lin Wei · Rajalaxmi Rajagopalan · Bashima Islam · Romit Roy Choudhury

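A highly simplified sketch of the alternating self-supervision loop described in the abstract, using random tensors as stand-ins for microphone and motion-sensor spectrograms; the shapes, network, and losses are assumptions, not the paper's actual system.

```python
import torch
import torch.nn as nn

# Toy magnitude-spectrogram setup: the microphone carries speech + noise, while the
# motion (IMU) channel carries a faint, band-limited copy of the speech.
class MaskNet(nn.Module):
    """Predicts a soft time-frequency mask from magnitude frames."""
    def __init__(self, n_freq=129):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_freq, 256), nn.ReLU(),
                                 nn.Linear(256, n_freq), nn.Sigmoid())

    def forward(self, mag):                      # mag: (batch, frames, freq)
        return self.net(mag)

mic_net, imu_net = MaskNet(), MaskNet()
opt = torch.optim.Adam(list(mic_net.parameters()) + list(imu_net.parameters()), lr=1e-3)

mic = torch.rand(4, 100, 129)                    # stand-in noisy microphone magnitudes
imu = torch.rand(4, 100, 129)                    # stand-in faint motion-sensor magnitudes

for round_ in range(5):                          # alternating self-supervision
    hint = imu_net(imu).detach()                 # IMU branch proposes a speech "posterior"
    mic_loss = ((mic_net(mic) - hint) ** 2).mean()       # mic branch learns from the hint

    refined = mic_net(mic).detach()              # the improved mic estimate feeds back ...
    imu_loss = ((imu_net(imu) - refined) ** 2).mean()    # ... to refine the next hint

    opt.zero_grad()
    (mic_loss + imu_loss).backward()
    opt.step()

enhanced = mic_net(mic) * mic                    # masked (enhanced) microphone spectrogram
```
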
Sat 7:40 a.m. - 8:10 a.m.
A multi-view approach for audio-based speech emotion recognition (Invited talk)
The area of speech emotion recognition (SER) has seen significant advances with the wider availability of pre-trained models and embeddings, and the creation of larger publicly available corpora. In this talk we will touch upon some of the challenges that continue to riddle audio-based SER, such as domain adaptation, data augmentation and output generalization, and further discuss the advantages of a multi-view model approach, one that jointly learns from both categorical and dimensional affect labels.
Dimitra Emmanouilidou

Sat 8:10 a.m. - 8:50 a.m.
Coffee break

Sat 8:50 a.m. - 9:10 a.m.
Audio Language Models (Invited talk)
Audio analysis and audio synthesis require modeling long-term, complex phenomena and have historically been tackled in an asymmetric fashion, with specific analysis models that differ from their synthesis counterparts. In this presentation, we will introduce the concept of audio language models, a recent innovation aimed at overcoming these limitations. By discretizing audio signals using a neural audio codec, we can frame both audio generation and audio comprehension as similar autoregressive sequence-to-sequence tasks, capitalizing on the well-established Transformer architecture commonly used in language modeling. This approach unlocks novel capabilities in areas such as textless speech modeling, zero-shot voice conversion, and even text-to-music generation. Furthermore, we will illustrate how the integration of analysis and synthesis within a single model enables the creation of versatile audio models capable of handling a wide range of tasks involving audio as inputs or outputs. We will conclude by highlighting the promising prospects offered by these models and discussing the key challenges that lie ahead in their development.
Neil Zeghidour

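The core recipe described above can be sketched in a few lines: a neural audio codec turns waveforms into discrete tokens, and a decoder-only Transformer is trained to predict the next token. In this sketch the codec is replaced by random integer codes and all model sizes are arbitrary placeholders; it is not any specific published audio language model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AudioTokenLM(nn.Module):
    """Decoder-only language model over discrete audio-codec tokens."""
    def __init__(self, vocab=1024, d_model=256, n_layers=4, n_heads=4, max_len=2048):
        super().__init__()
        self.tok = nn.Embedding(vocab, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                           batch_first=True, norm_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab)

    def forward(self, tokens):                                   # tokens: (batch, time)
        t = tokens.shape[1]
        h = self.tok(tokens) + self.pos(torch.arange(t, device=tokens.device))
        causal = torch.full((t, t), float("-inf")).triu(1).to(tokens.device)
        return self.head(self.backbone(h, mask=causal))          # next-token logits

# Stand-in for codec output: in practice these integers come from a neural audio codec.
codes = torch.randint(0, 1024, (2, 250))
model = AudioTokenLM()
logits = model(codes[:, :-1])
loss = F.cross_entropy(logits.reshape(-1, 1024), codes[:, 1:].reshape(-1))
loss.backward()
```

Generation then works exactly as in text language modeling: sample tokens autoregressively and feed them back through the codec's decoder to obtain a waveform.
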
Sat 9:10 a.m. - 9:30 a.m.
Zero-shot audio captioning with audio-language model guidance and audio context keywords (Oral)
Zero-shot audio captioning aims at automatically generating descriptive textual captions for audio content without prior training for this task. Different from speech recognition which translates audio content that contains spoken language into text, audio captioning is commonly concerned with ambient sounds, or those produced by a human performing an action. Inspired by zero-shot image captioning methods, we propose a novel approach for understanding and summarising such general audio signals in a text caption. In particular, our framework exploits a pre-trained large language model (LLM) for generating the text, guided by a pre-trained audio-language model that steers the text generation to produce captions that describe the audio content. Additionally, we use audio context keywords that prompt the language model to generate text that is broadly relevant to sounds. Our proposed framework achieves state-of-the-art results in zero-shot audio captioning on the AudioCaps and Clotho datasets. Code will be released upon acceptance.
Leonard Salewski · Stefan Fauth · A. Sophia Koepke · Zeynep Akata

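A schematic of one step of audio-guided decoding, with the pretrained pieces abstracted away as callables: `lm_logits` are next-token logits from a frozen LLM, and `audio_text_score` is any audio-language model (CLAP-style) returning a similarity between the fixed audio clip and a candidate caption. The function names, top-k rescoring scheme, and fusion weight are illustrative assumptions rather than the paper's exact formulation.

```python
import torch

def guided_decoding_step(lm_logits, partial_caption, id_to_text, audio_text_score,
                         alpha=0.5, top_k=20):
    """Pick the next token by fusing LLM fluency with audio-text similarity.

    lm_logits:        (vocab,) next-token logits from a pretrained language model
    partial_caption:  the caption generated so far (string)
    id_to_text:       callable(token_id) -> decoded token string
    audio_text_score: callable(text) -> similarity of the audio clip to `text`
    """
    top = torch.topk(lm_logits, top_k)
    lm_logp = torch.log_softmax(lm_logits, dim=-1)[top.indices]
    guide = torch.tensor([audio_text_score(partial_caption + id_to_text(int(i)))
                          for i in top.indices])
    fused = (1.0 - alpha) * lm_logp + alpha * guide       # fluency + audio relevance
    return int(top.indices[int(torch.argmax(fused))])
```

Audio context keywords can simply be prepended to the LLM prompt before `lm_logits` is computed, which nudges the model towards sound-related vocabulary.
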
Sat 9:30 a.m. - 10:00 a.m.
Lark: A Multimodal Foundation Model for Music (Invited talk)
Music has a unique and complex structure which is challenging for both expert humans and existing AI systems to understand, and presents unique challenges relative to other forms of audio. We present LLARK, an instruction-tuned multimodal model for music understanding. We detail our process for dataset creation, which involves augmenting the annotations of diverse open-source music datasets and converting them to a unified instruction-tuning format. We propose a multimodal architecture for LLARK, integrating a pretrained generative model for music with a pretrained language model. In evaluations on three types of tasks (music understanding, captioning, and reasoning), we show that our model matches or outperforms existing baselines in zero-shot generalization for music understanding, and that humans show a high degree of agreement with the model’s responses in captioning and reasoning tasks.
Rachel Bittner

Sat 10:00 a.m. - 11:30 a.m.
Lunch break

Sat 11:30 a.m. - 1:00 p.m.
Poster & Demo Session
Accepted submissions will participate in a poster session alongside demos of their work where applicable.

Sat 1:00 p.m. - 1:30 p.m.
Coffee break

Sat 1:30 p.m. - 2:00 p.m.
Uninformative Gradients: Optimisation pathologies in differentiable digital signal processing (Invited talk)
Differentiable digital signal processing (DDSP) allows us to constrain the outputs of a neural network to those of a known class of signal processor. This can help us train with limited data, reduce audio artefacts, infer parameters of signal models, and expose human interpretable controls. However, numerous failure modes still exist for certain important families of signal processor. This talk illustrates two such challenges, frequency parameter non-convexity and permutation symmetry, and introduces promising approaches to solving them.
Ben Hayes

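The frequency non-convexity failure mode is easy to reproduce: sweeping the frequency of a candidate sinusoidal oscillator against a fixed 440 Hz target under a waveform MSE loss gives an oscillatory loss surface, so gradient descent on the frequency parameter stalls unless it starts close to the optimum. A minimal illustration (values chosen only for demonstration):

```python
import numpy as np

sr = 16000
t = np.arange(int(0.25 * sr)) / sr
target = np.sin(2 * np.pi * 440.0 * t)              # fixed target oscillator

freqs = np.linspace(100.0, 1000.0, 901)
losses = np.array([np.mean((np.sin(2 * np.pi * f * t) - target) ** 2) for f in freqs])

# The surface has a narrow basin at 440 Hz surrounded by ripples and near-flat
# regions, so a gradient step taken at, say, 700 Hz points nowhere useful.
print("global minimum at %.1f Hz" % freqs[losses.argmin()])
print("loss at 430 Hz: %.3f, loss at 700 Hz: %.3f" % (
    losses[np.abs(freqs - 430).argmin()], losses[np.abs(freqs - 700).argmin()]))
```
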
Sat 2:00 p.m. - 2:20 p.m.
EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis (Oral)
Diffusion models have showcased their capabilities in audio synthesis ranging over a variety of sounds. Existing models often operate in the latent domain with cascaded phase recovery modules to reconstruct the waveform, which potentially introduces challenges in generating high-fidelity audio. In this paper, we propose EDMSound, a diffusion-based generative model in the spectrogram domain under the framework of elucidated diffusion models (EDM). Combined with an efficient deterministic sampler, we achieve a Fréchet audio distance (FAD) score similar to the top-ranked baseline with only 10 steps, and reach state-of-the-art performance with 50 steps on the DCASE2023 foley sound generation benchmark. We also reveal a potential concern with diffusion-based audio generation models: they tend to generate samples with high perceptual similarity to their training data. Project page: https://tinyurl.com/4rds3bnn
Ge Zhu · Yutong Wen · Marc-André Carbonneau · Zhiyao Duan

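For readers unfamiliar with the elucidated diffusion model (EDM) framework, the deterministic sampler referred to above is essentially Heun's method over a noise-level schedule. Below is a generic sketch with a placeholder denoiser; EDMSound's actual network, conditioning, and hyperparameters are not reproduced here.

```python
import torch

def edm_heun_sampler(denoise, shape, steps=10, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Deterministic second-order (Heun) EDM sampler.

    denoise(x, sigma) should return the denoised estimate of x at noise level sigma
    (here a placeholder for a trained spectrogram diffusion model).
    """
    i = torch.arange(steps)
    sigmas = (sigma_max ** (1 / rho)
              + i / (steps - 1) * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho
    sigmas = torch.cat([sigmas, torch.zeros(1)])           # append sigma = 0

    x = torch.randn(shape) * sigmas[0]
    for k in range(steps):
        s, s_next = sigmas[k], sigmas[k + 1]
        d = (x - denoise(x, s)) / s                         # probability-flow ODE derivative
        x_next = x + (s_next - s) * d                       # Euler step
        if s_next > 0:                                      # Heun correction
            d_next = (x_next - denoise(x_next, s_next)) / s_next
            x_next = x + (s_next - s) * 0.5 * (d + d_next)
        x = x_next
    return x

# Example with a trivial placeholder denoiser (always predicts silence):
spec = edm_heun_sampler(lambda x, s: x * 0.0, shape=(1, 128, 256))
```
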
Sat 2:20 p.m. - 2:40 p.m.
Towards Generalizable SER: Soft Labeling and Data Augmentation for Modeling Temporal Emotion Shifts in Large-Scale Multilingual Speech (Oral)
Recognizing emotions in spoken communication is crucial for advanced human-machine interaction. Current emotion detection methodologies often display biases when applied cross-corpus. To address this, our study amalgamates 16 diverse datasets, resulting in 375 hours of data across languages like English, Chinese, and Japanese. We propose a soft labeling system capturing gradational emotional intensities. Using the Whisper encoder and a data augmentation inspired by contrastive learning, our method emphasizes the temporal dynamics of emotions. Our validation on 4 multilingual datasets demonstrates notable zero-shot generalization. We further fine-tune on Hume-Prosody and publish initial promising results.
Mohamed Osman · Tamer Nadeem · Ghada Khoriba

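The soft-labeling component can be illustrated with a tiny example: instead of a one-hot emotion target, each segment carries a distribution over classes reflecting gradational intensity, and a classifier can be trained with a KL-divergence loss against it. The class set and values below are made up purely for illustration.

```python
import torch
import torch.nn.functional as F

# Per-segment soft targets over (angry, happy, neutral, sad) -- illustrative values only.
soft_targets = torch.tensor([[0.70, 0.20, 0.10, 0.00],
                             [0.05, 0.15, 0.60, 0.20]])

logits = torch.randn(2, 4, requires_grad=True)            # classifier outputs
loss = F.kl_div(F.log_softmax(logits, dim=-1), soft_targets, reduction="batchmean")
loss.backward()
print(float(loss))
```
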
Sat 2:40 p.m. - 3:00 p.m.
Audio Personalization through Human-in-the-loop Optimization (Oral)
We consider the problem of personalizing audio to maximize user experience. Briefly, we aim to find a filter $h^*$ which, applied to any music or speech, will maximize the user's satisfaction. This is a black-box optimization problem since the user's satisfaction function is unknown. The key idea is to play audio samples to the user, each shaped by a different filter $h_i$, and query the user for their satisfaction scores $f(h_i)$. A family of "surrogate" functions is then designed to fit these scores, and the optimization method gradually refines these functions to arrive at the filter $\hat{h}^*$ that maximizes satisfaction. In this paper, we observe that a second type of querying is possible, where users can tell us the individual elements $h^*[j]$ of the optimal filter $h^*$. Given a budget of $B$ queries, where a query can be of either type, our goal is to find the filter that will maximize this user's satisfaction. Our proposal builds on Sparse Gaussian Process Regression (GPR) and shows how a hybrid approach can outperform any one type of querying. Our results are validated through simulations and real-world experiments, where volunteers gave feedback on music/speech audio and were able to achieve high satisfaction levels. We believe this idea of hybrid querying opens new problems in black-box optimization, and solutions can benefit other applications beyond audio personalization.
Rajalaxmi Rajagopalan · Yu-Lin Wei · Romit Roy Choudhury

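A toy simulation of the hybrid querying idea, using a plain (rather than sparse) Gaussian process surrogate from scikit-learn; the filter dimensionality, query counts, and the synthetic satisfaction function are all assumptions made purely for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
dim = 8                                                 # number of filter bands (assumed)
h_star = rng.uniform(-1, 1, dim)                        # hidden "preferred" filter
satisfaction = lambda h: -np.sum((h - h_star) ** 2)     # stand-in for the user's score f(h)

# Type-1 queries: play audio shaped by random filters, record satisfaction scores.
H = rng.uniform(-1, 1, (15, dim))
y = np.array([satisfaction(h) for h in H])
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True).fit(H, y)

# Type-2 queries: the user directly reveals two elements h*[j] of the optimal filter.
known = {0: h_star[0], 1: h_star[1]}

# Optimize the surrogate over the free coordinates, clamping the revealed ones.
candidates = rng.uniform(-1, 1, (5000, dim))
for j, value in known.items():
    candidates[:, j] = value
h_hat = candidates[np.argmax(gp.predict(candidates))]
print("true satisfaction of the estimated filter:", round(satisfaction(h_hat), 3))
```

Clamping the revealed coordinates shrinks the space the surrogate has to model, which is the intuition for why mixing the two query types can beat either one alone under a fixed budget $B$.
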
Sat 3:00 p.m. - 3:20 p.m.
Multi-channel speech enhancement for moving sources (Invited talk)
Speech enhancement technology has made remarkable progress in recent years. While many single-channel methods have been proposed and their performance has improved, multi-channel speech enhancement technology remains important due to its high performance in estimating and retaining sound source spatial information. Many multi-channel processing methods have been proposed so far for cases where the sound source and noise positions are fixed. However, for real-world applications, it is necessary to consider sound source movement and improve robustness to moving sources. In this presentation, I will introduce multi-channel audio enhancement technologies for moving sources. First, I will present an extension of mask-based neural beamforming, which is widely used as an ASR front-end, to moving sound sources. This extension is achieved by integrating model-based array signal processing and data-driven deep learning approaches. Then, I will discuss model-based, unsupervised multi-channel source separation and extraction approaches, e.g., independent component/vector analysis (ICA/IVA). For multi-channel processing, in addition to dealing with moving sources, it is also essential to devise techniques that limit the increase in computational complexity as the number of microphones increases. To address this issue, I will introduce a fast online IVA algorithm for tracking a single moving source that achieves optimal time complexity and operates significantly faster than conventional approaches.
Shoko Araki

EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis (Poster)
Diffusion models have showcased their capabilities in audio synthesis ranging over a variety of sounds. Existing models often operate in the latent domain with cascaded phase recovery modules to reconstruct the waveform, which potentially introduces challenges in generating high-fidelity audio. In this paper, we propose EDMSound, a diffusion-based generative model in the spectrogram domain under the framework of elucidated diffusion models (EDM). Combined with an efficient deterministic sampler, we achieve a Fréchet audio distance (FAD) score similar to the top-ranked baseline with only 10 steps, and reach state-of-the-art performance with 50 steps on the DCASE2023 foley sound generation benchmark. We also reveal a potential concern with diffusion-based audio generation models: they tend to generate samples with high perceptual similarity to their training data. Project page: https://tinyurl.com/4rds3bnn
Ge Zhu · Yutong Wen · Marc-André Carbonneau · Zhiyao Duan

Explainable AI for Audio via Virtual Inspection Layers (Poster)
The field of eXplainable Artificial Intelligence (XAI) has made significant advancements in recent years. However, most progress has focused on computer vision and natural language processing. There has been limited research on XAI specifically for audio or other time series data, where the input itself is often hard to interpret. In this study, we introduce a virtual inspection layer that transforms time series data into an interpretable representation and enables the use of local XAI methods to attribute relevance to this representation.
Johanna Vielhaben · Sebastian Lapuschkin · Grégoire Montavon · Wojciech Samek

Audio classification with Dilated Convolution with Learnable Spacings (Poster)
Dilated convolution with learnable spacings (DCLS) is a recent convolution method in which the positions of the kernel elements are learned throughout training by backpropagation. Its benefits have recently been demonstrated in computer vision (ImageNet classification and downstream tasks). Here, we show that DCLS is also useful for audio tagging using the AudioSet classification benchmark. We took two state-of-the-art convolutional architectures using depthwise separable convolutions (DSC), ConvNeXt and ConvFormer, as well as a hybrid one that additionally uses attention, FastViT, and replaced all their DSC layers with DCLS layers as drop-in substitutes. This significantly improved the mean average precision (mAP) with all three architectures without increasing the number of parameters and with only a small cost in throughput. The method code is based on PyTorch and is available at https://anonymous.4open.science/r/DCLS-Audio/.
Ismail Khalfaoui Hassani · Timothée Masquelier · Thomas Pellegrini

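A condensed 1D illustration of the learnable-spacing mechanism (the released PyTorch code linked above is the reference implementation; this sketch only shows the core idea): each kernel weight has a real-valued position inside a larger receptive field, and linear interpolation onto the two nearest integer taps keeps those positions differentiable.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DCLS1d(nn.Module):
    """Minimal 1D dilated convolution with learnable spacings."""
    def __init__(self, in_ch, out_ch, kernel_count=3, dilated_size=17):
        super().__init__()
        self.dilated_size = dilated_size
        self.weight = nn.Parameter(0.1 * torch.randn(out_ch, in_ch, kernel_count))
        # real-valued tap positions, initialised evenly inside the receptive field
        init = torch.linspace(1.0, dilated_size - 2.0, kernel_count)
        self.pos = nn.Parameter(init.expand(out_ch, in_ch, kernel_count).clone())

    def dense_kernel(self):
        p = self.pos.clamp(0.0, self.dilated_size - 1.001)
        left = p.floor().long()                          # lower integer tap
        frac = p - p.floor()                             # differentiable fraction
        kernel = self.weight.new_zeros(*self.weight.shape[:2], self.dilated_size)
        kernel = kernel.scatter_add(2, left, self.weight * (1.0 - frac))
        kernel = kernel.scatter_add(2, left + 1, self.weight * frac)
        return kernel

    def forward(self, x):                                # x: (batch, in_ch, time)
        return F.conv1d(x, self.dense_kernel(), padding=self.dilated_size // 2)

layer = DCLS1d(in_ch=8, out_ch=16)
out = layer(torch.randn(2, 8, 100))
out.sum().backward()                 # gradients flow to both the weights and the positions
```
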
Creative Text-to-Audio Generation via Synthesizer Programming (Poster)
Sound designers have long harnessed the power of abstraction to distill and highlight the semantic essence of real-world auditory phenomena, akin to how simple sketches can vividly convey visual concepts. However, current neural audio synthesis methods lean heavily towards capturing acoustic realism. We introduce a novel, open-source method centered on meaningful abstraction. Our approach takes a text prompt and iteratively refines the parameters of a virtual modular synthesizer to produce sounds with high semantic alignment, as predicted by a pretrained audio-language model. Our results underscore the distinctiveness of our method compared with both real recordings and state-of-the-art generative models.
Nikhil Singh · Manuel Cherep · Jessica Shand

Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation (Poster)
In short videos and live streams, speech, singing voice, and background music often overlap and obscure each other. This complexity creates difficulties in structuring and recognizing the audio content, which may impair subsequent ASR and music understanding applications. This paper proposes a multi-task audio source separation-based ASR model called JRSV, which Jointly Recognizes Speech and singing Voices. Specifically, the separation module separates the mixed audio into distinct speech and singing voice tracks while removing background music. The CTC/attention hybrid recognition module recognizes both tracks. Online distillation is proposed to further improve the robustness of recognition. A benchmark dataset is constructed and released to evaluate the proposed methods. Experimental results demonstrate that JRSV can significantly improve recognition accuracy on each track of the mixed audio.
Ye Bai · Chenxing Li · Xiaorui Wang · Yuanyuan Zhao · Hao Li

Leveraging Content-based Features from Multiple Acoustic Models for Singing Voice Conversion (Poster)
Singing voice conversion (SVC) is a technique to enable an arbitrary singer to sing an arbitrary song. To achieve that, it is important to obtain speaker-agnostic representations from source audio, which is a challenging task. A common solution is to extract content-based features (e.g., PPGs) from a pretrained acoustic model. However, the choices of acoustic models are vast and varied. It remains to be explored what the characteristics of content features from different acoustic models are, and whether integrating multiple content features can help each other. This study investigates three distinct content features, sourced from WeNet, Whisper, and ContentVec, respectively. We explore their complementary roles in intelligibility, prosody, and conversion similarity for SVC. By integrating the multiple content features with a diffusion-based SVC model, our SVC system achieves superior conversion performance on both objective and subjective evaluations in comparison to a single source of content features.
Xueyao Zhang · Yicheng Gu · Haopeng Chen · Zihao Fang · Lexiao Zou · Liumeng Xue · Zhizheng Wu

Diffusion Models as Masked Audio-Video Learners (Poster)
Over the past several years, the synchronization between audio and visual signals has been leveraged to learn richer audio-visual representations. Aided by the large availability of unlabeled videos, many unsupervised training frameworks have demonstrated impressive results in various downstream audio and video tasks. Recently, Masked Audio-Video Learners (MAViL) has emerged as a state-of-the-art audio-video pre-training framework. MAViL couples contrastive learning with masked autoencoding to jointly reconstruct audio spectrograms and video frames by fusing information from both modalities. In this paper, we study the potential synergy between diffusion models and MAViL, seeking to derive mutual benefits from these two frameworks. The incorporation of diffusion into MAViL, combined with various training efficiency methodologies that include the utilization of a masking ratio curriculum and adaptive batch sizing, results in a notable 32% reduction in pre-training Floating-Point Operations (FLOPS) and an 18% decrease in pre-training wall clock time. Crucially, this enhanced efficiency does not compromise the model's performance in downstream audio-classification tasks when compared to MAViL's performance.
Elvis Nunez · Yanzi Jin · Mohammad Rastegari · Sachin Mehta · Maxwell Horton

InstrumentGen: Generating Sample-Based Musical Instruments From Text (Poster)
We introduce the text-to-instrument task, which aims at generating sample-based musical instruments based on textual prompts. We propose InstrumentGen, a model that extends a text-prompted generative audio framework to condition on instrument family, source type, pitch (across an 88-key spectrum), velocity, and a joint text/audio embedding. Furthermore, we present a differentiable loss function to evaluate the intra-instrument timbral consistency of sample-based instruments. Our results establish a foundational text-to-instrument baseline, extending research in the domain of automatic sample-based instrument generation.
Shahan Nercessian · Johannes Imort

Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization (Poster)
Temporal Action Localization (TAL) aims to identify actions' start, end, and class labels in untrimmed videos. While recent advancements using transformer networks and Temporal Pyramid Networks (TPN) have enhanced visual feature recognition in TAL tasks, there's an under-explored area of integrating multi-resolution audio features into such frameworks. This paper introduces Multi-Resolution Audio-Visual Feature Fusion (MRAV-FF), an innovative method to merge audio-visual data across different temporal resolutions. Central to our approach is a hierarchical gated cross-attention mechanism, which discerningly weighs the importance of audio information at diverse temporal scales. Such a technique not only refines the precision of regression boundaries but also bolsters classification accuracy. Importantly, MRAV-FF is versatile, making it compatible with existing TPN TAL architectures and offering a significant enhancement in performance when audio data is available.
Edward Fish · Jon Weinbren · Andrew Gilbert

Composing and Validating Large-Scale Datasets for Training Open Foundation Models for Audio (Poster)
Obtaining strong, reproducible foundation language-audio models requires open datasets of sufficient scale and quality. To pre-train a contrastive language-audio model, we compose a large-scale sound effects dataset with detailed text descriptions for each sample. Generating music, as a special type of audio, presents further challenges due to the limited availability of music-text pairs with sufficiently expressive captions. We show here how we combine various composed datasets to pre-train a large-scale contrastive language-audio model (CLAP). We then train, on music samples we collected, a state-of-the-art text-to-music model, MusicLDM, which adapts AudioLDM (based on the Stable Diffusion architecture) to the music domain by utilizing the pre-trained CLAP model and the HiFi-GAN vocoder as components. This modelling work validates the composed text-audio and text-music datasets as a strong basis for further studies on language-rooted foundation models for audio at larger scales.
Marianna Nezhurina · Ke Chen · Yusong Wu · Tianyu Zhang · Haohe Liu · Yuchen Hui · Taylor Berg-Kirkpatrick · Shlomo Dubnov · Jenia Jitsev

Unsupervised Musical Object Discovery from Audio (Poster)
Current object-centric learning models such as the popular SlotAttention architecture allow for unsupervised visual scene decomposition. Our novel MusicSlots method adapts SlotAttention to the audio domain, to achieve unsupervised music decomposition. Since concepts of opacity and occlusion in vision have no auditory analogues, the softmax normalization of alpha masks in the decoders of visual object-centric models is not well-suited for decomposing audio objects. MusicSlots overcomes this problem. We introduce a spectrogram-based multi-object music dataset tailored to evaluate object-centric learning on western tonal music. MusicSlots achieves good performance on unsupervised note discovery and outperforms several established baselines on supervised note property prediction tasks.
Joonsu Gha · Vincent Herrmann · Benjamin F. Grewe · Jürgen Schmidhuber · Anand Gopalakrishnan

Data is Overrated: Perceptual Metrics Can Lead Learning in the Absence of Training Data (Poster)
Perceptual metrics are traditionally used to evaluate the quality of natural signals, such as images and audio. They are designed to mimic the perceptual behaviour of human observers and usually reflect structures found in natural signals. This motivates their use as loss functions for training generative models such that models will learn to capture the structure held in the metric. We take this idea to the extreme in the audio domain by training a compressive autoencoder to reconstruct uniform noise, in lieu of natural data. We show that training with perceptual losses improves the reconstruction of spectrograms and re-synthesized audio at test time over models trained with a standard Euclidean loss. This demonstrates better generalisation to unseen natural signals when using perceptual metrics.
Tashi Namgyal · Alexander Hepburn · Raul Santos-Rodriguez · Valero Laparra · Jesús Malo

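A compact sketch of the training setup described above: a small autoencoder is fit to reconstruct uniform noise, with a multi-scale spectrogram distance standing in for the perceptual metric (the paper's actual metric and architecture are not reproduced here). Swapping `spectral_loss` for a plain mean-squared error gives the Euclidean baseline for comparison.

```python
import torch
import torch.nn as nn

def spectral_loss(x, y, fft_sizes=(128, 256, 512)):
    """Multi-resolution spectrogram distance, used here as a stand-in perceptual loss."""
    loss = 0.0
    for n_fft in fft_sizes:
        w = torch.hann_window(n_fft)
        X = torch.stft(x, n_fft, hop_length=n_fft // 4, window=w, return_complex=True).abs()
        Y = torch.stft(y, n_fft, hop_length=n_fft // 4, window=w, return_complex=True).abs()
        loss = loss + (X - Y).abs().mean()
    return loss

class TinyAE(nn.Module):
    """Small compressive autoencoder over short waveform chunks."""
    def __init__(self, n_samples=1024, bottleneck=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_samples, 256), nn.Tanh(), nn.Linear(256, bottleneck))
        self.dec = nn.Sequential(nn.Linear(bottleneck, 256), nn.Tanh(), nn.Linear(256, n_samples))
    def forward(self, x):
        return self.dec(self.enc(x))

model = TinyAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):
    noise = 2.0 * torch.rand(16, 1024) - 1.0     # uniform noise stands in for training data
    loss = spectral_loss(model(noise), noise)     # perceptual-style loss instead of MSE
    opt.zero_grad()
    loss.backward()
    opt.step()
```
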
Self-Supervised Speech Enhancement using Multi-Modal Data (Poster)
We consider the problem of speech enhancement in earphones. While microphones are classical speech sensors, motion sensors embedded in modern earphones also pick up faint components of the user’s speech. While this faint motion data has generally been ignored, we show that it can serve as a pathway for self-supervised speech enhancement. Our proposed model is an iterative framework in which the motion data offers a hint to the microphone (in the form of an estimated posterior); the microphone SNR improves from the hint, which then helps the motion data to refine its next hint. Results show that this alternating self-supervision converges even in the presence of strong ambient noise, and the performance is comparable to supervised denoisers. When only a small amount of training data is available, our model outperforms the same denoisers.
Yu-Lin Wei · Rajalaxmi Rajagopalan · Bashima Islam · Romit Roy Choudhury

Improved sound quality human-inspired DNN-based audio applications (Poster)
The human auditory system evolved into a structure that provides sharp frequency tuning while transforming sound into a neural code that is optimized for speech understanding in challenging acoustic environments. Employing hallmark features of human hearing in audio applications might thus take these systems beyond what is currently possible with purely data-driven approaches. A key requirement for such bio-inspired audio applications is a fully differentiable closed-loop system that includes a biophysically realistic model of (hearing-impaired) auditory processing. However, existing state-of-the-art models introduce tonal artifacts within their processing that end up as detrimental audible artifacts in the resulting audio application. We propose a solution that improves the architecture of the CNN-based auditory processing block to avoid the creation of spurious distortions, while we optimize computations to ensure that the audio applications have real-time capabilities (latency < 10 ms). We provide a proof-of-principle example for the case of closed-loop, CNN-based hearing-aid algorithms, and conclude that CNN-based auditory models embedded in closed-loop training systems hold great promise for the next generation of bio-inspired audio applications.
Chuan Wen · Sarah Verhulst

Audio Personalization through Human-in-the-loop Optimization (Poster)
We consider the problem of personalizing audio to maximize user experience. Briefly, we aim to find a filter $h^*$ which, applied to any music or speech, will maximize the user's satisfaction. This is a black-box optimization problem since the user's satisfaction function is unknown. The key idea is to play audio samples to the user, each shaped by a different filter $h_i$, and query the user for their satisfaction scores $f(h_i)$. A family of "surrogate" functions is then designed to fit these scores, and the optimization method gradually refines these functions to arrive at the filter $\hat{h}^*$ that maximizes satisfaction. In this paper, we observe that a second type of querying is possible, where users can tell us the individual elements $h^*[j]$ of the optimal filter $h^*$. Given a budget of $B$ queries, where a query can be of either type, our goal is to find the filter that will maximize this user's satisfaction. Our proposal builds on Sparse Gaussian Process Regression (GPR) and shows how a hybrid approach can outperform any one type of querying. Our results are validated through simulations and real-world experiments, where volunteers gave feedback on music/speech audio and were able to achieve high satisfaction levels. We believe this idea of hybrid querying opens new problems in black-box optimization, and solutions can benefit other applications beyond audio personalization.
Rajalaxmi Rajagopalan · Yu-Lin Wei · Romit Roy Choudhury

Synthia's Melody: A Benchmark Framework for Unsupervised Domain Adaptation in Audio (Poster)
Despite significant advancements in deep learning for vision and natural language, unsupervised domain adaptation in audio remains relatively unexplored. We, in part, attribute this to the lack of an appropriate benchmark dataset. To address this gap, we present Synthia’s melody, a novel audio data generation framework capable of simulating an infinite variety of 4-second melodies with user-specified confounding structures characterised by musical keys, timbre, and loudness. Unlike existing datasets collected under observational settings, Synthia’s melody is free of unobserved biases, ensuring the reproducibility and comparability of experiments. To showcase its utility, we generate two types of distribution shifts—domain shift and sample selection bias—and evaluate the performance of acoustic deep learning models under these shifts. Our evaluations reveal that Synthia’s melody provides a robust testbed for examining the susceptibility of these models to varying levels of distribution shift.
Harry Coppock · Chia-Hsin Lin

Zero-shot audio captioning with audio-language model guidance and audio context keywords (Poster)
Zero-shot audio captioning aims at automatically generating descriptive textual captions for audio content without prior training for this task. Different from speech recognition which translates audio content that contains spoken language into text, audio captioning is commonly concerned with ambient sounds, or those produced by a human performing an action. Inspired by zero-shot image captioning methods, we propose a novel approach for understanding and summarising such general audio signals in a text caption. In particular, our framework exploits a pre-trained large language model (LLM) for generating the text, guided by a pre-trained audio-language model that steers the text generation to produce captions that describe the audio content. Additionally, we use audio context keywords that prompt the language model to generate text that is broadly relevant to sounds. Our proposed framework achieves state-of-the-art results in zero-shot audio captioning on the AudioCaps and Clotho datasets. Code will be released upon acceptance.
Leonard Salewski · Stefan Fauth · A. Sophia Koepke · Zeynep Akata

AttentionStitch: How Attention Solves the Speech Editing Problem (Poster)
The generation of natural and high-quality speech from text is a challenging problem in the field of natural language processing. In addition to speech generation, speech editing is also a crucial task, which requires the seamless and unnoticeable integration of edited speech into synthesized speech. We propose a novel approach to speech editing by leveraging a pre-trained text-to-speech (TTS) model, such as FastSpeech 2, and incorporating a double attention block network on top of it to automatically merge the synthesized mel-spectrogram with the mel-spectrogram of the edited text. We refer to this model as AttentionStitch, as it harnesses attention to stitch audio samples together. We evaluate the proposed AttentionStitch model against state-of-the-art baselines on both single and multi-speaker datasets, namely LJSpeech and VCTK. We demonstrate its superior performance through an objective and a subjective evaluation test involving 15 human participants. AttentionStitch is capable of producing high-quality speech, even for words not seen during training, while operating automatically without the need for human intervention. Moreover, AttentionStitch is fast during both training and inference and is able to generate human-sounding edited speech.
Antonios Alexos · Pierre Baldi

MusT3: Unified Multi-Task Model for Fine-Grained Music Understanding (Poster)
Recent advances in sequence-to-sequence modelling have enabled new powerful multi-task models in the text, vision, and speech domains. This work attempts to leverage these advances for music. We propose MusT3: Music-To-Tags Transformer, a novel model for fine-grained music understanding. First, we design the unified music-to-tags form, which enables us to cast any music understanding task as a sequence prediction problem. Second, we utilize a Transformer-based model to predict that sequence given a music representation. Third, we leverage a multi-task learning framework to train a single model for many tasks. We validate our approach on four tasks: beat tracking, chord recognition, key detection, and vocal melody extraction. Our model performs significantly better than the current state-of-the-art models on two of these tasks, while staying competitive on the remaining two. Finally, in a controlled experiment, we demonstrate that our model can reuse knowledge between tasks, leading to better performance on low-resource tasks with limited training data.
Martin Kukla · Minz Won · Yun-Ning Hung · Duc Le

Benchmarks and deep learning models for localizing rodent vocalizations in social interactions (Poster)
Social animals congregate in groups and vocalize to communicate. To study the dynamics of vocal communication and their neural basis, ethologists and neuroscientists have developed a multitude of approaches to attribute vocal calls to individual animals within an interacting social group. Invasive surgical procedures, such as affixing custom-built miniature sensors to each animal, are often needed to obtain precise measurements of which individual is vocalizing. In addition to being labor intensive and species specific, these surgeries are often not tractable in very small or young animals and may alter an animal’s natural behavioral repertoire. Thus, there is considerable interest in developing non-invasive sound source localization and vocal call attribution methods that work off-the-shelf in typical laboratory settings. To advance these aims in the domain of rodent neuroscience, we acquired synchronized video and multi-channel audio recordings with >300,000 annotated sound sources in small reverberant environments, and publicly release them as benchmarks. We then trained deep neural networks to localize and attribute vocal calls. This approach outperformed current protocols in the field, achieving ~5 mm accuracy on speaker-emitted sounds. Further, deep network ensembles produced well-calibrated estimates of uncertainty for each prediction. However, network performance was not robust to distributional shifts in the data, highlighting limitations and open challenges for future work.
Ralph Peterson · Aramis Tanelus · Aman Choudhri · Violet Ivan · Aaditya Prasad · David Schneider · Dan Sanes · Alex Williams

Towards Generalizable SER: Soft Labeling and Data Augmentation for Modeling Temporal Emotion Shifts in Large-Scale Multilingual Speech (Poster)
Recognizing emotions in spoken communication is crucial for advanced human-machine interaction. Current emotion detection methodologies often display biases when applied cross-corpus. To address this, our study amalgamates 16 diverse datasets, resulting in 375 hours of data across languages like English, Chinese, and Japanese. We propose a soft labeling system capturing gradational emotional intensities. Using the Whisper encoder and a data augmentation inspired by contrastive learning, our method emphasizes the temporal dynamics of emotions. Our validation on 4 multilingual datasets demonstrates notable zero-shot generalization. We further fine-tune on Hume-Prosody and publish initial promising results.
Mohamed Osman · Tamer Nadeem · Ghada Khoriba

The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation (Poster)
We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality audio-caption pairs, designed for the evaluation of music-and-language models. The dataset consists of 1.1k natural language descriptions of 706 music recordings, all publicly accessible and released under Creative Commons licenses. To showcase the use of our dataset, we benchmark popular models on three key music-and-language tasks (music captioning, text-to-music generation and music-language retrieval). Our experiments highlight the importance of cross-dataset evaluation and offer insights into how researchers can use SDD to gain a broader understanding of model performance.
Ilaria Manco · Benno Weck · Seungheon Doh · Yixiao Zhang · Dmitry Bogdanov · Yusong Wu · Ke Chen · Philip Tovstogan · Emmanouil Benetos · Elio Quinton · George Fazekas · Juhan Nam · Minz Won

ScripTONES: Sentiment-Conditioned Music Generation for Movie Scripts (Poster)
Film scores are considered an essential part of the film cinematic experience, but the process of film score generation is often expensive and infeasible for small-scale creators. Automating the process of film score composition would provide useful starting points for music in small projects. In this paper, we propose a two-stage pipeline for generating music from a movie script. The first phase is the Sentiment Analysis phase, where the sentiment of a scene from the film script is encoded into the valence-arousal continuous space. The second phase is the Conditional Music Generation phase, which takes as input the valence-arousal vector and conditionally generates piano MIDI music to match the sentiment. We study the efficacy of various music generation architectures by performing a qualitative user survey and propose methods to improve sentiment-conditioning in VAE architectures.
Vishruth Veerendranath · Vibha Masti · Utkarsh Gupta · Hrishit Chaudhuri · Gowri Srinivasa

Self-Supervised Music Source Separation Using Vector-Quantized Source Category Estimates (Poster)
Music source separation is focused on extracting distinct sonic elements from composite tracks. Historically, many methods have been grounded in supervised learning, necessitating labeled data, which is occasionally constrained in its diversity. More recent methods have delved into N-shot techniques that utilize one or more audio samples to aid in the separation. However, a challenge with some of these methods is the necessity for an audio query during inference, making them less suited for genres with varied timbres and effects. This paper offers a proof-of-concept for a self-supervised music source separation system that eliminates the need for audio queries at inference time. In the training phase, while it adopts a query-based approach, we introduce a modification by substituting the continuous embedding of query audios with a Vector Quantized Variational Autoencoder (VQ-VAE). Trained end-to-end with up to N classes as determined by the VQ-VAE's codebook size, the model seeks to effectively categorize instrument classes. During inference, the input is partitioned into N sources, with some potentially left unutilized based on the mix's instrument makeup. This methodology suggests an alternative avenue for considering source separation across diverse music genres. We provide examples and additional results online.
Stefan Lattner · Marco Pasini

Deep Generative Models of Music Expectation (Poster)
A prominent theory of affective response to music revolves around the concepts of surprisal and expectation. In prior work, this idea has been operationalized in the form of probabilistic models of music which allow for precise computation of song (or note-by-note) probabilities, conditioned on a ‘training set’ of prior musical or cultural experiences. To date, however, these models have been limited to computing exact probabilities through hand-crafted features, or restricted to linear models which are likely not sufficient to represent the complex conditional distributions present in music. In this work, we propose to use modern deep probabilistic generative models in the form of a Diffusion Model to compute an approximate likelihood of a musical input sequence. Unlike prior work, such a generative model parameterized by deep neural networks is able to learn complex non-linear features directly from a training set itself. In doing so, we expect to find that such models are able to more accurately represent the ‘surprisal’ of music for human listeners. From the literature, it is known that there is an inverted U-shaped relationship between surprisal and the degree to which human subjects ‘like’ a given song. In this work we show that pre-trained diffusion models indeed yield musical surprisal values which exhibit a negative quadratic relationship with measured subject ‘liking’ ratings, and that the quality of this relationship is competitive with state-of-the-art methods such as IDyOM. We therefore present this model as a preliminary step in developing modern deep generative models of music expectation and subjective likability.
Ninon Lizé Masclef · Andy Keller

mir_ref: A Representation Evaluation Framework for Music Information Retrieval Tasks (Poster)
Music Information Retrieval (MIR) research is increasingly leveraging representation learning to obtain more compact, powerful music audio representations for various downstream MIR tasks. However, current representation evaluation methods are fragmented due to discrepancies in audio and label preprocessing, downstream model and metric implementations, data availability, and computational resources, often leading to inconsistent and limited results. In this work, we introduce mir_ref, an MIR Representation Evaluation Framework focused on seamless, transparent, local-first experiment orchestration to support representation development. It features implementations of a variety of components such as MIR datasets, tasks, embedding models, and tools for result analysis and visualization, while facilitating the implementation of custom components. To demonstrate its utility, we use it to conduct an extensive evaluation of several embedding models across various tasks and datasets, including evaluating their robustness to various audio perturbations and the ease of extracting relevant information from them.
Christos Plachouras · Dmitry Bogdanov · Pablo Alonso-Jiménez