NeurIPS 2020

Workshop: Self-Supervised Learning for Speech and Audio Processing

Abdelrahman Mohamed, Hung-yi Lee, Shinji Watanabe, Shang-Wen Li, Tara Sainath, Karen Livescu

2020-12-11T06:50:00-08:00 - 2020-12-11T16:25:00-08:00

more: https://neurips-sas-2020.github.io/

Abstract: There is a trend in the machine learning community to adopt self-supervised approaches to pre-train deep networks. Self-supervised learning uses proxy supervised learning tasks, for example, distinguishing parts of the input signal from distractors, or generating masked input segments conditioned on the unmasked ones, to obtain training data from unlabeled corpora. These approaches make it possible to use the tremendous amount of unlabeled data on the web to train large networks and solve complicated tasks. ELMo, BERT, and GPT in NLP are famous examples in this direction. Recently, self-supervised approaches for speech and audio processing have also been gaining attention. These approaches combine methods for utilizing no or partial labels, unpaired text and audio data, contextual text and video supervision, and signals from user interactions. Although self-supervised learning is an active research direction in speech and audio processing, current work is limited to a few problems such as automatic speech recognition, speaker identification, and speech translation, partly because modeling differs so much across speech and audio processing problems. Much of this research direction remains unexplored.

This workshop will bring focused discussion of self-supervision for the field of speech and audio processing via several invited talks, oral and poster sessions with high-quality papers, and a panel of leading researchers from academia and industry. Alongside research work on new self-supervised methods, data, applications, and results, this workshop will call for novel work on understanding, analyzing, and comparing different self-supervision approaches for speech and audio processing. The workshop aims to:
- Review existing and inspire new self-supervised methods and results,
- Motivate the application of self-supervision approaches to more speech and audio processing problems in academia and industry, and encourage discussion amongst experts and practitioners from the two realms,
- Encourage work on methods for understanding learned representations, on comparing different self-supervision methods, and on comparing self-supervision with other self-training and transfer learning methods that low-resource speech and audio processing has long used,
- Facilitate communication within the field of speech and audio processing (e.g., people who attend conferences such as INTERSPEECH and ICASSP) as well as between the field and the whole machine learning community for sharing knowledge, ideas, and data, and encourage future collaboration to inspire innovation in the field and the whole community.

Schedule

2020-12-11T06:50:00-08:00 - 2020-12-11T07:00:00-08:00

Opening remarks

Hung-yi Lee

2020-12-11T07:00:00-08:00 - 2020-12-11T07:35:00-08:00

Invited talk - 1

Bhuvana Ramabhadran

2020-12-11T07:35:00-08:00 - 2020-12-11T07:45:00-08:00

Q&A for invited talk - 1

2020-12-11T07:45:00-08:00 - 2020-12-11T08:20:00-08:00

Invited talk - Multimodal Distant Supervision

Mark Hasegawa-Johnson

2020-12-11T08:20:00-08:00 - 2020-12-11T08:30:00-08:00

Q&A for invited talk - Multimodal Distant Supervision

2020-12-11T08:30:00-08:00 - 2020-12-11T08:40:00-08:00

Self-Supervised Learning using Contrastive Mixtures for Personalized Speech Enhancement

Aswin Sivaraman

2020-12-11T08:40:00-08:00 - 2020-12-11T08:50:00-08:00

Self-supervised Pre-training Reduces Label Permutation Instability of Speech Separation

Sung-Feng Huang

2020-12-11T08:50:00-08:00 - 2020-12-11T09:00:00-08:00

Augmentation adversarial training for self-supervised speaker recognition

Jaesung Huh

2020-12-11T09:00:00-08:00 - 2020-12-11T09:10:00-08:00

Neural Composition: Learning to Generate from Multiple Models

Denis Filimonov

2020-12-11T09:10:00-08:00 - 2020-12-11T09:20:00-08:00

Towards Semi-Supervised Semantics Understanding from Speech

Cheng-I Lai

2020-12-11T09:20:00-08:00 - 2020-12-11T09:30:00-08:00

The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling

Tu Anh Nguyen

2020-12-11T09:30:00-08:00 - 2020-12-11T09:45:00-08:00

Q&A for contributed talks between 11:30 and 12:30 (Eastern Time)

2020-12-11T09:45:00-08:00 - 2020-12-11T10:00:00-08:00

Break

2020-12-11T10:00:00-08:00 - 2020-12-11T10:35:00-08:00

Invited talk - Speech Processing with Weak Supervision

Dong Yu

2020-12-11T10:35:00-08:00 - 2020-12-11T10:45:00-08:00

Q&A for invited talk - Speech Processing with Weak Supervision

2020-12-11T10:45:00-08:00 - 2020-12-11T10:55:00-08:00

Towards Localisation of Keywords in Speech Using Weak Supervision

Kayode Olaleye

2020-12-11T10:55:00-08:00 - 2020-12-11T11:05:00-08:00

Text-Free Image-to-Speech Synthesis Using Learned Segmental Units

Wei-Ning Hsu

2020-12-11T11:05:00-08:00 - 2020-12-11T11:15:00-08:00

Self-Supervised Audio-Visual Separation of On-Screen Sounds from Unlabeled Videos

Efthymios Tzinis

2020-12-11T11:15:00-08:00 - 2020-12-11T11:25:00-08:00

Multi-Format Contrastive Learning of Audio Representations

Aaron van den Oord

2020-12-11T11:25:00-08:00 - 2020-12-11T11:40:00-08:00

Q&A for contributed talks between 1:45 and 2:25 (Eastern Time)

2020-12-11T11:40:00-08:00 - 2020-12-11T11:55:00-08:00

Break

2020-12-11T11:55:00-08:00 - 2020-12-11T12:30:00-08:00

Invited talk - Underfitting and Uncertainty in Self-Supervised Predictive Models

Chelsea Finn

2020-12-11T12:30:00-08:00 - 2020-12-11T12:40:00-08:00

Q&A for invited talk - Underfitting and Uncertainty in Self-Supervised Predictive Models

2020-12-11T12:40:00-08:00 - 2020-12-11T13:15:00-08:00

Invited talk - Towards robust self-supervised learning of speech representations

Mirco Ravanelli

2020-12-11T13:15:00-08:00 - 2020-12-11T13:25:00-08:00

Q&A for invited talk - Towards robust self-supervised learning of speech representations

2020-12-11T13:25:00-08:00 - 2020-12-11T13:35:00-08:00

Similarity Analysis of Self-Supervised Speech Representations

Yu-An Chung

2020-12-11T13:35:00-08:00 - 2020-12-11T13:45:00-08:00

Representation Learning for Sequence Data with Deep Autoencoding Predictive

Junwen Bai

2020-12-11T13:45:00-08:00 - 2020-12-11T13:55:00-08:00

Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition

Yu Zhang

2020-12-11T13:55:00-08:00 - 2020-12-11T14:05:00-08:00

A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embedding

Puyuan Peng

2020-12-11T14:05:00-08:00 - 2020-12-11T14:15:00-08:00

HUBERT: How much can a bad teacher benefit ASR pre-training?

Wei-Ning Hsu

2020-12-11T14:15:00-08:00 - 2020-12-11T14:30:00-08:00

Q&A for contributed talks between 4:25 and 5:15 (Eastern Time)

2020-12-11T14:30:00-08:00 - 2020-12-11T14:45:00-08:00

Break

2020-12-11T14:45:00-08:00 - 2020-12-11T15:20:00-08:00

Invited talk - Flexible contextualized speech representation learning for diverse downstream tasks

Katrin Kirchhoff

2020-12-11T15:20:00-08:00 - 2020-12-11T15:30:00-08:00

Q&A for invited talk - Flexible contextualized speech representation learning for diverse downstream tasks

2020-12-11T15:30:00-08:00 - 2020-12-11T16:05:00-08:00

Invited talk - De-noising Sequence-to-Sequence Pre-training

Luke Zettlemoyer

2020-12-11T16:05:00-08:00 - 2020-12-11T16:15:00-08:00

Q&A for invited talk - De-noising Sequence-to-Sequence Pre-training

2020-12-11T16:15:00-08:00 - 2020-12-11T16:25:00-08:00

Closing remarks

Abdelrahman Mohamed