Timezone: »
Continuous pseudo-labeling (PL) algorithms such as slimIPL have recently emerged as a powerful strategy for semi-supervised learning in speech recognition. In contrast with earlier strategies that alternated between training a model and generating pseudo-labels (PLs) with it, here PLs are generated in end-to-end manner as training proceeds, improving training speed and the accuracy of the final model. PL shares a common theme with teacher-student models such as distillation in that a teacher model generates targets that need to be mimicked by the student model being trained. However, interestingly, PL strategies in general use hard-labels, whereas distillation uses the distribution over labels as the target to mimic. Inspired by distillation we expect that specifying the whole distribution (aka soft-labels) over sequences as the target for unlabeled data, instead of a single best pass pseudo-labeled transcript (hard-labels) should improve PL performance and convergence. Surprisingly and unexpectedly, we find that soft-labels targets can lead to training divergence, with the model collapsing to a degenerate token distribution per frame. We hypothesize that the reason this does not happen with hard-labels is that training loss on hard-labels imposes sequence-level consistency that keeps the model from collapsing to the degenerate solution. In this paper, we show several experiments that support this hypothesis, and experiment with several regularization approaches that can ameliorate the degenerate collapse when using soft-labels. These approaches can bring the accuracy of soft-labels closer to that of hard-labels, and while they are unable to outperform them yet, they serve as a useful framework for further improvements.
Author Information
Tatiana Likhomanenko (Apple)
Ronan Collobert (Apple)
Navdeep Jaitly (Apple)
Samy Bengio (Apple)
More from the Same Authors
-
2022 : Spotlight 4 -Tatiana Likhomanenko: Continuous Soft Pseudo-Labeling in ASR »
Tatiana Likhomanenko -
2022 Poster: Star Temporal Classification: Sequence Modeling with Partially Labeled Data »
Vineel Pratap · Awni Hannun · Gabriel Synnaeve · Ronan Collobert -
2022 Poster: Learning to Reason with Neural Networks: Generalization, Unseen Data and Boolean Measures »
Emmanuel Abbe · Samy Bengio · Elisabetta Cornacchia · Jon Kleinberg · Aryo Lotfi · Maithra Raghu · Chiyuan Zhang -
2021 Poster: Learnable Fourier Features for Multi-dimensional Spatial Positional Encoding »
Yang Li · Si Si · Gang Li · Cho-Jui Hsieh · Samy Bengio -
2021 Poster: CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings »
Tatiana Likhomanenko · Qiantong Xu · Gabriel Synnaeve · Ronan Collobert · Alex Rogozhnikov -
2021 Poster: Improving Anytime Prediction with Parallel Cascaded Networks and a Temporal-Difference Loss »
Michael Iuzzolino · Michael Mozer · Samy Bengio -
2020 Poster: Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards »
Yijie Guo · Jongwook Choi · Marcin Moczulski · Shengyu Feng · Samy Bengio · Mohammad Norouzi · Honglak Lee -
2020 : Dr. Samy Bengio (Google Brain) »
Samy Bengio -
2019 : Contributed Session - Spotlight Talks »
Jonathan Frankle · David Schwab · Ari Morcos · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · YiDing Jiang · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Sho Yaida · Muqiao Yang -
2019 : Lunch Break and Posters »
Xingyou Song · Elad Hoffer · Wei-Cheng Chang · Jeremy Cohen · Jyoti Islam · Yaniv Blumenfeld · Andreas Madsen · Jonathan Frankle · Sebastian Goldt · Satrajit Chatterjee · Abhishek Panigrahi · Alex Renda · Brian Bartoldson · Israel Birhane · Aristide Baratin · Niladri Chatterji · Roman Novak · Jessica Forde · YiDing Jiang · Yilun Du · Linara Adilova · Michael Kamp · Berry Weinstein · Itay Hubara · Tal Ben-Nun · Torsten Hoefler · Daniel Soudry · Hsiang-Fu Yu · Kai Zhong · Yiming Yang · Inderjit Dhillon · Jaime Carbonell · Yanqing Zhang · Dar Gilboa · Johannes Brandstetter · Alexander R Johansen · Gintare Karolina Dziugaite · Raghav Somani · Ari Morcos · Freddie Kalaitzis · Hanie Sedghi · Lechao Xiao · John Zech · Muqiao Yang · Simran Kaur · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · Sho Yaida · Zachary Lipton · Daniel Roy · Michael Carbin · Florent Krzakala · Lenka Zdeborová · Guy Gur-Ari · Ethan Dyer · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Behnam Neyshabur · Praneeth Netrapalli · Kris Sankaran · Julien Cornebise · Yoshua Bengio · Vincent Michalski · Samira Ebrahimi Kahou · Md Rifat Arefin · Jiri Hron · Jaehoon Lee · Jascha Sohl-Dickstein · Samuel Schoenholz · David Schwab · Dongyu Li · Sang Keun Choe · Henning Petzka · Ashish Verma · Zhichao Lin · Cristian Sminchisescu -
2019 Poster: Transfusion: Understanding Transfer Learning for Medical Imaging »
Maithra Raghu · Chiyuan Zhang · Jon Kleinberg · Samy Bengio -
2018 : Panel Discussion »
Rich Caruana · Mike Schuster · Ralf Schlüter · Hynek Hermansky · Renato De Mori · Samy Bengio · Michiel Bacchiani · Jason Eisner -
2018 Poster: Large Margin Deep Networks for Classification »
Gamaleldin Elsayed · Dilip Krishnan · Hossein Mobahi · Kevin Regan · Samy Bengio -
2018 Poster: Insights on representational similarity in neural networks with canonical correlation »
Ari Morcos · Maithra Raghu · Samy Bengio -
2018 Poster: Content preserving text generation with attribute controls »
Lajanugen Logeswaran · Honglak Lee · Samy Bengio -
2017 : Competition I: Adversarial Attacks and Defenses »
Alexey Kurakin · Ian Goodfellow · Samy Bengio · Yao Zhao · Yinpeng Dong · Tianyu Pang · Fangzhou Liao · Cihang Xie · Adithya Ganesh · Oguz Elibol -
2016 Workshop: Extreme Classification: Multi-class and Multi-label Learning in Extremely Large Label Spaces »
Moustapha Cisse · Manik Varma · Samy Bengio -
2016 Poster: Can Active Memory Replace Attention? »
Łukasz Kaiser · Samy Bengio -
2016 Poster: An Online Sequence-to-Sequence Model Using Partial Conditioning »
Navdeep Jaitly · Quoc V Le · Oriol Vinyals · Ilya Sutskever · David Sussillo · Samy Bengio -
2016 Poster: Reward Augmented Maximum Likelihood for Neural Structured Prediction »
Mohammad Norouzi · Samy Bengio · zhifeng Chen · Navdeep Jaitly · Mike Schuster · Yonghui Wu · Dale Schuurmans -
2015 Poster: Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks »
Samy Bengio · Oriol Vinyals · Navdeep Jaitly · Noam Shazeer -
2013 Workshop: Output Representation Learning »
Yuhong Guo · Dale Schuurmans · Richard Zemel · Samy Bengio · Yoshua Bengio · Li Deng · Dan Roth · Kilian Q Weinberger · Jason Weston · Kihyuk Sohn · Florent Perronnin · Gabriel Synnaeve · Pablo R Strasser · julien audiffren · Carlo Ciliberto · Dan Goldwasser -
2013 Poster: DeViSE: A Deep Visual-Semantic Embedding Model »
Andrea Frome · Greg Corrado · Jonathon Shlens · Samy Bengio · Jeff Dean · Marc'Aurelio Ranzato · Tomas Mikolov -
2012 Workshop: Big Data Meets Computer Vision: First International Workshop on Large Scale Visual Recognition and Retrieval »
Jia Deng · Samy Bengio · Yuanqing Lin · Li Fei-Fei -
2010 Poster: Label Embedding Trees for Large Multi-Class Tasks »
Samy Bengio · Jason E Weston · David Grangier -
2009 Poster: Group Sparse Coding »
Samy Bengio · Fernando Pereira · Yoram Singer · Dennis Strelow -
2009 Poster: An Online Algorithm for Large Scale Image Similarity Learning »
Gal Chechik · Uri Shalit · Varun Sharma · Samy Bengio -
2007 Workshop: Efficient Machine Learning - Overcoming Computational Bottlenecks in Machine Learning (Part 2) »
Samy Bengio · Corinna Cortes · Dennis DeCoste · Francois Fleuret · Ramesh Natarajan · Edwin Pednault · Dan Pelleg · Elad Yom-Tov -
2007 Workshop: Efficient Machine Learning - Overcoming Computational Bottlenecks in Machine Learning (Part 1) »
Samy Bengio · Corinna Cortes · Dennis DeCoste · Francois Fleuret · Ramesh Natarajan · Edwin Pednault · Dan Pelleg · Elad Yom-Tov -
2006 Workshop: Learning to Compare Examples »
David Grangier · Samy Bengio