5th Workshop on Self-Supervised Learning: Theory and Practice

Workshop

5th Workshop on Self-Supervised Learning: Theory and Practice

XuDong Wang · Ishan Misra · Mathilde Caron · Tengda Han · Pengtao Xie

Sat 14 Dec, 8:30 a.m. PST

[ Abstract ] Workshop Website

[ OpenReview]

At NeurIPS from 2020 to 2024, we successfully organized the 1st, 2nd, 3rd and 4t workshops on Self-Supervised Learning – Theory and Practice. These events attracted a diverse audience from multiple domains, including vision, speech, NLP, robotics, ML theory, and industry practitioners. Building on the success of these previous workshops, we are excited to continue organizing the workshop on self-supervised learning this year. Self-supervised learning (SSL) is an approach of representation learning that does not rely on human-labeled data. Instead, it creates auxiliary tasks from unlabeled input data and learns representations by solving these tasks. SSL has shown significant success across various domains such as images (e.g., MAE, DINO, MoCo, PIRL, SimCLR), speech (e.g., wav2vec, Whisper), and text (e.g., BERT, GPT, Llama). It has also demonstrated promising results in other data modalities including graphs, time-series, and audio. Recent large language models—predominantly trained on web-scale data using self-supervised methods—have exhibited remarkable generalizability and are beginning to transform numerous research fields. SSL, without using human-provided labels, can achieve performance comparable to or even surpassing that of fully supervised methods. Furthermore, generative SSL techniques such as Imagen, Stable Diffusion, and SORA have significantly enhanced the artistic capabilities of AI models. Existing research on self-supervised learning (SSL) has primarily concentrated on enhancing empirical performance without substantial theoretical underpinnings. Although SSL approaches are empirically effective across various benchmarks, their theoretical foundations and practical applications remain less explored. Key questions such as the reasons behind the superior performance of certain auxiliary tasks, the requisite amount of unlabeled data for learning effective representations, the impact of neural architectures on SSL performance, and the practical scenarios where SSL outperforms supervised models, are still largely unanswered. Our workshop aims to address these gaps by fostering a dialogue between theory and practice, especially in the context of LLMs. We plan to gather researchers interested in SSL from diverse fields to explore the theoretical bases of empirically successful SSL methods and to discuss how these theoretical insights could further enhance SSL’s practical performance. This workshop will differentiate itself from previous SSL-related workshops by prioritizing the establishment of theoretical foundations and providing theoretical frameworks to guide the development of new SSL methods. Additionally, we will attempt to close the loop from practice to theory, by inviting practitioners to share their experiences and insights regarding the practical advantages and challenges of using SSL

Chat is not available.

Timezone: America/Los_Angeles

Schedule

Sat 8:30 a.m. - 8:55 a.m.	Poster Setup ( Poster ) >	🔗
Sat 9:00 a.m. - 9:15 a.m.	Opening Remarks ( Intro ) > SlidesLive Video	XuDong Wang 🔗
Sat 9:15 a.m. - 9:45 a.m.	Sherry Yang (Google DeepMind): Self-Supervised World Modeling from Internet Data ( Invited Talk ) > link SlidesLive Video Link	Sherry Yang 🔗
Sat 9:45 a.m. - 10:15 a.m.	Pauline Luc (Google DeepMind): Self-supervision for General Video Understanding Beyond Semantics ( Invited Talk ) > link SlidesLive Video Link	Pauline Luc 🔗
Sat 10:15 a.m. - 10:30 a.m.	Short Break ( Short Break ) >	🔗
Sat 10:30 a.m. - 11:00 a.m.	Hanna Hajishirzi (University of Washington & AI2): OLMo & Molmo: Open Textual and Visual Language Models ( Invited Talk ) > link SlidesLive Video Link	Hanna Hajishirzi 🔗
Sat 11:00 a.m. - 11:30 a.m.	Hilde Kuehne (Univ. of Tuebingen & MIT-IBM Watson AI Lab): Advances in Self-supervised Multimodal Learning ( Invited Talk ) > link SlidesLive Video Link	Hilde Kuehne 🔗
Sat 11:30 a.m. - 11:45 a.m.	Oral: In-Context Symmetries: Self-Supervised Learning through Contextual World Models ( Oral ) > link SlidesLive Video Link	Sharut Gupta · Chenyu Wang · Yifei Wang · Tommi Jaakkola · Stefanie Jegelka 🔗
Sat 11:45 a.m. - 12:00 p.m.	Oral: A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning ( Oral ) > link SlidesLive Video Link	Khimya Khetarpal · Daniel (Zhaohan) Guo · Bernardo Avila Pires · Yunhao Tang · Clare Lyle · Mark Rowland · Nicolas Heess · Diana Borsa · Arthur Guez · Will Dabney 🔗
Sat 12:30 p.m. - 1:50 p.m.	Poster Session ( Poster ) >	🔗
Sat 2:00 p.m. - 2:30 p.m.	Trevor Darrell (UC Berkeley): From Unsupervised Segmentation to Visual Prompting ( Invited Talk ) > link SlidesLive Video Link	Trevor Darrell 🔗
Sat 2:30 p.m. - 3:00 p.m.	Alan Yuille (Johns Hopkins University): Supervision of 3D-aware Models by Synthetic Data ( Invited Talk ) > link SlidesLive Video Link	Alan Yuille 🔗
Sat 3:00 p.m. - 3:30 p.m.	Phillip Isola (MIT): Representation Learning from Human Feedback ( Invited Talk ) > link SlidesLive Video Link	Phillip Isola 🔗
Sat 3:30 p.m. - 4:00 p.m.	Lili Yu (FAIR, Meta): Paths Towards Deep Fused Multimodal Modeling ( Invited Talk ) > link SlidesLive Video Link	🔗
Sat 4:00 p.m. - 4:30 p.m.	Ziwei Liu (Nanyang Technological University): From High-fidelity 3D Generative Models to Dynamic Embodied Learning ( Invited Talk ) > link SlidesLive Video Link	Ziwei Liu 🔗
Sat 4:30 p.m. - 4:45 p.m.	Occam's Razor for Self Supervised Learning: What is Sufficient to Learn Good Representations? ( Oral ) > link SlidesLive Video Link	Mark Ibrahim · David Klindt · Randall Balestriero 🔗
Sat 4:45 p.m. - 5:00 p.m.	Neural Embedding Ranks: Aligning 3D latent dynamics with movement for long-term decoding ( Oral ) > link SlidesLive Video Link	Chenggang Chen · Zhiyu Yang 🔗
-	Self-Supervised Pretext Tasks for Event Sequence Data from Detecting Misalignment ( Poster ) > link Link	Yimu Wang · He Zhao · Ruizhi Deng · Fred Tung · Greg Mori 🔗
-	Masked Self-Supervised Pretraining for Semantic Segmentation of Dental Radiographs ( Poster ) > link Link	Tejeswar Pokuri · Laalenthika Konthalapalli · Sarvesh Kumar · Karthik S. 🔗
-	For Perception Tasks: The Cost of LLM Pretraining by Next-Token Prediction Outweigh its Benefits ( Poster ) > link Link	Randall Balestriero · Hai Huang 🔗
-	Variational Graph Contrastive Learning ( Poster ) > link Link	shifeng xie · Jhony H. Giraldo 🔗
-	PabLO: Improving Semi-Supervised Learning with Pseudolabeling Optimization ( Poster ) > link Link	Harit Vishwakarma · Yi Chen · Satya Sai Srinath Namburi · Sui Jiet Tay · Ramya Korlakai Vinayak · Frederic Sala 🔗
-	Two Is Better Than One: Aligned Clusters Improve Anomaly Detection ( Poster ) > link Link	Alain Ryser · Thomas Sutter · Alexander Marx · Julia Vogt 🔗
-	A Unifying Framework for Action-Conditional Self-Predictive Reinforcement Learning ( Poster ) > link Link	Khimya Khetarpal · Daniel (Zhaohan) Guo · Bernardo Avila Pires · Yunhao Tang · Clare Lyle · Mark Rowland · Nicolas Heess · Diana Borsa · Arthur Guez · Will Dabney 🔗
-	Squeezing Water from a Stone: Improving Pre-Trained Self-Supervised Embeddings Through Effective Entropy Maximization ( Poster ) > link Link	Deep Chakraborty · Tim G. J. Rudner · Erik Learned-Miller 🔗
-	Boosting Unsupervised Segmentation Learning ( Poster ) > link Link	Alp Eren SARI · Francesco Locatello · Paolo Favaro 🔗
-	Occam's Razor for Self Supervised Learning: What is Sufficient to Learn Good Representations? ( Poster ) > link Link	Mark Ibrahim · David Klindt · Randall Balestriero 🔗
-	Benchmarking Self-Supervised Learning for Single-Cell Data ( Poster ) > link Link	Philip Toma · Olga Ovcharenko · Imant Daunhawer · Julia Vogt · Florian Barkmann · Valentina Boeva 🔗
-	Equivariant Representation Learning for Augmentation-based Self-Supervised Learning via Image Reconstruction ( Poster ) > link Link	Qin Wang · Kai Krajsek · Hanno Scharr 🔗
-	Squeezing performance from pathology foundation models with chained hyperparameter searches ( Poster ) > link Link	Joseph Cappadona · Ken Zeng · Carlos Fernandez-Granda · Jan Witowski · Yann LeCun · Krzysztof Geras 🔗
-	Leveraging Audio and Visual Recurrence for Unsupervised Video Highlight Detection ( Poster ) > link Link	Md Zahidul Islam · Sujoy Paul · Mrigank Rochan 🔗
-	Test-Time Adaptation for Video Highlight Detection ( Poster ) > link Link	Md Zahidul Islam · Sujoy Paul · Mrigank Rochan 🔗
-	Decoupling Vertical Federated Learning using Local Self-Supervision ( Poster ) > link Link	Avi Amalanshu · Yash Sirvi · David Inouye 🔗
-	Self-supervised Video Instance Segmentation Can Boost Geographic Entity Alignment in Historical Maps ( Poster ) > link Link	Xue Xia · Randall Balestriero · Tao Zhang · Lorenz Hurni 🔗
-	Neural Embeddings Rank: Aligning 3D latent dynamics with movements ( Poster ) > link Link	Chenggang Chen · Zhiyu Yang 🔗
-	An Empirical Analysis of Speech Self-Supervised Learning at Multiple Resolutions ( Poster ) > link Link	Theo Clark · Benedetta Cevoli · Eloy de Jong · Timofey Abramski · Jamie Dougherty 🔗
-	Self-Supervised Bisimulation Action Chunk Representation for Efficient RL ( Poster ) > link Link	Lei Shi · Jianye Hao · Hongyao Tang · Zibin Dong · YAN ZHENG 🔗
-	In-Context Symmetries: Self-Supervised Learning through Contextual World Models ( Poster ) > link Link	Sharut Gupta · Chenyu Wang · Yifei Wang · Tommi Jaakkola · Stefanie Jegelka 🔗
-	On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning ( Poster ) > link Link	Bokun Wang · Yunwen Lei · Yiming Ying · Tianbao Yang 🔗
-	Anomaly Detection In The Wild: Can SSL Handle Strong Distribution Imbalances? ( Poster ) > link Link	Daniel Otero · Rafael Mateus · Randall Balestriero 🔗
-	EmbedSimScore: Advancing Protein Similarity Analysis with Structural and Contextual Embeddings ( Poster ) > link Link	Gourab Saha · Toki Tahmid · Md. Shamsuzzoha Bayzid 🔗
-	Improving OOD Generalization of Pre-trained Encoders via Aligned Embedding-Space Ensembles ( Poster ) > link Link	Shuman Peng · Arash Khoeini · Sharan Vaswani · Martin Ester 🔗
-	MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations ( Poster ) > link Link	Benedikt Alkin · Lukas Miklautz · Sepp Hochreiter · Johannes Brandstetter 🔗
-	Adaptive Neighborhoods in Contrastive Regression Learning for Brain Age Prediction ( Poster ) > link Link	Jakob Träuble · Lucy Hiscox · Curtis Johnson · Carola-Bibiane Schönlieb · Gabriele Schierle · Angelica Aviles-Rivero 🔗
-	LLM2CLIP: Extending the Capability Boundaries of CLIP through Large Language Models ( Poster ) > link Link	11 presenters Aoqi Wu · weiquan Huang · Yifan Yang · Xufang Luo · Yuqing Yang · Chunyu Wang · Liang Hu · Xiyang Dai · Dongdong Chen · Chong Luo · Lili Qiu 🔗
-	Uncovering RL Integration in SSL Loss: Objective-Specific Implications for Data-Efficient RL ( Poster ) > link Link	Ömer Çağatan · Baris Akgun 🔗
-	Explainable Audio-Visual Representation Learning via Prototypical Contrastive Masked Autoencoder ( Poster ) > link Link	Yi Li · Plamen P Angelov 🔗
-	DIETing: Self-Supervised Learning with Instance Discrimination Learns Identifiable Features ( Poster ) > link Link	Attila Juhos · Alice Bizeul · Patrik Reizinger · Randall Balestriero · David Klindt · Mark Ibrahim · Julia Vogt · Wieland Brendel 🔗
-	Intra-video Positive Pairs in Self-Supervised Learning for Ultrasound ( Poster ) > link Link	Blake VanBerlo · Alexander Wong · Jesse Hoey · Robert Arntfield 🔗
-	Robust Self-Supervised Learning for Adversarial Attack Detection ( Poster ) > link Link	Yi Li · Plamen P Angelov · Neeraj Suri 🔗
-	Data Augmentation Transformations for Self-Supervised Learning with Ultrasound ( Poster ) > link Link	Blake VanBerlo · Alexander Wong · Jesse Hoey · Robert Arntfield 🔗
-	Unfolding Videos Dynamics via Taylor Expansion ( Poster ) > link Link	Siyi Chen · Minkyu Choi · Zesen Zhao · Kuan Han · Qing Qu · Zhongming Liu 🔗
-	A Graph Matching Approach to Balanced Data Sub-Sampling for Self-Supervised Learning ( Poster ) > link Link	Hugues Van Assel · Randall Balestriero 🔗
-	Time-dependent Sampling for Contrastive Self-supervised Learning of Longitudinal Biosignals Representations ( Poster ) > link Link	Sam Perochon · Salar Abbaspourazad · Joseph Futoma · Andy Miller · Guillermo Sapiro 🔗
-	Maven: A Multimodal Foundation Model for Supernova Science ( Poster ) > link Link	Gemma Zhang · Thomas Helfer · Alex Gagliano · Siddharth Mishra-Sharma · V Villar 🔗
-	Hoop-MSSL: Multi-Task Self-supervised Representation Learning on Basketball Spatio-Temporal Data ( Poster ) > link Link	xing wang · Jianchong Shao · Chunyang Huang · Zitian Tang · Miguel-Ángel GÓMEZ · Zhang shaoliang · Konstantinos Pelechrinis 🔗
-	The Birth of Self Supervised Learning: A Supervised Theory ( Poster ) > link Link	Randall Balestriero · Yann LeCun 🔗
-	Unsupervised Event Outlier Detection in Continuous Time ( Poster ) > link Link	Somjit Nath · Kry Yik Chau Lui · Siqi Liu 🔗
-	Context-Aware Predictive Coding: A Representation Learning Framework for WiFi Sensing ( Poster ) > link Link	Borna Barahimi · Hina Tabassum · Mohammad Omer · Omer Waqar 🔗
-	Seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models ( Poster ) > link Link	Hafez Ghaemi · Eilif B. Muller · Shahab Bakhtiari 🔗
-	Representing Positional Information in Generative World Models for Object Manipulation ( Poster ) > link Link	Stefano Ferraro · Pietro Mazzaglia · Tim Verbelen · Bart Dhoedt · Sai Rajeswar Mudumba 🔗
-	Influence Estimation in Self-Supervised Learning ( Poster ) > link Link	Nidhin Harilal · Reza Akbarian Bafghi · Amit Rege · Maziar Raissi · Claire Monteleoni 🔗
-	Enhancing JEPAs with Spatial Conditioning: Robust and Efficient Representation Learning ( Poster ) > link Link	Etai Littwin · Vimal Thilak · Anand Gopalakrishnan 🔗
-	TSA on AutoPilot: Self-tuning Self-supervised Time Series Anomaly Detection ( Poster ) > link Link	Boje Deforce · Meng-Chieh Lee · Bart Baesens · Estefanía Asensio · Jaemin Yoo · Leman Akoglu 🔗
-	On the Collapse Errors Induced by the Deterministic Sampler for Diffusion Models ( Poster ) > link Link	Zhang · Difan Zou 🔗
-	Self-Supervised Learning of Disentangled Representations for Multivariate Time-Series ( Poster ) > link Link	Ching Chang · Chan Chiao-Tung · Wei-Yao Wang · Wen-Chih Peng · Tien-Fu Chen 🔗
-	Pearls from Pebbles: Improved Confidence Functions for Auto-labeling ( Poster ) > link Link	Harit Vishwakarma · Yi Chen · Sui Jiet Tay · Satya Sai Srinath Namburi · Frederic Sala · Ramya Korlakai Vinayak 🔗
-	Informed Augmentation Selection Improves Tabular Contrastive Learning ( Poster ) > link Link	Arash Khoeini · Shuman Peng · Martin Ester 🔗
-	Self Supervised Learning Using Controlled Diffusion Image Augmentation ( Poster ) > link Link	Judah Goldfeder · Patrick Puma · Gabriel Guo · Gabriel Trigo · Hod Lipson 🔗
-	Uncovering the Risk of Model Collapsing in Self-Supervised Continual Test-time Adaptation ( Poster ) > link Link	Trung Hieu Hoang · MinhDuc Vo · Minh Do 🔗
-	PiLaMIM: Toward Richer Visual Representations by Integrating Pixel and Latent Masked Image Modeling ( Poster ) > link Link	Junmyeong Lee · Euijun Hwang · Sukmin Cho · Jong Park 🔗
-	DRESS: Disentangled Representation-based Self-Supervised Meta-Learning for Diverse Tasks ( Poster ) > link Link	Wei Cui · Yi Sui · Jesse Cresswell · Keyvan Golestan 🔗
-	When Do We Not Need Larger Vision Models? ( Poster ) > link Link	Baifeng Shi · Ziyang Wu · Maolin Mao · Xin Wang · Trevor Darrell 🔗
-	SigCLR: Sigmoid Contrastive Learning of Visual Representations ( Poster ) > link Link	Ömer Çağatan 🔗
-	NARAIM: Native Aspect Ratio Autoregressive Image Models ( Poster ) > link Link	Daniel Gallo Fernández · Robert van der Klis · Răzvan-Andrei Matișan · Janusz Partyka · Samuele Papa · Efstratios Gavves · Phillip Lippe 🔗
-	$X$ -Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs ( Poster ) > link Link	Uladzislau Sobal · Mark Ibrahim · Randall Balestriero · Vivien Cabannes · Diane Bouchacourt · Pietro Astolfi · Kyunghyun Cho · Yann LeCun 🔗