
Workshop: Mathematics of Modern Machine Learning (M3L)

SimVAE: Narrowing the gap between Discriminative & Generative Representation Learning

Alice Bizeul · Carl Allen


Self-supervised representation learning is a powerful paradigm that leverages the relationship between semantically similar data, such as augmentations, extracts of an image or sound clip, or multiple views/modalities. Recent methods, e.g. SimCLR, CLIP and DINO, have made significant strides, yielding representations that achieve state-of-the-art results on multiple downstream tasks. A number of self-supervised discriminative approaches have been proposed, e.g. instance discrimination, latent clustering and contrastive methods. Though often intuitive, these approaches still lack a comprehensive theoretical understanding of their underlying mechanisms and of what they learn. Meanwhile, generative approaches, such as variational autoencoders (VAEs), fit a specific latent variable model and have principled appeal, but lag significantly behind in performance. We present a theoretical analysis of self-supervised discriminative methods, together with a graphical model that unifies them and reflects the assumptions they implicitly make. We show that fitting this model under an ELBO objective improves representations over previous VAE methods on several common benchmarks, narrowing the gap to discriminative methods, and can also preserve information that discriminative approaches lose. This work brings new theoretical insight to modern machine learning practice.
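As a rough illustration of what "fitting a latent variable model under an ELBO objective" involves, the following is a minimal sketch of the standard single-sample ELBO for a Gaussian VAE with a standard normal prior, written in Python with NumPy. It is not the SimVAE model: the graphical model proposed in the paper is not specified in this abstract, so the prior here is simply N(0, I). The `encode`/`decode` functions and their weights are hypothetical toy linear maps chosen only to make the sketch runnable.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def gaussian_log_likelihood(x, x_mean, sigma=1.0):
    """log N(x; x_mean, sigma^2 I), summed over data dimensions."""
    d = x.shape[-1]
    sq_err = np.sum((x - x_mean) ** 2, axis=-1)
    return -0.5 * sq_err / sigma**2 - 0.5 * d * np.log(2.0 * np.pi * sigma**2)

def elbo(x, encode, decode, rng, sigma=1.0):
    """Single-sample Monte Carlo estimate of the ELBO for each row of x."""
    mu, logvar = encode(x)                     # parameters of q(z|x)
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * logvar) * eps        # reparameterised sample z ~ q(z|x)
    x_mean = decode(z)                         # mean of p(x|z)
    return gaussian_log_likelihood(x, x_mean, sigma) - gaussian_kl(mu, logvar)

# Hypothetical toy encoder/decoder: fixed random linear maps, not trained.
rng = np.random.default_rng(0)
W_enc = 0.1 * rng.standard_normal((4, 2))
W_dec = 0.1 * rng.standard_normal((2, 4))

def encode(x):
    return x @ W_enc, np.zeros((x.shape[0], 2))  # mean, log-variance of q(z|x)

def decode(z):
    return z @ W_dec

x = rng.standard_normal((8, 4))
values = elbo(x, encode, decode, rng)
print(values.mean())
```

In training, the encoder and decoder parameters would be adjusted to maximise the mean ELBO. The paper's approach changes the underlying graphical model (and therefore the prior and how related samples are modelled), which this generic sketch does not attempt to reproduce.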
