Title: Adapt like you train: How optimization at training time affects model finetuning and adaptation
Abstract: With the growing use of large-scale machine learning models pretrained on massive datasets, it is increasingly important to understand how to efficiently adapt these models to downstream tasks, whether through fine-tuning or at test time. In this talk, I will discuss our recent work highlighting an important but often overlooked factor in this process: the loss function used to train the model. In several cases, we have found that this loss has important implications for the best way to fine-tune or adapt the model. I will highlight two examples of this phenomenon: 1) fine-tuning contrastively pretrained vision-language models with a contrastive loss outperforms alternative fine-tuning methods; and 2) the convex conjugate of the training loss can be leveraged to perform label-free test-time adaptation. I will conclude by discussing open questions and future directions for this work.
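As a hedged illustration of the second example, suppose the training loss can be written as L(h, y) = f(h) - y^T h over the model outputs h. Softmax cross-entropy has this form with f = logsumexp and a one-hot y. Under that assumption, one conjugate-based, label-free objective is -f*(grad f(h)) = f(h) - grad f(h)^T h, where grad f(h) plays the role of a pseudo-label. The NumPy sketch below checks numerically that, for cross-entropy, this objective reduces to the entropy of the softmax prediction. All function names are hypothetical. This is a sketch of the idea under the stated assumption, not the implementation from the work described in the talk.

```python
import numpy as np


def logsumexp(h):
    # f(h) for cross-entropy: numerically stable log(sum(exp(h))) over the last axis.
    m = np.max(h, axis=-1, keepdims=True)
    return (m + np.log(np.sum(np.exp(h - m), axis=-1, keepdims=True))).squeeze(-1)


def softmax(h):
    # grad f(h) for f = logsumexp; acts as the (soft) pseudo-label.
    z = h - np.max(h, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def conjugate_loss(h, f, grad_f):
    # Assumes a training loss of the form L(h, y) = f(h) - y.h (hypothetical helper).
    # Label-free objective: -f*(grad_f(h)) = f(h) - grad_f(h).h
    return f(h) - np.sum(grad_f(h) * h, axis=-1)


def entropy(p):
    # Shannon entropy of each row of probabilities (assumes all entries > 0).
    return -np.sum(p * np.log(p), axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(4, 10))  # a batch of 4 examples, 10 classes
    conj = conjugate_loss(logits, logsumexp, softmax)
    print(np.allclose(conj, entropy(softmax(logits))))  # prints True
```

The check holds because the entropy of p = softmax(h) is logsumexp(h) - p^T h, which is exactly f(h) - grad f(h)^T h. For other losses of the same form, swapping in a different f and grad_f would give a different adaptation objective.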