Workshop: Distribution shifts: connecting methods and applications (DistShift)

Shift and Scale is Detrimental To Few-Shot Transfer

Moslem Yazdanpanah · Christian Desrosiers · Mohammad Havaei · Eugene Belilovsky · Samira Ebrahimi Kahou


Batch normalization is a common component in computer vision models, including those typically used for few-shot learning. In convolutional networks, batch normalization consists of a normalization step followed by per-channel trainable affine parameters that shift and scale the normalized features. These affine parameters can speed up model convergence on a source task. However, we demonstrate in this work that, on common few-shot learning benchmarks, training a model on a source task with these affine parameters is detrimental to downstream transfer performance. We study this effect for several methods on well-known benchmarks, such as the cross-domain few-shot learning (CD-FSL) benchmark and few-shot image classification on miniImageNet. Removing the affine parameters yields consistent performance gains, particularly in settings with more distant transfer tasks. The improvements from this low-cost, easy-to-implement modification are shown to rival gains obtained by more sophisticated and costly methods.
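
To make the two steps of batch normalization concrete, the following is a minimal pure-Python sketch of the forward pass, not the authors' implementation. The function name `batch_norm`, the nested-list layout, and the example values are assumptions made for illustration. Passing no `gamma` and `beta` gives the affine-free variant discussed in the abstract; in PyTorch the equivalent switch is the `affine=False` argument of `torch.nn.BatchNorm2d`.

```python
import math


def batch_norm(channels, gamma=None, beta=None, eps=1e-5):
    """Normalize each channel, then optionally apply a per-channel scale and shift.

    channels: list of lists; channels[c] holds every activation of channel c
              across the batch (spatial positions flattened). Hypothetical layout.
    gamma, beta: optional per-channel scale and shift (the affine parameters).
                 Leaving both as None gives the affine-free variant.
    """
    out = []
    for c, values in enumerate(channels):
        # Normalization step: zero mean, unit variance per channel.
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance
        inv_std = 1.0 / math.sqrt(var + eps)
        # Affine step: identity when no parameters are supplied.
        scale = 1.0 if gamma is None else gamma[c]
        shift = 0.0 if beta is None else beta[c]
        out.append([scale * (v - mean) * inv_std + shift for v in values])
    return out


# Two channels, four activations each (made-up values for illustration).
features = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 30.0, 30.0]]

plain = batch_norm(features)                                       # no shift and scale
affine = batch_norm(features, gamma=[2.0, 0.5], beta=[1.0, -1.0])  # with shift and scale
```

Without the affine step, every channel leaves the layer with approximately zero mean and unit variance. With it, each channel's mean and spread are set by `beta` and `gamma`, which in a real model are learned on the source task. The sketch covers only the forward computation; it does not reproduce the training or transfer experiments.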
