Workshop: OPT 2022: Optimization for Machine Learning

The Importance of Temperature in Multi-Task Optimization

David Mueller · Mark Dredze · Nicholas Andrews


The promise of multi-task learning is that optimizing a single model on multiple related tasks will lead to a better solution for all tasks than independently trained models. In practice, optimization difficulties, such as conflicting gradients, can result in negative transfer, where multi-task models perform worse than single-task models. In this work, we identify the optimization temperature (the ratio of learning rate to batch size) as a key factor in negative transfer. Temperature controls the level of noise in each optimization step, which prior work has shown to have a strong correlation with generalization. We demonstrate that, in some multi-task settings, negative transfer may arise due to poorly set optimization temperature, rather than inherently high task conflict. The implication of this finding is that in some settings, SGD with a carefully controlled temperature achieves comparable, and in some cases superior, performance to that of specialized optimization procedures such as PCGrad, MGDA, and GradNorm. In particular, our results suggest that the significant additional computational burden of these specialized methods may not always be necessary. Finally, we observe a conflict between the optimal temperatures of different tasks in a multi-task objective, with different levels of noise promoting better generalization for different tasks. Our work suggests the need for novel multi-task optimization methods which consider individual task noise levels and their impact on generalization.
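
As a rough illustration of the quantity the abstract defines, the sketch below computes the temperature as learning rate divided by batch size and shows how the learning rate would need to change to hold that ratio fixed when the batch size changes. The function names and the example values are hypothetical and do not come from the paper; this is not the authors' experimental setup.

```python
def temperature(learning_rate: float, batch_size: int) -> float:
    """Optimization temperature as defined in the abstract: learning rate / batch size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return learning_rate / batch_size


def learning_rate_for(target_temperature: float, batch_size: int) -> float:
    """Learning rate that gives the target temperature at a given batch size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return target_temperature * batch_size


# Hypothetical settings, chosen only to illustrate the ratio.
base_lr, base_batch = 0.1, 32
base_temp = temperature(base_lr, base_batch)

# Growing the batch at a fixed learning rate lowers the temperature.
cooler_temp = temperature(base_lr, 256)

# Scaling the learning rate with the batch size keeps the temperature fixed.
matched_lr = learning_rate_for(base_temp, 256)

print(f"base temperature:           {base_temp:.6f}")
print(f"same lr, batch 256:         {cooler_temp:.6f}")
print(f"lr matching base at 256:    {matched_lr:.3f}")
```

Under this definition, two configurations with the same ratio share a temperature even when their learning rates and batch sizes differ, so the ratio can be treated as a single quantity to tune.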
