Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning
Zafir Stojanovski · Karsten Roth · Zeynep Akata
Event URL: https://openreview.net/forum?id=XetJ4I78tf
Large pretrained, zero-shot capable models have shown considerable success both for standard transfer and adaptation tasks, with particular robustness towards distribution shifts. In addition, subsequent finetuning can considerably improve performance on a selected downstream task. However, through naive finetuning, these zero-shot models lose their generalizability and robustness towards distribution shifts. This is a particular problem for tasks such as Continual Learning (CL), where continuous adaptation has to be performed as new task distributions are introduced sequentially. In this work, we show that where naive finetuning falls short in adapting such zero-shot capable models, simple momentum-based weight interpolation provides consistent improvements on CL tasks in both memory-free and memory-based settings. In particular, we find improvements of over $+4\%$ on standard CL benchmarks, while in parts more than halving the gap to the upper bound of jointly training on all tasks at once, allowing the continual learner to inch closer to the joint-training limit.
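The abstract describes the method only at a high level; below is a minimal sketch of momentum-based weight interpolation as an exponential moving average over model parameters, maintained alongside standard finetuning. This PyTorch example is illustrative rather than the authors' implementation: the helper name `ema_update`, the toy `nn.Linear` backbone, and the momentum value of 0.999 are assumptions made for the sake of a runnable example.

```python
import copy

import torch
import torch.nn as nn

@torch.no_grad()
def ema_update(slow_model: nn.Module, fast_model: nn.Module, momentum: float = 0.999) -> None:
    """Momentum-based weight interpolation: nudge the slowly-updated copy
    toward the finetuned copy, slow <- momentum * slow + (1 - momentum) * fast.
    (Real use would also track buffers such as BatchNorm statistics.)"""
    for p_slow, p_fast in zip(slow_model.parameters(), fast_model.parameters()):
        p_slow.mul_(momentum).add_(p_fast, alpha=1.0 - momentum)

# Toy setup: both copies start from the same (zero-shot) initialization.
# The fast copy is finetuned on each incoming task; the slow copy is the
# interpolated model used at evaluation time.
zero_shot = nn.Linear(512, 10)  # stand-in for a pretrained zero-shot backbone
fast = copy.deepcopy(zero_shot)
slow = copy.deepcopy(zero_shot)
optimizer = torch.optim.SGD(fast.parameters(), lr=1e-2)

for step in range(100):  # sketch of the training loop for one task
    x = torch.randn(32, 512)
    y = torch.randint(0, 10, (32,))
    loss = nn.functional.cross_entropy(fast(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(slow, fast)  # interpolate after every optimizer step
```

The intended intuition is that the fast copy is free to adapt aggressively to each new task, while the slow interpolated copy changes gradually and so retains more of the zero-shot model's robustness across the task sequence.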
Author Information
Zafir Stojanovski (Eberhard-Karls-Universität Tübingen)
Karsten Roth (University of Tuebingen)
Zeynep Akata (University of Tübingen)
More from the Same Authors
- 2021 : Improving the Fairness of Deep Chest X-ray Classifiers
  Haoran Zhang · Natalie Dullerud · Karsten Roth · Stephen Pfohl · Marzyeh Ghassemi
- 2022 : PlanT: Explainable Planning Transformers via Object-Level Representations
  Katrin Renz · Kashyap Chitta · Otniel-Bogdan Mercea · A. Sophia Koepke · Zeynep Akata · Andreas Geiger
- 2022 : Momentum-based Weight Interpolation of Strong Zero-Shot Models for Continual Learning
  Zafir Stojanovski · Karsten Roth · Zeynep Akata
- 2022 Spotlight: Lightning Talks 6A-4
  Xiu-Shen Wei · Konstantina Dritsa · Guillaume Huguet · ABHRA CHAUDHURI · Zhenbin Wang · Kevin Qinghong Lin · Yutong Chen · Jianan Zhou · Yongsen Mao · Junwei Liang · Jinpeng Wang · Mao Ye · Yiming Zhang · Aikaterini Thoma · H.-Y. Xu · Daniel Sumner Magruder · Enwei Zhang · Jianing Zhu · Ronglai Zuo · Massimiliano Mancini · Hanxiao Jiang · Jun Zhang · Fangyun Wei · Faen Zhang · Ioannis Pavlopoulos · Zeynep Akata · Xiatian Zhu · Jingfeng ZHANG · Alexander Tong · Mattia Soldan · Chunhua Shen · Yuxin Peng · Liuhan Peng · Michael Wray · Tongliang Liu · Anjan Dutta · Yu Wu · Oluwadamilola Fasina · Panos Louridas · Angel Chang · Manik Kuchroo · Manolis Savva · Shujie LIU · Wei Zhou · Rui Yan · Gang Niu · Liang Tian · Bo Han · Eric Z. XU · Guy Wolf · Yingying Zhu · Brian Mak · Difei Gao · Masashi Sugiyama · Smita Krishnaswamy · Rong-Cheng Tu · Wenzhe Zhao · Weijie Kong · Chengfei Cai · WANG HongFa · Dima Damen · Bernard Ghanem · Wei Liu · Mike Zheng Shou
- 2022 Spotlight: Relational Proxies: Emergent Relationships as Fine-Grained Discriminators
  ABHRA CHAUDHURI · Massimiliano Mancini · Zeynep Akata · Anjan Dutta
- 2022 Poster: Relational Proxies: Emergent Relationships as Fine-Grained Discriminators
  ABHRA CHAUDHURI · Massimiliano Mancini · Zeynep Akata · Anjan Dutta
- 2021 Workshop: ImageNet: Past, Present, and Future
  Zeynep Akata · Lucas Beyer · Sanghyuk Chun · A. Sophia Koepke · Diane Larlus · Seong Joon Oh · Rafael Rezende · Sangdoo Yun · Xiaohua Zhai
- 2021 Poster: Characterizing Generalization under Out-Of-Distribution Shifts in Deep Metric Learning
  Timo Milbich · Karsten Roth · Samarth Sinha · Ludwig Schmidt · Marzyeh Ghassemi · Bjorn Ommer
- 2020 Poster: Attribute Prototype Network for Zero-Shot Learning
  Wenjia Xu · Yongqin Xian · Jiuniu Wang · Bernt Schiele · Zeynep Akata
- 2019 : Coffee Break + Poster Session I
  Wei-Hung Weng · Simon Kohl · Aiham Taleb · Arijit Patra · Khashayar Namdar · Matthias Perkonigg · Shizhan Gong · Abdullah-Al-Zubaer Imran · Amir Abdi · Ilja Manakov · Johannes C. Paetzold · Ben Glocker · Dushyant Sahoo · Shreyas Fadnavis · Karsten Roth · Xueqing Liu · Yifan Zhang · Alexander Preuhs · Fabian Eitel · Anusua Trivedi · Tomer Weiss · Darko Stern · Liset Vazquez Romaguera · Johannes Hofmanninger · Aakash Kaku · Oloruntobiloba Olatunji · Anastasia Razdaibiedina · Tao Zhang