We propose Conditional Adapter (CoDA), a parameter-efficient transfer learning method that also improves inference efficiency. CoDA generalizes beyond standard adapter approaches to enable a new way of balancing speed and accuracy using conditional computation. Starting with an existing dense pretrained model, CoDA adds sparse activation together with a small number of new parameters and a light-weight training phase. Our experiments demonstrate that the CoDA approach provides an unexpectedly efficient way to transfer knowledge. Across a variety of language, vision, and speech tasks, CoDA achieves a 2x to 8x inference speed-up compared to state-of-the-art Adapter approaches with moderate to no accuracy loss and the same parameter efficiency.
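The conditional computation described above can be pictured as a router that sends only a few tokens through the frozen, expensive pretrained layer, while a light-weight adapter processes every token. The sketch below is an illustrative toy in NumPy with hypothetical names (`conditional_layer`, `heavy_fn`, hard top-k selection), not the paper's implementation; the actual method learns the routing jointly with the adapter parameters inside each Transformer layer.

```python
import numpy as np

rng = np.random.default_rng(0)

def adapter(x, W_down, W_up):
    # cheap bottleneck adapter applied to every token (residual connection)
    return x + np.maximum(x @ W_down, 0.0) @ W_up

def conditional_layer(x, W_down, W_up, router_w, heavy_fn, k):
    """Route only the k highest-scoring tokens through the frozen heavy
    branch; all tokens pass through the light adapter branch."""
    scores = x @ router_w                  # one score per token, shape (n,)
    topk = np.argsort(scores)[-k:]         # indices of the selected tokens
    out = adapter(x, W_down, W_up)         # cheap path for all tokens
    out[topk] += heavy_fn(x[topk])         # expensive path for top-k only
    return out

# toy sizes: 8 tokens, model dim 16, adapter bottleneck 4, select 2 tokens
n, d, r, k = 8, 16, 4, 2
x = rng.standard_normal((n, d))
W_down = rng.standard_normal((d, r)) * 0.1
W_up = rng.standard_normal((r, d)) * 0.1
router_w = rng.standard_normal(d)
W_heavy = rng.standard_normal((d, d)) * 0.1   # stand-in for a frozen layer
y = conditional_layer(x, W_down, W_up, router_w, lambda h: h @ W_heavy, k)
print(y.shape)  # (8, 16) -- same shape as the input, but only 2 of 8
                # tokens paid for the heavy computation
```

The speed/accuracy trade-off in the abstract corresponds to the choice of `k`: smaller `k` means fewer tokens take the expensive path and faster inference.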
Author Information
Tao Lei (Google)
Junwen Bai (Google)
I'm a Research Scientist at Google. I received my PhD degree from the Department of Computer Science at Cornell University in 2022, advised by Prof. Carla P. Gomes. I received my Bachelor's degree in 2017 from Shanghai Jiao Tong University, where I spent four years in the ACM Honored Class. I am interested in the general areas of machine learning and language technology, with a research focus on sequence representation learning and probabilistic modeling, often in low-supervision scenarios. I have developed scalable and general machine learning methods for real-world problems including automatic speech recognition, climate change, and scientific discovery.
Siddhartha Brahma (IBM Research AI)
I am a research staff member at IBM Research AI. I am interested in all aspects of deep learning and its application to AI. My present work is focused on improving the state of the art in text classification, relation embedding, entity resolution, and semi-supervised rule learning using neural models. I am also fascinated by generative models and reinforcement learning. I have a PhD from EPFL, an MA from Princeton, and a BTech from IIT Kharagpur, and have worked at Google. I won the best paper award at ACM MobiHoc 2013 and the President of India Gold Medal from IIT Kharagpur. In the past, I have worked on algorithms for wireless networks, information retrieval, and theoretical computer science.
Joshua Ainslie (Google)
Kenton Lee (Google Research)
Yanqi Zhou (Google Deepmind)
Nan Du (Apple/AIML)
Vincent Zhao (Augment Computing)
Yuexin Wu (Google)
Bo Li (Google)
Yu Zhang (Google)
Ming-Wei Chang (Google DeepMind)
More from the Same Authors
-
2021 : Gaussian Mixture Variational Autoencoder with Contrastive Learning for Multi-Label Classification »
Junwen Bai · Shufeng Kong · Carla Gomes -
2021 : A GNN-RNN Approach for Harnessing Geospatial and Temporal Information: Application to Crop Yield Prediction »
Joshua Fan · Junwen Bai · Zhiyun Li · Ariel Ortiz-Bobea · Carla Gomes -
2022 : Xtal2DoS: Attention-based Crystal to Sequence Learning for Density of States Prediction »
Junwen Bai · Yuanqi Du · Yingheng Wang · Shufeng Kong · John Gregoire · Carla Gomes -
2023 : Dr.ICL: Demonstration-Retrieved In-context Learning »
Man Luo · Xin Xu · Zhuyun Dai · Panupong Pasupat · Mehran Kazemi · Chitta Baral · Vaiva Imbrasaite · Vincent Zhao -
2023 Poster: DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining »
Sang Michael Xie · Hieu Pham · Xuanyi Dong · Nan Du · Hanxiao Liu · Yifeng Lu · Percy Liang · Quoc V Le · Tengyu Ma · Adams Wei Yu -
2023 Poster: Learning Large Graph Property Prediction via Graph Segment Training »
Kaidi Cao · Mangpo Phothilimtha · Sami Abu-El-Haija · Dustin Zelle · Yanqi Zhou · Charith Mendis · Jure Leskovec · Bryan Perozzi -
2023 Poster: From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces »
Peter Shaw · Mandar Joshi · James Cohan · Jonathan Berant · Panupong Pasupat · Hexiang Hu · Urvashi Khandelwal · Kenton Lee · Kristina N Toutanova -
2023 Poster: Subject-driven Text-to-Image Generation via Apprenticeship Learning »
wenhu chen · Hexiang Hu · Yandong Li · Nataniel Ruiz · Xuhui Jia · Ming-Wei Chang · William Cohen -
2023 Poster: Rethinking the Role of Token Retrieval in Multi-Vector Retrieval »
Jinhyuk Lee · Zhuyun Dai · Sai Meher Karthik Duddu · Tao Lei · Iftekhar Naim · Ming-Wei Chang · Vincent Zhao -
2022 Poster: Mixture-of-Experts with Expert Choice Routing »
Yanqi Zhou · Tao Lei · Hanxiao Liu · Nan Du · Yanping Huang · Vincent Zhao · Andrew Dai · zhifeng Chen · Quoc V Le · James Laudon -
2021 Poster: Contrastively Disentangled Sequential Variational Autoencoder »
Junwen Bai · Weiran Wang · Carla Gomes -
2020 : Representation Learning for Sequence Data with Deep Autoencoding Predictive »
Junwen Bai -
2020 Poster: Transferable Graph Optimizers for ML Compilers »
Yanqi Zhou · Sudip Roy · Amirali Abdolrashidi · Daniel Wong · Peter Ma · Qiumin Xu · Hanxiao Liu · Mangpo Phothilimtha · Shen Wang · Anna Goldie · Azalia Mirhoseini · James Laudon -
2020 Oral: Transferable Graph Optimizers for ML Compilers »
Yanqi Zhou · Sudip Roy · Amirali Abdolrashidi · Daniel Wong · Peter Ma · Qiumin Xu · Hanxiao Liu · Mangpo Phothilimtha · Shen Wang · Anna Goldie · Azalia Mirhoseini · James Laudon -
2020 Poster: Big Bird: Transformers for Longer Sequences »
Manzil Zaheer · Guru Guruganesh · Kumar Avinava Dubey · Joshua Ainslie · Chris Alberti · Santiago Ontanon · Philip Pham · Anirudh Ravula · Qifan Wang · Li Yang · Amr Ahmed -
2019 : Panel Discussion »
Theodore Willke · Evelina Fedorenko · Kenton Lee · Paul Smolensky -
2017 : Panel: Machine learning and audio signal processing: State of the art and future perspectives »
Sepp Hochreiter · Bo Li · Karen Livescu · Arindam Mandal · Oriol Nieto · Malcolm Slaney · Hendrik Purwins