We present the Supermasks in Superposition (SupSup) model, capable of sequentially learning thousands of tasks without catastrophic forgetting. Our approach uses a randomly initialized, fixed base network and for each task finds a subnetwork (supermask) that achieves good performance. If task identity is given at test time, the correct subnetwork can be retrieved with minimal memory usage. If it is not provided, SupSup can infer the task using gradient-based optimization to find a linear superposition of learned supermasks that minimizes the output entropy. In practice we find that a single gradient step is often sufficient to identify the correct mask, even among 2500 tasks. We also showcase two promising extensions. First, SupSup models can be trained entirely without task identity information, as they may detect when they are uncertain about new data and allocate an additional supermask for the new training distribution. Finally, the entire growing set of supermasks can be stored in a constant-sized reservoir by implicitly storing them as attractors in a fixed-sized Hopfield network.
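The single-gradient-step task inference described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a single linear layer with fixed random weights `W`, one binary mask per task stacked in `masks`, and a uniform initial superposition `alpha`. All function names here are hypothetical. The sketch computes the gradient of the output entropy with respect to each mixing coefficient and selects the mask whose coefficient would decrease entropy the most.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p, eps=1e-12):
    # Shannon entropy of a probability vector.
    return float(-(p * np.log(p + eps)).sum())

def superposed_logits(x, W, masks, alpha):
    # masks: (k, d_in, d_out), alpha: (k,).
    # The effective mask is the alpha-weighted sum of the per-task masks.
    mixed_mask = np.tensordot(alpha, masks, axes=1)  # (d_in, d_out)
    return x @ (W * mixed_mask)

def entropy_grad_alpha(x, W, masks, alpha, eps=1e-12):
    # Analytic gradient of the output entropy H with respect to alpha.
    p = softmax(superposed_logits(x, W, masks, alpha))
    H = entropy(p)
    dH_dz = -p * (np.log(p + eps) + H)  # dH/dlogits for a softmax output
    per_mask_logits = np.stack([x @ (W * m) for m in masks])  # (k, d_out)
    return per_mask_logits @ dH_dz  # chain rule: logits are linear in alpha

def infer_task_one_step(x, W, masks):
    # Start from a uniform superposition and take one gradient evaluation;
    # the most negative gradient marks the mask that lowers entropy fastest.
    k = len(masks)
    alpha = np.full(k, 1.0 / k)
    grad = entropy_grad_alpha(x, W, masks, alpha)
    return int(np.argmin(grad))
```

With trained supermasks, `infer_task_one_step(x, W, masks)` would return the index of the mask to use for input `x`. With untrained random masks, as in a quick smoke test, the returned index carries no meaning.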
Author Information
Mitchell Wortsman (University of Washington, Allen Institute for Artificial Intelligence)
Vivek Ramanujan (University of Washington)
Rosanne Liu (ML Collective)
Aniruddha Kembhavi (Allen Institute for Artificial Intelligence (AI2))
Mohammad Rastegari (University of Washington)
Jason Yosinski (ML Collective)
Dr. Jason Yosinski is a machine learning researcher, was a founding member of Uber AI Labs, and is scientific adviser to Recursion Pharmaceuticals and several other companies. His work focuses on building more capable and more understandable AI. As scientists and engineers build increasingly powerful AI systems, the abilities of these systems increase faster than does our understanding of them, motivating much of his work on AI Neuroscience: an emerging field of study that investigates fundamental properties and behaviors of AI systems. Dr. Yosinski completed his PhD as a NASA Space Technology Research Fellow working at the Cornell Creative Machines Lab, the University of Montreal, Caltech/NASA Jet Propulsion Laboratory, and Google DeepMind. His work on AI has been featured on NPR, Fast Company, the Economist, TEDx, XKCD, and on the BBC. Prior to his academic career, Jason cofounded two web technology companies and started a program in the Los Angeles school district that teaches students algebra via hands-on robotics. In his free time, Jason enjoys cooking, sailing, motorcycling, reading, paragliding, and sometimes pretending he's an artist.
Ali Farhadi (University of Washington)
More from the Same Authors
-
2021 : Robust fine-tuning of zero-shot models »
Mitchell Wortsman · Gabriel Ilharco · Jong Wook Kim · Mike Li · Hanna Hajishirzi · Ali Farhadi · Hongseok Namkoong · Ludwig Schmidt -
2023 Poster: OBJECT 3DIT: Language-guided 3D-aware Image Editing »
Oscar Michel · Anand Bhattad · Ranjay Krishna · Tanmay Gupta · Aniruddha Kembhavi -
2023 Poster: Stable and low-precision training for large-scale vision-language models »
Mitchell Wortsman · Tim Dettmers · Luke Zettlemoyer · Ari Morcos · Ali Farhadi · Ludwig Schmidt -
2023 Poster: Neural Priming for Sample-Efficient Adaptation »
Matthew Wallingford · Vivek Ramanujan · Alex Fang · Aditya Kusupati · Roozbeh Mottaghi · Aniruddha Kembhavi · Ludwig Schmidt · Ali Farhadi -
2023 Poster: DataComp: In search of the next generation of multimodal datasets »
Samir Yitzhak Gadre · Gabriel Ilharco · Alex Fang · Jonathan Hayase · Georgios Smyrnis · Thao Nguyen · Ryan Marten · Mitchell Wortsman · Dhruba Ghosh · Jieyu Zhang · Eyal Orgad · Rahim Entezari · Giannis Daras · Sarah Pratt · Vivek Ramanujan · Yonatan Bitton · Kalyani Marathe · Stephen Mussmann · Richard Vencu · Mehdi Cherti · Ranjay Krishna · Pang Wei Koh · Olga Saukh · Alexander Ratner · Shuran Song · Hannaneh Hajishirzi · Ali Farhadi · Romain Beaumont · Sewoong Oh · Alex Dimakis · Jenia Jitsev · Yair Carmon · Vaishaal Shankar · Ludwig Schmidt -
2023 Poster: Objaverse-XL: A Colossal Universe of 3D Objects »
Matt Deitke · Ruoshi Liu · Matthew Wallingford · Huong Ngo · Oscar Michel · Aditya Kusupati · Alan Fan · Christian Laforte · Vikram Voleti · Samir Yitzhak Gadre · Eli VanderBilt · Aniruddha Kembhavi · Carl Vondrick · Georgia Gkioxari · Kiana Ehsani · Ludwig Schmidt · Ali Farhadi -
2023 Poster: SugarCrepe: Fixing Hackable Benchmarks for Vision-Language Compositionality »
Cheng-Yu Hsieh · Jieyu Zhang · Zixian Ma · Aniruddha Kembhavi · Ranjay Krishna -
2023 Oral: DataComp: In search of the next generation of multimodal datasets »
Samir Yitzhak Gadre · Gabriel Ilharco · Alex Fang · Jonathan Hayase · Georgios Smyrnis · Thao Nguyen · Ryan Marten · Mitchell Wortsman · Dhruba Ghosh · Jieyu Zhang · Eyal Orgad · Rahim Entezari · Giannis Daras · Sarah Pratt · Vivek Ramanujan · Yonatan Bitton · Kalyani Marathe · Stephen Mussmann · Richard Vencu · Mehdi Cherti · Ranjay Krishna · Pang Wei Koh · Olga Saukh · Alexander Ratner · Shuran Song · Hannaneh Hajishirzi · Ali Farhadi · Romain Beaumont · Sewoong Oh · Alex Dimakis · Jenia Jitsev · Yair Carmon · Vaishaal Shankar · Ludwig Schmidt -
2022 : Matryoshka Representations for Adaptive Deployment »
Aniket Rege · Aditya Kusupati · Gantavya Bhatt · Matthew Wallingford · Aditya Sinha · Vivek Ramanujan · William Howard-Snyder · Kaifeng Chen · Sham Kakade · Prateek Jain · Ali Farhadi -
2022 Workshop: Broadening Research Collaborations »
Sara Hooker · Rosanne Liu · Pablo Samuel Castro · FatemehSadat Mireshghallah · Sunipa Dev · Benjamin Rosman · João Madeira Araújo · Savannah Thais · Sunny Sanyal · Tejumade Afonja · Swapneel Mehta · Tyler Zhu -
2022 Poster: Patching open-vocabulary models by interpolating weights »
Gabriel Ilharco · Mitchell Wortsman · Samir Yitzhak Gadre · Shuran Song · Hannaneh Hajishirzi · Simon Kornblith · Ali Farhadi · Ludwig Schmidt -
2022 Poster: LAION-5B: An open large-scale dataset for training next generation image-text models »
Christoph Schuhmann · Romain Beaumont · Richard Vencu · Cade Gordon · Ross Wightman · Mehdi Cherti · Theo Coombes · Aarush Katta · Clayton Mullis · Mitchell Wortsman · Patrick Schramowski · Srivatsa Kundurthy · Katherine Crowson · Ludwig Schmidt · Robert Kaczmarczyk · Jenia Jitsev -
2022 Poster: 🏘️ ProcTHOR: Large-Scale Embodied AI Using Procedural Generation »
Matt Deitke · Eli VanderBilt · Alvaro Herrasti · Luca Weihs · Kiana Ehsani · Jordi Salvador · Winson Han · Eric Kolve · Aniruddha Kembhavi · Roozbeh Mottaghi -
2022 Poster: Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP »
Thao Nguyen · Gabriel Ilharco · Mitchell Wortsman · Sewoong Oh · Ludwig Schmidt -
2022 Poster: Ask4Help: Learning to Leverage an Expert for Embodied Tasks »
Kunal Pratap Singh · Luca Weihs · Alvaro Herrasti · Jonghyun Choi · Aniruddha Kembhavi · Roozbeh Mottaghi -
2021 Poster: Bridging the Imitation Gap by Adaptive Insubordination »
Luca Weihs · Unnat Jain · Iou-Jen Liu · Jordi Salvador · Svetlana Lazebnik · Aniruddha Kembhavi · Alex Schwing -
2021 Poster: Container: Context Aggregation Networks »
Peng Gao · Jiasen Lu · Hongsheng Li · Roozbeh Mottaghi · Aniruddha Kembhavi -
2020 Poster: Learning About Objects by Learning to Interact with Them »
Martin Lohmann · Jordi Salvador · Aniruddha Kembhavi · Roozbeh Mottaghi -
2020 Social: Open collaboration in ML research »
Rosanne Liu -
2019 : Panel - The Role of Communication at Large: Aparna Lakshmiratan, Jason Yosinski, Been Kim, Surya Ganguli, Finale Doshi-Velez »
Aparna Lakshmiratan · Finale Doshi-Velez · Surya Ganguli · Zachary Lipton · Michela Paganini · Anima Anandkumar · Jason Yosinski -
2019 Poster: Hamiltonian Neural Networks »
Sam Greydanus · Misko Dzamba · Jason Yosinski -
2019 Poster: Discovering Neural Wirings »
Mitchell Wortsman · Ali Farhadi · Mohammad Rastegari -
2019 Poster: LCA: Loss Change Allocation for Neural Network Training »
Janice Lan · Rosanne Liu · Hattie Zhou · Jason Yosinski -
2019 Poster: Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask »
Hattie Zhou · Janice Lan · Rosanne Liu · Jason Yosinski -
2018 : Jason Yosinski, "Good and bad assumptions in model design and interpretability" »
Jason Yosinski -
2018 Poster: Faster Neural Networks Straight from JPEG »
Lionel Gueguen · Alex Sergeev · Ben Kadlec · Rosanne Liu · Jason Yosinski -
2018 Poster: An intriguing failing of convolutional neural networks and the CoordConv solution »
Rosanne Liu · Joel Lehman · Piero Molino · Felipe Petroski Such · Eric Frank · Alex Sergeev · Jason Yosinski -
2017 Symposium: Interpretable Machine Learning »
Andrew Wilson · Jason Yosinski · Patrice Simard · Rich Caruana · William Herlands -
2017 Poster: SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability »
Maithra Raghu · Justin Gilmer · Jason Yosinski · Jascha Sohl-Dickstein -
2016 : Ali Farhadi: Deep Learning on Resource Constraint Devices »
Ali Farhadi -
2016 : Ali Farhadi »
Ali Farhadi -
2016 Demonstration: Adventures with Deep Generator Networks »
Jason Yosinski · Anh Nguyen · Jeff Clune · Douglas K Bemis -
2016 Poster: Synthesizing the preferred inputs for neurons in neural networks via deep generator networks »
Anh Nguyen · Alexey Dosovitskiy · Jason Yosinski · Thomas Brox · Jeff Clune -
2015 Poster: Visalogy: Answering Visual Analogy Questions »
Fereshteh Sadeghi · C. Lawrence Zitnick · Ali Farhadi -
2014 Poster: How transferable are features in deep neural networks? »
Jason Yosinski · Jeff Clune · Yoshua Bengio · Hod Lipson -
2014 Demonstration: Playing with Convnets »
Jason Yosinski · Hod Lipson -
2014 Oral: How transferable are features in deep neural networks? »
Jason Yosinski · Jeff Clune · Yoshua Bengio · Hod Lipson