Despite its flexibility to learn diverse inductive biases in machine learning programs, meta learning (i.e., learning to learn) has long been recognized to suffer from poor scalability due to its tremendous compute/memory costs, training instability, and a lack of efficient distributed training support. In this work, we focus on making scalable meta learning practical by introducing SAMA, which combines advances in both implicit differentiation algorithms and systems. Specifically, SAMA is designed to flexibly support a broad range of adaptive optimizers in the base level of meta learning programs, while reducing computational burden by avoiding explicit computation of second-order gradient information, and exploiting efficient distributed training techniques implemented for first-order gradients. Evaluated on multiple large-scale meta learning benchmarks, SAMA showcases up to 1.7/4.8x increase in throughput and 2.0/3.8x decrease in memory consumption respectively on single-/multi-GPU setups compared to other baseline meta learning algorithms. Furthermore, we show that SAMA-based data optimization leads to consistent improvements in text classification accuracy with BERT and RoBERTa large language models, and achieves state-of-the-art results in both small- and large-scale data pruning on image classification tasks, demonstrating the practical applicability of scalable meta learning across language and vision domains.
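As a rough illustration of what "avoiding explicit computation of second-order gradient information" can look like in general, the sketch below approximates a mixed second-derivative-vector product with a central finite difference of first-order gradients only. This is a generic technique shown on a toy problem, not SAMA's actual algorithm or code. The toy base loss, the function names `grad_lambda_base` and `mixed_product_fd`, and all numeric values are assumptions made for this example.

```python
import numpy as np

# Hypothetical toy base loss (an assumption for illustration only):
#   base_loss(w, lam) = 0.5 * ||w - a||^2 + 0.5 * sum(lam * w**2)
# where w are base parameters and lam are per-coordinate meta parameters.

def grad_lambda_base(w, lam):
    """First-order gradient of the toy base loss with respect to lam."""
    # d/d lam_i of 0.5 * lam_i * w_i^2 is 0.5 * w_i^2; lam itself drops out.
    return 0.5 * w**2

def mixed_product_fd(w, lam, v, eps=1e-3):
    """Approximate (d^2 base_loss / d lam d w) @ v without forming the matrix.

    Uses a central finite difference of first-order gradients evaluated
    at w + eps * v and w - eps * v.
    """
    g_plus = grad_lambda_base(w + eps * v, lam)
    g_minus = grad_lambda_base(w - eps * v, lam)
    return (g_plus - g_minus) / (2.0 * eps)

w = np.array([1.0, -2.0, 0.5])
lam = np.array([0.1, 0.1, 0.1])
v = np.array([0.3, 0.2, -1.0])  # stands in for the meta-loss gradient w.r.t. w

approx = mixed_product_fd(w, lam, v)
# For this toy loss the mixed second derivative is diag(w), so the exact
# product is the elementwise product w * v.
exact = w * v
print("max abs error:", np.max(np.abs(approx - exact)))
```

Because only first-order gradients are evaluated, a computation of this shape can reuse standard gradient machinery, which is the general property the abstract points to when it mentions distributed training techniques implemented for first-order gradients.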
Author Information
Sang Choe (Carnegie Mellon University)
Sanket Vaibhav Mehta (Carnegie Mellon University)

I'm a Ph.D. candidate at the Language Technologies Institute (LTI) at the School of Computer Science, Carnegie Mellon University, and I'm advised by Emma Strubell. I'm interested in machine learning, natural language processing, and optimization with a specific focus on learning from limited labeled data, multiple tasks, and non-stationary data distributions (Lifelong Learning, Transfer Learning, Meta-Learning, Multi-Task Learning).
Hwijeen Ahn (Carnegie Mellon University)
Willie Neiswanger (Stanford / USC)
Pengtao Xie (UC San Diego)
Emma Strubell (Carnegie Mellon University)
Eric Xing (Petuum Inc.)
More from the Same Authors
-
2021 : Personalized Benchmarking with the Ludwig Benchmarking Toolkit »
Avanika Narayan · Piero Molino · Karan Goel · Willie Neiswanger · Christopher Ré -
2021 : Synthetic Benchmarks for Scientific Research in Explainable Machine Learning »
Yang Liu · Sujay Khandagale · Colin White · Willie Neiswanger -
2021 : Geometric Question Answering Towards Multimodal Numerical Reasoning »
Jiaqi Chen · Jianheng Tang · Jinghui Qin · Xiaodan Liang · Lingbo Liu · Eric Xing · Liang Lin -
2022 : The Impact of Symbolic Representations on In-context Learning for Few-shot Reasoning »
Hanlin Zhang · Yifan Zhang · Li Erran Li · Eric Xing -
2022 : Betty: An Automatic Differentiation Library for Multilevel Optimization »
Sang Choe · Willie Neiswanger · Pengtao Xie · Eric Xing -
2023 : Contextualized Networks Reveal Heterogeneous Transcriptomic Regulation in Tumors at Sample-Specific Resolution »
Caleb Ellington · Ben Lengerich · Thomas Watkins · Jiekun Yang · Hanxi Xiao · Manolis Kellis · Eric Xing -
2023 : A Study on the Calibration of In-context Learning »
Hanlin Zhang · Yifan Zhang · Yaodong Yu · Eric Xing · Himabindu Lakkaraju · Sham Kakade -
2023 : On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation »
Duy M. H. Nguyen · Tan Ngoc Pham · Nghiem Diep · Nghi Phan · Quang Pham · Vinh Tong · Binh Nguyen · Ngan Le · Nhat Ho · Pengtao Xie · Daniel Sonntag · Mathias Niepert -
2023 : Correlated Trajectory Uncertainty for Adaptive Sequential Decision Making »
Ian Char · Youngseog Chung · Rohan Shah · Willie Neiswanger · Jeff Schneider -
2023 : Fusing Models with Complementary Expertise »
Hongyi Wang · Felipe Maia Polo · Yuekai Sun · Souvik Kundu · Eric Xing · Mikhail Yurochkin -
2023 : LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers »
Dacheng Li · Rulin Shao · Anze Xie · Eric Xing · Joseph Gonzalez · Ion Stoica · Xuezhe Ma · Hao Zhang -
2023 Workshop: 4th Workshop on Self-Supervised Learning: Theory and Practice »
Tengda Han · Ishan Misra · Pengtao Xie · Mathilde Caron · Hilde Kuehne -
2023 Workshop: Machine Learning with New Compute Paradigms »
Jannes Gladrow · Benjamin Scellier · Eric Xing · Babak Rahmani · Francesca Parmigiani · Paul Prucnal · Cheng Zhang -
2023 Workshop: Adaptive Experimental Design and Active Learning in the Real World »
Willie Neiswanger · Mojmir Mutny · Ilija Bogunovic · Ava Amini · Zi Wang · Stefano Ermon · Andreas Krause -
2023 Poster: LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching »
Duy M. H. Nguyen · Hoang Nguyen · Nghiem Diep · Tan Ngoc Pham · Tri Cao · Binh Nguyen · Paul Swoboda · Nhat Ho · Shadi Albarqouni · Pengtao Xie · Daniel Sonntag · Mathias Niepert -
2023 Poster: FedNAR: Federated Optimization with Normalized Annealing Regularization »
Junbo Li · Ang Li · Chong Tian · Qirong Ho · Eric Xing · Hongyi Wang -
2023 Poster: An Empirical Investigation of the Role of Pre-training in Lifelong Learning »
Sanket Vaibhav Mehta · Darshan Patil · Sarath Chandar · Emma Strubell -
2023 Poster: Counterfactual Generation with Identifiability Guarantees »
Hanqi Yan · Lingjing Kong · Lin Gui · Yuejie Chi · Eric Xing · Yulan He · Kun Zhang -
2023 Poster: Temporally Disentangled Representation Learning under Unknown Nonstationarity »
Xiangchen Song · Weiran Yao · Yewen Fan · Xinshuai Dong · Guangyi Chen · Juan Carlos Niebles · Eric Xing · Kun Zhang -
2023 Poster: Identification of Nonlinear Latent Hierarchical Models »
Lingjing Kong · Biwei Huang · Feng Xie · Eric Xing · Yuejie Chi · Kun Zhang -
2023 Poster: Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer »
Bowen Tan · Yun Zhu · Lijuan Liu · Eric Xing · Zhiting Hu · Jindong Chen -
2023 Poster: Importance-aware Co-teaching for Offline Model-based Optimization »
Ye Yuan · Can Chen · Zixuan Liu · Willie Neiswanger · Xue (Steve) Liu -
2023 Poster: Weakly Supervised 3D Open-vocabulary Segmentation »
Kunhao Liu · Fangneng Zhan · Jiahui Zhang · Muyu Xu · Yingchen Yu · Abdulmotaleb El Saddik · Christian Theobalt · Eric Xing · Shijian Lu -
2023 Poster: Squeeze, Recover and Relabel: Dataset Condensation at ImageNet Scale From A New Perspective »
Zeyuan Yin · Eric Xing · Zhiqiang Shen -
2023 Poster: Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena »
Lianmin Zheng · Wei-Lin Chiang · Ying Sheng · Siyuan Zhuang · Zhanghao Wu · Yonghao Zhuang · Zi Lin · Zhuohan Li · Dacheng Li · Eric Xing · Hao Zhang · Joseph Gonzalez · Ion Stoica -
2022 Workshop: Tackling Climate Change with Machine Learning »
Peetak Mitra · Maria João Sousa · Mark Roth · Jan Drgona · Emma Strubell · Yoshua Bengio -
2022 Spotlight: Masked Generative Adversarial Networks are Data-Efficient Generation Learners »
Jiaxing Huang · Kaiwen Cui · Dayan Guan · Aoran Xiao · Fangneng Zhan · Shijian Lu · Shengcai Liao · Eric Xing -
2022 Workshop: Self-Supervised Learning: Theory and Practice »
Ishan Misra · Pengtao Xie · Gul Varol · Yale Song · Yuki Asano · Xiaolong Wang · Pauline Luc -
2022 : Invited Talk: Willie Neiswanger »
Willie Neiswanger -
2022 Poster: AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness »
Dacheng Li · Hongyi Wang · Eric Xing · Hao Zhang -
2022 Poster: Generalizing Bayesian Optimization with Decision-theoretic Entropies »
Willie Neiswanger · Lantao Yu · Shengjia Zhao · Chenlin Meng · Stefano Ermon -
2022 Poster: Rare Gems: Finding Lottery Tickets at Initialization »
Kartik Sreenivasan · Jy-yong Sohn · Liu Yang · Matthew Grinde · Alliot Nagle · Hongyi Wang · Eric Xing · Kangwook Lee · Dimitris Papailiopoulos -
2022 Poster: Saliency-Aware Neural Architecture Search »
Ramtin Hosseini · Pengtao Xie -
2022 Poster: Exploration via Planning for Information about the Optimal Trajectory »
Viraj Mehta · Ian Char · Joseph Abbate · Rory Conlin · Mark Boyer · Stefano Ermon · Jeff Schneider · Willie Neiswanger -
2022 Poster: Masked Generative Adversarial Networks are Data-Efficient Generation Learners »
Jiaxing Huang · Kaiwen Cui · Dayan Guan · Aoran Xiao · Fangneng Zhan · Shijian Lu · Shengcai Liao · Eric Xing -
2021 Workshop: 2nd Workshop on Self-Supervised Learning: Theory and Practice »
Pengtao Xie · Ishan Misra · Pulkit Agrawal · Abdelrahman Mohamed · Shentong Mo · Youwei Liang · Jeannette Bohg · Kristina N Toutanova -
2021 Poster: Beyond Pinball Loss: Quantile Methods for Calibrated Uncertainty Quantification »
Youngseog Chung · Willie Neiswanger · Ian Char · Jeff Schneider -
2019 : Lunch Break and Posters »
Xingyou Song · Elad Hoffer · Wei-Cheng Chang · Jeremy Cohen · Jyoti Islam · Yaniv Blumenfeld · Andreas Madsen · Jonathan Frankle · Sebastian Goldt · Satrajit Chatterjee · Abhishek Panigrahi · Alex Renda · Brian Bartoldson · Israel Birhane · Aristide Baratin · Niladri Chatterji · Roman Novak · Jessica Forde · YiDing Jiang · Yilun Du · Linara Adilova · Michael Kamp · Berry Weinstein · Itay Hubara · Tal Ben-Nun · Torsten Hoefler · Daniel Soudry · Hsiang-Fu Yu · Kai Zhong · Yiming Yang · Inderjit Dhillon · Jaime Carbonell · Yanqing Zhang · Dar Gilboa · Johannes Brandstetter · Alexander R Johansen · Gintare Karolina Dziugaite · Raghav Somani · Ari Morcos · Freddie Kalaitzis · Hanie Sedghi · Lechao Xiao · John Zech · Muqiao Yang · Simran Kaur · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · Sho Yaida · Zachary Lipton · Daniel Roy · Michael Carbin · Florent Krzakala · Lenka Zdeborová · Guy Gur-Ari · Ethan Dyer · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Behnam Neyshabur · Praneeth Netrapalli · Kris Sankaran · Julien Cornebise · Yoshua Bengio · Vincent Michalski · Samira Ebrahimi Kahou · Md Rifat Arefin · Jiri Hron · Jaehoon Lee · Jascha Sohl-Dickstein · Samuel Schoenholz · David Schwab · Dongyu Li · Sang Choe · Henning Petzka · Ashish Verma · Zhichao Lin · Cristian Sminchisescu -
2019 : Morning Coffee Break & Poster Session »
Eric Metodiev · Keming Zhang · Markus Stoye · Randy Churchill · Soumalya Sarkar · Miles Cranmer · Johann Brehmer · Danilo Jimenez Rezende · Peter Harrington · AkshatKumar Nigam · Nils Thuerey · Lukasz Maziarka · Alvaro Sanchez Gonzalez · Atakan Okan · James Ritchie · N. Benjamin Erichson · Harvey Cheng · Peihong Jiang · Seong Ho Pahng · Samson Koelle · Sami Khairy · Adrian Pol · Rushil Anirudh · Jannis Born · Benjamin Sanchez-Lengeling · Brian Timar · Rhys Goodall · Tamás Kriváchy · Lu Lu · Thomas Adler · Nathaniel Trask · Noëlie Cherrier · Tomohiko Konno · Muhammad Kasim · Tobias Golling · Zaccary Alperstein · Andrei Ustyuzhanin · James Stokes · Anna Golubeva · Ian Char · Ksenia Korovina · Youngwoo Cho · Chanchal Chatterjee · Tom Westerhout · Gorka Muñoz-Gil · Juan Zamudio-Fernandez · Jennifer Wei · Brian Lee · Johannes Kofler · Bruce Power · Nikita Kazeev · Andrey Ustyuzhanin · Artem Maevskiy · Pascal Friederich · Arash Tavakoli · Willie Neiswanger · Bohdan Kulchytskyy · sindhu hari · Paul Leu · Paul Atzberger -
2019 Poster: Offline Contextual Bayesian Optimization »
Ian Char · Youngseog Chung · Willie Neiswanger · Kirthevasan Kandasamy · Oak Nelson · Mark Boyer · Egemen Kolemen · Jeff Schneider -
2019 Poster: Specific and Shared Causal Relation Modeling and Mechanism-Based Clustering »
Biwei Huang · Kun Zhang · Pengtao Xie · Mingming Gong · Eric Xing · Clark Glymour -
2018 Poster: Neural Architecture Search with Bayesian Optimisation and Optimal Transport »
Kirthevasan Kandasamy · Willie Neiswanger · Jeff Schneider · Barnabas Poczos · Eric Xing -
2018 Spotlight: Neural Architecture Search with Bayesian Optimisation and Optimal Transport »
Kirthevasan Kandasamy · Willie Neiswanger · Jeff Schneider · Barnabas Poczos · Eric Xing