Timezone: »
Pre-training (PT) followed by fine-tuning (FT) is an effective method for training neural networks, and has led to significant performance improvements in many domains. PT can incorporate various design choices such as task and data reweighting strategies, augmentation policies, and noise models, all of which can significantly impact the quality of representations learned. The hyperparameters introduced by these strategies therefore must be tuned appropriately. However, setting the values of these hyperparameters is challenging. Most existing methods either struggle to scale to high dimensions, are too slow and memory-intensive, or cannot be directly applied to the two-stage PT and FT learning process. In this work, we propose an efficient, gradient-based algorithm to meta-learn PT hyperparameters. We formalize the PT hyperparameter optimization problem and propose a novel method to obtain PT hyperparameter gradients by combining implicit differentiation and backpropagation through unrolled optimization. We demonstrate that our method improves predictive performance on two real-world domains. First, we optimize high-dimensional task weighting hyperparameters for multitask pre-training on protein-protein interaction graphs and improve AUROC by up to 3.9%. Second, we optimize a data augmentation neural network for self-supervised PT with SimCLR on electrocardiography data and improve AUROC by up to 1.9%.
Author Information
Aniruddh Raghu (Massachusetts Institute of Technology)
Jonathan Lorraine (University Of Toronto)
Simon Kornblith (Google Brain)
Matthew McDermott (MIT)
David Duvenaud (University of Toronto)
More from the Same Authors
-
2021 : Input Convex Gradient Networks »
Jack Richter-Powell · Jonathan Lorraine · Brandon Amos -
2021 : Input Convex Gradient Networks »
Jack Richter-Powell · Jonathan Lorraine · Brandon Amos -
2022 : Human alignment of neural network representations »
Lukas Muttenthaler · Lorenz Linhardt · Jonas Dippel · Robert Vandermeulen · Simon Kornblith -
2022 : Contrastive Pre-Training for Multimodal Medical Time Series »
Aniruddh Raghu · Payal Chandak · Ridwan Alam · John Guttag · Collin Stultz -
2022 : Contrastive Pre-Training for Multimodal Medical Time Series »
Aniruddh Raghu · Payal Chandak · Ridwan Alam · John Guttag · Collin Stultz -
2022 Poster: Patching open-vocabulary models by interpolating weights »
Gabriel Ilharco · Mitchell Wortsman · Samir Yitzhak Gadre · Shuran Song · Hannaneh Hajishirzi · Simon Kornblith · Ali Farhadi · Ludwig Schmidt -
2021 Workshop: Machine learning from ground truth: New medical imaging datasets for unsolved medical problems. »
Katy Haynes · Ziad Obermeyer · Emma Pierson · Marzyeh Ghassemi · Matthew Lungren · Sendhil Mullainathan · Matthew McDermott -
2021 Poster: Why Do Better Loss Functions Lead to Less Transferable Features? »
Simon Kornblith · Ting Chen · Honglak Lee · Mohammad Norouzi -
2021 Poster: Generalized Shape Metrics on Neural Representations »
Alex H Williams · Erin Kunz · Simon Kornblith · Scott Linderman -
2021 Poster: Do Vision Transformers See Like Convolutional Neural Networks? »
Maithra Raghu · Thomas Unterthiner · Simon Kornblith · Chiyuan Zhang · Alexey Dosovitskiy -
2020 Workshop: Machine Learning for Health (ML4H): Advancing Healthcare for All »
Stephanie Hyland · Allen Schmaltz · Charles Onu · Ehi Nosakhare · Emily Alsentzer · Irene Y Chen · Matthew McDermott · Subhrajit Roy · Benjamin Akera · Dani Kiyasseh · Fabian Falck · Griffin Adams · Ioana Bica · Oliver J Bear Don't Walk IV · Suproteem Sarkar · Stephen Pfohl · Andrew Beam · Brett Beaulieu-Jones · Danielle Belgrave · Tristan Naumann -
2020 Poster: The Origins and Prevalence of Texture Bias in Convolutional Neural Networks »
Katherine L. Hermann · Ting Chen · Simon Kornblith -
2020 Oral: The Origins and Prevalence of Texture Bias in Convolutional Neural Networks »
Katherine L. Hermann · Ting Chen · Simon Kornblith -
2020 Poster: Big Self-Supervised Models are Strong Semi-Supervised Learners »
Ting Chen · Simon Kornblith · Kevin Swersky · Mohammad Norouzi · Geoffrey E Hinton -
2019 : Coffee Break and Poster Session »
Rameswar Panda · Prasanna Sattigeri · Kush Varshney · Karthikeyan Natesan Ramamurthy · Harvineet Singh · Vishwali Mhasawade · Shalmali Joshi · Laleh Seyyed-Kalantari · Matthew McDermott · Gal Yona · James Atwood · Hansa Srinivasan · Yonatan Halpern · D. Sculley · Behrouz Babaki · Margarida Carvalho · Josie Williams · Narges Razavian · Haoran Zhang · Amy Lu · Irene Y Chen · Xiaojie Mao · Angela Zhou · Nathan Kallus -
2019 : Coffee/Poster session 2 »
Xingyou Song · Puneet Mangla · David Salinas · Zhenxun Zhuang · Leo Feng · Shell Xu Hu · Raul Puri · Wesley Maddox · Aniruddh Raghu · Prudencio Tossou · Mingzhang Yin · Ishita Dasgupta · Kangwook Lee · Ferran Alet · Zhen Xu · Jörg Franke · James Harrison · Jonathan Warrell · Guneet Dhillon · Arber Zela · Xin Qiu · Julien Niklas Siems · Russell Mendonca · Louis Schlessinger · Jeffrey Li · Georgiana Manolache · Debojyoti Dutta · Lucas Glass · Abhishek Singh · Gregor Koehler -
2019 : Poster Session »
Ethan Harris · Tom White · Oh Hyeon Choung · Takashi Shinozaki · Dipan Pal · Katherine L. Hermann · Judy Borowski · Camilo Fosco · Chaz Firestone · Vijay Veerabadran · Benjamin Lahner · Chaitanya Ryali · Fenil Doshi · Pulkit Singh · Sharon Zhou · Michel Besserve · Michael Chang · Anelise Newman · Mahesan Niranjan · Jonathon Hare · Daniela Mihai · Marios Savvides · Simon Kornblith · Christina M Funke · Aude Oliva · Virginia de Sa · Dmitry Krotov · Colin Conwell · George Alvarez · Alex Kolchinski · Shengjia Zhao · Mitchell Gordon · Michael Bernstein · Stefano Ermon · Arash Mehrjou · Bernhard Schölkopf · John Co-Reyes · Michael Janner · Jiajun Wu · Josh Tenenbaum · Sergey Levine · Yalda Mohsenzadeh · Zhenglong Zhou -
2019 Workshop: Machine Learning for Health (ML4H): What makes machine learning in medicine different? »
Andrew Beam · Tristan Naumann · Brett Beaulieu-Jones · Irene Y Chen · Madalina Fiterau · Samuel Finlayson · Emily Alsentzer · Adrian Dalca · Matthew McDermott -
2019 Poster: When does label smoothing help? »
Rafael Müller · Simon Kornblith · Geoffrey E Hinton -
2019 Spotlight: When does label smoothing help? »
Rafael Müller · Simon Kornblith · Geoffrey E Hinton -
2019 Poster: Saccader: Improving Accuracy of Hard Attention Models for Vision »
Gamaleldin Elsayed · Simon Kornblith · Quoc V Le -
2018 : Poster Session I »
Aniruddh Raghu · Daniel Jarrett · Kathleen Lewis · Elias Chaibub Neto · Nicholas Mastronarde · Shazia Akbar · Chun-Hung Chao · Henghui Zhu · Seth Stafford · Luna Zhang · Jen-Tang Lu · Changhee Lee · Adityanarayanan Radhakrishnan · Fabian Falck · Liyue Shen · Daniel Neil · Yusuf Roohani · Aparna Balagopalan · Brett Marinelli · Hagai Rossman · Sven Giesselbach · Jose Javier Gonzalez Ortiz · Edward De Brouwer · Byung-Hoon Kim · Rafid Mahmood · Tzu Ming Hsu · Antonio Ribeiro · Rumi Chunara · Agni Orfanoudaki · Kristen Severson · Mingjie Mai · Sonali Parbhoo · Albert Haque · Viraj Prabhu · Di Jin · Alena Harley · Geoffroy Dubourg-Felonneau · Xiaodan Hu · Maithra Raghu · Jonathan Warrell · Nelson Johansen · Wenyuan Li · Marko Järvenpää · Satya Narayan Shukla · Sarah Tan · Vincent Fortuin · Beau Norgeot · Yi-Te Hsu · Joel H Saltz · Veronica Tozzo · Andrew Miller · Guillaume Ausset · Azin Asgarian · Francesco Paolo Casale · Antoine Neuraz · Bhanu Pratap Singh Rawat · Turgay Ayer · Xinyu Li · Mehul Motani · Nathaniel Braman · Laetitia M Shao · Adrian Dalca · Hyunkwang Lee · Emma Pierson · Sandesh Ghimire · Yuji Kawai · Owen Lahav · Anna Goldenberg · Denny Wu · Pavitra Krishnaswamy · Colin Pawlowski · Arijit Ukil · Yuhui Zhang -
2018 Workshop: Machine Learning for Health (ML4H): Moving beyond supervised learning in healthcare »
Andrew Beam · Tristan Naumann · Marzyeh Ghassemi · Matthew McDermott · Madalina Fiterau · Irene Y Chen · Brett Beaulieu-Jones · Michael Hughes · Farah Shamout · Corey Chivers · Jaz Kandola · Alexandre Yahi · Samuel Finlayson · Bruno Jedynak · Peter Schulam · Natalia Antropova · Jason Fries · Adrian Dalca · Irene Chen -
2018 Poster: Representation Balancing MDPs for Off-policy Policy Evaluation »
Yao Liu · Omer Gottesman · Aniruddh Raghu · Matthieu Komorowski · Aldo Faisal · Finale Doshi-Velez · Emma Brunskill -
2017 : Poster spotlights »
Hiroshi Kuwajima · Masayuki Tanaka · Qingkai Liang · Matthieu Komorowski · Fanyu Que · Thalita F Drumond · Aniruddh Raghu · Leo Anthony Celi · Christina Göpfert · Andrew Ross · Sarah Tan · Rich Caruana · Yin Lou · Devinder Kumar · Graham Taylor · Forough Poursabzi-Sangdeh · Jennifer Wortman Vaughan · Hanna Wallach -
2017 : Coffee break and Poster Session I »
Nishith Khandwala · Steve Gallant · Gregory Way · Aniruddh Raghu · Li Shen · Aydan Gasimova · Alican Bozkurt · William Boag · Daniel Lopez-Martinez · Ulrich Bodenhofer · Samaneh Nasiri GhoshehBolagh · Michelle Guo · Christoph Kurz · Kirubin Pillay · Kimis Perros · George H Chen · Alexandre Yahi · Madhumita Sushil · Sanjay Purushotham · Elena Tutubalina · Tejpal Virdi · Marc-Andre Schulz · Samuel Weisenthal · Bharat Srikishan · Petar Veličković · Kartik Ahuja · Andrew Miller · Erin Craig · Disi Ji · Filip Dabek · Chloé Pou-Prom · Hejia Zhang · Janani Kalyanam · Wei-Hung Weng · Harish Bhat · Hugh Chen · Simon Kohl · Mingwu Gao · Tingting Zhu · Ming-Zher Poh · Iñigo Urteaga · Antoine Honoré · Alessandro De Palma · Maruan Al-Shedivat · Pranav Rajpurkar · Matthew McDermott · Vincent Chen · Yanan Sui · Yun-Geun Lee · Li-Fang Cheng · Chen Fang · Sibt ul Hussain · Cesare Furlanello · Zeev Waks · Hiba Chougrad · Hedvig Kjellstrom · Finale Doshi-Velez · Wolfgang Fruehwirt · Yanqing Zhang · Lily Hu · Junfang Chen · Sunho Park · Gatis Mikelsons · Jumana Dakka · Stephanie Hyland · yann chevaleyre · Hyunwoo Lee · Xavier Giro-i-Nieto · David Kale · Michael Hughes · Gabriel Erion · Rishab Mehra · William Zame · Stojan Trajanovski · Prithwish Chakraborty · Kelly Peterson · Muktabh Mayank Srivastava · Amy Jin · Heliodoro Tejeda Lemus · Priyadip Ray · Tamas Madl · Joseph Futoma · Enhao Gong · Syed Rameel Ahmad · Eric Lei · Ferdinand Legros -
2016 : Generating Class-conditional Images with Gradient-based Inference »
David Duvenaud -
2016 : David Duvenaud – No more mini-languages: The power of autodiffing full-featured Python »
David Duvenaud -
2016 Workshop: Reliable Machine Learning in the Wild »
Dylan Hadfield-Menell · Adrian Weller · David Duvenaud · Jacob Steinhardt · Percy Liang -
2016 Poster: Composing graphical models with neural networks for structured representations and fast inference »
Matthew Johnson · David Duvenaud · Alex Wiltschko · Ryan Adams · Sandeep R Datta -
2016 Poster: Probing the Compositionality of Intuitive Functions »
Eric Schulz · Josh Tenenbaum · David Duvenaud · Maarten Speekenbrink · Samuel J Gershman -
2015 : *David Duvenaud* Automatic Differentiation: The most criminally underused tool in probabilistic numerics »
David Duvenaud -
2015 Poster: Convolutional Networks on Graphs for Learning Molecular Fingerprints »
David Duvenaud · Dougal Maclaurin · Jorge Iparraguirre · Rafael Bombarell · Timothy Hirzel · Alan Aspuru-Guzik · Ryan Adams -
2014 Poster: Probabilistic ODE Solvers with Runge-Kutta Means »
Michael Schober · David Duvenaud · Philipp Hennig -
2014 Oral: Probabilistic ODE Solvers with Runge-Kutta Means »
Michael Schober · David Duvenaud · Philipp Hennig -
2012 Poster: Active Learning of Model Evidence Using Bayesian Quadrature »
Michael A Osborne · David Duvenaud · Roman Garnett · Carl Edward Rasmussen · Stephen J Roberts · Zoubin Ghahramani -
2011 Poster: Additive Gaussian Processes »
David Duvenaud · Hannes Nickisch · Carl Edward Rasmussen