Timezone: »

 
Trajectory ensembling for fine tuning - performance gains without modifying training
Louise Anderson-Conway · Vighnesh Birodkar · Saurabh Singh · Hossein Mobahi · Alexander Alemi
Event URL: https://openreview.net/forum?id=R0u9dYN6Q5 »

In this work, we present a simple algorithm for ensembling checkpoints from a single training trajectory (trajectory ensembling) resulting in significant gains for several fine tuning tasks. We compare against classical ensembles and perform ablation studies showing that the important checkpoints are not necessarily the best performing models in terms of accuracy. Rather, relatively poor models with low loss are vital for the observed performance gains. We also investigate various mixtures of checkpoints from several independent training trajectories, making the surprising observation that this only leads to marginal gains in this setup. We study how calibrating constituent models with a simple temperature scaling impacts results, and find that the most important region of training is still that of the lowest loss in spite of potential poor accuracy.

Author Information

Louise Anderson-Conway (Google)
Vighnesh Birodkar (Google)
Saurabh Singh (Google)
Hossein Mobahi (Google Research)
Alexander Alemi (Google)

More from the Same Authors

  • 2021 : PAC^m-Bayes: Narrowing the Empirical Risk Gap in the Misspecified Bayesian Regime »
    Joshua Dillon · Warren Morningstar · Alexander Alemi
  • 2021 : PAC^m-Bayes: Narrowing the Empirical Risk Gap in the Misspecified Bayesian Regime »
    Alexander Alemi
  • 2021 Poster: Does Knowledge Distillation Really Work? »
    Samuel Stanton · Pavel Izmailov · Polina Kirichenko · Alexander Alemi · Andrew Wilson
  • 2020 Workshop: Deep Learning through Information Geometry »
    Pratik Chaudhari · Alexander Alemi · Varun Jog · Dhagash Mehta · Frank Nielsen · Stefano Soatto · Greg Ver Steeg
  • 2020 Poster: Self-Distillation Amplifies Regularization in Hilbert Space »
    Hossein Mobahi · Mehrdad Farajtabar · Peter Bartlett
  • 2020 Session: Orals & Spotlights Track 17: Kernel Methods/Optimization »
    Chiranjib Bhattacharyya · Hossein Mobahi
  • 2019 : Contributed Session - Spotlight Talks »
    Jonathan Frankle · David Schwab · Ari Morcos · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · YiDing Jiang · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Sho Yaida · Muqiao Yang
  • 2019 : Lunch Break and Posters »
    Xingyou Song · Elad Hoffer · Wei-Cheng Chang · Jeremy Cohen · Jyoti Islam · Yaniv Blumenfeld · Andreas Madsen · Jonathan Frankle · Sebastian Goldt · Satrajit Chatterjee · Abhishek Panigrahi · Alex Renda · Brian Bartoldson · Israel Birhane · Aristide Baratin · Niladri Chatterji · Roman Novak · Jessica Forde · YiDing Jiang · Yilun Du · Linara Adilova · Michael Kamp · Berry Weinstein · Itay Hubara · Tal Ben-Nun · Torsten Hoefler · Daniel Soudry · Hsiang-Fu Yu · Kai Zhong · Yiming Yang · Inderjit Dhillon · Jaime Carbonell · Yanqing Zhang · Dar Gilboa · Johannes Brandstetter · Alexander R Johansen · Gintare Karolina Dziugaite · Raghav Somani · Ari Morcos · Freddie Kalaitzis · Hanie Sedghi · Lechao Xiao · John Zech · Muqiao Yang · Simran Kaur · Qianli Ma · Yao-Hung Hubert Tsai · Ruslan Salakhutdinov · Sho Yaida · Zachary Lipton · Daniel Roy · Michael Carbin · Florent Krzakala · Lenka Zdeborová · Guy Gur-Ari · Ethan Dyer · Dilip Krishnan · Hossein Mobahi · Samy Bengio · Behnam Neyshabur · Praneeth Netrapalli · Kris Sankaran · Julien Cornebise · Yoshua Bengio · Vincent Michalski · Samira Ebrahimi Kahou · Md Rifat Arefin · Jiri Hron · Jaehoon Lee · Jascha Sohl-Dickstein · Samuel Schoenholz · David Schwab · Dongyu Li · Sang Keun Choe · Henning Petzka · Ashish Verma · Zhichao Lin · Cristian Sminchisescu
  • 2019 : Poster Session »
    Gergely Flamich · Shashanka Ubaru · Charles Zheng · Josip Djolonga · Kristoffer Wickstrøm · Diego Granziol · Konstantinos Pitas · Jun Li · Robert Williamson · Sangwoong Yoon · Kwot Sin Lee · Julian Zilly · Linda Petrini · Ian Fischer · Zhe Dong · Alexander Alemi · Bao-Ngoc Nguyen · Rob Brekelmans · Tailin Wu · Aditya Mahajan · Alexander Li · Kirankumar Shiragur · Yair Carmon · Linara Adilova · SHIYU LIU · Bang An · Sanjeeb Dash · Oktay Gunluk · Arya Mazumdar · Mehul Motani · Julia Rosenzweig · Michael Kamp · Marton Havasi · Leighton P Barnes · Zhengqing Zhou · Yi Hao · Dylan Foster · Yuval Benjamini · Nati Srebro · Michael Tschannen · Paul Rubenstein · Sylvain Gelly · John Duchi · Aaron Sidford · Robin Ru · Stefan Zohren · Murtaza Dalal · Michael A Osborne · Stephen J Roberts · Moses Charikar · Jayakumar Subramanian · Xiaodi Fan · Max Schwarzer · Nicholas Roberts · Simon Lacoste-Julien · Vinay Prabhu · Aram Galstyan · Greg Ver Steeg · Lalitha Sankar · Yung-Kyun Noh · Gautam Dasarathy · Frank Park · Ngai-Man (Man) Cheung · Ngoc-Trung Tran · Linxiao Yang · Ben Poole · Andrea Censi · Tristan Sylvain · R Devon Hjelm · Bangjie Liu · Jose Gallego-Posada · Tyler Sypherd · Kai Yang · Jan Nikolas Morshuis
  • 2019 : Invited Talk: Alexander A Alemi »
    Alexander Alemi
  • 2018 Poster: Watch Your Step: Learning Node Embeddings via Graph Attention »
    Sami Abu-El-Haija · Bryan Perozzi · Rami Al-Rfou · Alexander Alemi
  • 2018 Poster: Large Margin Deep Networks for Classification »
    Gamaleldin Elsayed · Dilip Krishnan · Hossein Mobahi · Kevin Regan · Samy Bengio
  • 2018 Poster: GILBO: One Metric to Measure Them All »
    Alexander Alemi · Ian Fischer
  • 2018 Spotlight: GILBO: One Metric to Measure Them All »
    Alexander Alemi · Ian Fischer
  • 2017 Poster: Unsupervised Learning of Disentangled Representations from Video »
    Emily Denton · vighnesh Birodkar
  • 2017 Spotlight: Unsupervised Learning of Disentangled Representations from Video »
    Emily Denton · vighnesh Birodkar
  • 2016 Poster: DeepMath - Deep Sequence Models for Premise Selection »
    Geoffrey Irving · Christian Szegedy · Alexander Alemi · Niklas Een · Francois Chollet · Josef Urban