We introduce a temperature into the exponential function and replace the softmax output layer of a neural network with a high-temperature generalization. Similarly, the logarithm in the training loss is replaced by a low-temperature logarithm. By tuning the two temperatures, we create loss functions that are non-convex even in the single-layer case. When the last layer of a neural network is replaced by our bi-temperature generalization of the logistic loss, training becomes more robust to noise. We visualize the effect of tuning the two temperatures in a simple setting and show the efficacy of our method on large datasets. Our methodology is based on Bregman divergences and is superior to a related two-temperature method that uses the Tsallis divergence.
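The two-temperature construction described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation. It assumes the standard tempered logarithm log_t(x) = (x^(1-t) - 1) / (1 - t) and its inverse exp_t(x) = max(0, 1 + (1-t)x)^(1/(1-t)), which the abstract does not spell out. The function names, the bisection search for the softmax normalizer, and the exact form of the loss are likewise assumptions made for this example. Typical settings would be t1 < 1 for the logarithm and t2 > 1 for the softmax.

```python
import math

def log_t(x, t):
    """Tempered logarithm (assumed form); equals math.log when t == 1."""
    if t == 1.0:
        return math.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)

def exp_t(x, t):
    """Tempered exponential (assumed form); equals math.exp when t == 1."""
    if t == 1.0:
        return math.exp(x)
    base = max(0.0, 1.0 + (1.0 - t) * x)
    if base == 0.0:
        # Clipped at zero for t < 1; diverges at the boundary for t > 1.
        return 0.0 if t < 1.0 else math.inf
    return base ** (1.0 / (1.0 - t))

def tempered_softmax(activations, t):
    """Probabilities exp_t(a_i - lam), with lam chosen so they sum to one.

    lam is found by bisection (an illustrative choice): the sum decreases
    in lam, is >= 1 at lam = max(a), and is <= 1 at max(a) - log_t(1/n).
    """
    n = len(activations)
    lo = max(activations)
    hi = lo - log_t(1.0 / n, t)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if sum(exp_t(a - mid, t) for a in activations) > 1.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [exp_t(a - lam, t) for a in activations]

def bi_tempered_loss(activations, labels, t1, t2):
    """Loss with temperature t1 in the logarithm and t2 in the softmax."""
    probs = tempered_softmax(activations, t2)
    loss = 0.0
    for y, p in zip(labels, probs):
        if y > 0.0:
            loss += y * (log_t(y, t1) - log_t(p, t1))
        loss -= (y ** (2.0 - t1) - p ** (2.0 - t1)) / (2.0 - t1)
    return loss

if __name__ == "__main__":
    logits = [1.0, 2.0, 3.0]
    one_hot = [0.0, 0.0, 1.0]
    print(tempered_softmax(logits, 1.2))
    print(bi_tempered_loss(logits, one_hot, 0.8, 1.2))
```

Setting both temperatures to 1 recovers the ordinary softmax with cross-entropy loss, which gives a convenient sanity check for a sketch like this.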
Author Information
Ehsan Amid (University of California, Santa Cruz)
Manfred K. Warmuth (Google Brain)
Rohan Anil (Google)
Tomer Koren (Google)
More from the Same Authors
- 2021 Spotlight: Efficiently Identifying Task Groupings for Multi-Task Learning
  Chris Fifty · Ehsan Amid · Zhe Zhao · Tianhe Yu · Rohan Anil · Chelsea Finn
- 2022: Fishy: Layerwise Fisher Approximation for Higher-order Neural Network Optimization
  Abel Peirson · Ehsan Amid · Yatong Chen · Vladimir Feinberg · Manfred Warmuth · Rohan Anil
- 2023 Poster: Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions
  Vladimir Feinberg · Xinyi Chen · Y. Jennifer Sun · Rohan Anil · Elad Hazan
- 2023 Poster: A Computationally Efficient Sparsified Online Newton Method
  Fnu Devvrit · Sai Surya Duvvuri · Rohan Anil · Vineet Gupta · Cho-Jui Hsieh · Inderjit Dhillon
- 2021 Poster: Algorithmic Instabilities of Accelerated Gradient Descent
  Amit Attia · Tomer Koren
- 2021 Oral: Optimal Rates for Random Order Online Optimization
  Uri Sherman · Tomer Koren · Yishay Mansour
- 2021 Poster: Efficiently Identifying Task Groupings for Multi-Task Learning
  Chris Fifty · Ehsan Amid · Zhe Zhao · Tianhe Yu · Rohan Anil · Chelsea Finn
- 2021 Poster: Towards Best-of-All-Worlds Online Learning with Feedback Graphs
  Liad Erez · Tomer Koren
- 2021 Poster: Never Go Full Batch (in Stochastic Convex Optimization)
  Idan Amir · Yair Carmon · Tomer Koren · Roi Livni
- 2021 Poster: Optimal Rates for Random Order Online Optimization
  Uri Sherman · Tomer Koren · Yishay Mansour
- 2021 Poster: Asynchronous Stochastic Optimization Robust to Arbitrary Delays
  Alon Cohen · Amit Daniely · Yoel Drori · Tomer Koren · Mariano Schain
- 2020 Poster: Reparameterizing Mirror Descent as Gradient Descent
  Ehsan Amid · Manfred K. Warmuth
- 2020 Poster: Stochastic Optimization with Laggard Data Pipelines
  Naman Agarwal · Rohan Anil · Tomer Koren · Kunal Talwar · Cyril Zhang
- 2019: Private Stochastic Convex Optimization: Optimal Rates in Linear Time
  Vitaly Feldman · Tomer Koren · Kunal Talwar
- 2019: Spotlight talks
  Damien Scieur · Konstantin Mishchenko · Rohan Anil
- 2019: Poster Session
  Eduard Gorbunov · Alexandre d'Aspremont · Lingxiao Wang · Liwei Wang · Boris Ginsburg · Alessio Quaglino · Camille Castera · Saurabh Adya · Diego Granziol · Rudrajit Das · Raghu Bollapragada · Fabian Pedregosa · Martin Takac · Majid Jahani · Sai Praneeth Karimireddy · Hilal Asi · Balint Daroczy · Leonard Adolphs · Aditya Rawal · Nicolas Brandt · Minhan Li · Giuseppe Ughi · Orlando Romero · Ivan Skorokhodov · Damien Scieur · Kiwook Bae · Konstantin Mishchenko · Rohan Anil · Vatsal Sharan · Aditya Balu · Chao Chen · Zhewei Yao · Tolga Ergen · Paul Grigas · Chris Junchi Li · Jimmy Ba · Stephen J Roberts · Sharan Vaswani · Armin Eftekhari · Chhavi Sharma
- 2019 Workshop: Minding the Gap: Between Fairness and Ethics
  Igor Rubinov · Risi Kondor · Jack Poulson · Manfred K. Warmuth · Emanuel Moss · Alexa Hagerty
- 2019: Opening Remarks
  Jack Poulson · Manfred K. Warmuth
- 2019 Poster: Memory Efficient Adaptive Optimization
  Rohan Anil · Vineet Gupta · Tomer Koren · Yoram Singer
- 2018 Poster: Leveraged volume sampling for linear regression
  Michal Derezinski · Manfred K. Warmuth · Daniel Hsu
- 2018 Spotlight: Leveraged volume sampling for linear regression
  Michal Derezinski · Manfred K. Warmuth · Daniel Hsu
- 2017 Poster: Affine-Invariant Online Optimization and the Low-rank Experts Problem
  Tomer Koren · Roi Livni
- 2017 Poster: Online Dynamic Programming
  Holakou Rahmanian · Manfred K. Warmuth
- 2017 Poster: Unbiased estimates for linear regression via volume sampling
  Michal Derezinski · Manfred K. Warmuth
- 2017 Poster: Multi-Armed Bandits with Metric Movement Costs
  Tomer Koren · Roi Livni · Yishay Mansour
- 2017 Spotlight: Unbiased estimates for linear regression via volume sampling
  Michal Derezinski · Manfred K. Warmuth
- 2014 Poster: The limits of squared Euclidean distance regularization
  Michal Derezinski · Manfred K. Warmuth
- 2014 Spotlight: The limits of squared Euclidean distance regularization
  Michal Derezinski · Manfred K. Warmuth
- 2013 Workshop: Large Scale Matrix Analysis and Inference
  Reza Zadeh · Gunnar Carlsson · Michael Mahoney · Manfred K. Warmuth · Wouter M Koolen · Nati Srebro · Satyen Kale · Malik Magdon-Ismail · Ashish Goel · Matei A Zaharia · David Woodruff · Ioannis Koutis · Benjamin Recht
- 2012 Poster: Putting Bayes to sleep
  Wouter M Koolen · Dmitri Adamskiy · Manfred K. Warmuth
- 2012 Spotlight: Putting Bayes to sleep
  Wouter M Koolen · Dmitri Adamskiy · Manfred K. Warmuth
- 2011 Poster: Learning Eigenvectors for Free
  Wouter M Koolen · Wojciech Kotlowski · Manfred K. Warmuth
- 2010 Poster: Repeated Games against Budgeted Adversaries
  Jacob D Abernethy · Manfred K. Warmuth
- 2007 Spotlight: Boosting Algorithms for Maximizing the Soft Margin
  Manfred K. Warmuth · Karen Glocer · Gunnar Rätsch
- 2007 Poster: Boosting Algorithms for Maximizing the Soft Margin
  Manfred K. Warmuth · Karen Glocer · Gunnar Rätsch
- 2006 Poster: Randomized PCA Algorithms with Regret Bounds that are Logarithmic in the Dimension
  Manfred K. Warmuth · Dima Kuzmin