Timezone: »
Recently, the information-theoretical framework has been proven to be able to obtain non-vacuous generalization bounds for large models trained by Stochastic Gradient Langevin Dynamics (SGLD) with isotropic noise. In this paper, we optimize the information-theoretical generalization bound by manipulating the noise structure in SGLD. We prove that with constraint to guarantee low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance if both the prior and the posterior are jointly optimized. This validates that the optimal noise is quite close to the empirical gradient covariance. Technically, we develop a new information-theoretical bound that enables such an optimization analysis. We then apply matrix analysis to derive the form of optimal noise covariance. Presented constraint and results are validated by the empirical observations.
Author Information
Bohan Wang (USTC)
Huishuai Zhang (Microsoft Research Asia)
Jieyu Zhang (Department of Computer Science, University of Washington)
Qi Meng (Microsoft)
Wei Chen (Chinese Academy of Sciences)
Tie-Yan Liu (Microsoft Research Asia)
More from the Same Authors
-
2021 : WRENCH: A Comprehensive Benchmark for Weak Supervision »
Jieyu Zhang · Yue Yu · · Yujing Wang · Yaming Yang · Mao Yang · Alexander Ratner -
2021 : AI X Science »
Tie-Yan Liu -
2021 Poster: On the Generative Utility of Cyclic Conditionals »
Chang Liu · Haoyue Tang · Tao Qin · Jintao Wang · Tie-Yan Liu -
2021 Poster: Curriculum Offline Imitating Learning »
Minghuan Liu · Hanye Zhao · Zhengyu Yang · Jian Shen · Weinan Zhang · Li Zhao · Tie-Yan Liu -
2021 Poster: Speech-T: Transducer for Text to Speech and Beyond »
Jiawei Chen · Xu Tan · Yichong Leng · Jin Xu · Guihua Wen · Tao Qin · Tie-Yan Liu -
2021 Poster: Stylized Dialogue Generation with Multi-Pass Dual Learning »
Jinpeng Li · Yingce Xia · Rui Yan · Hongda Sun · Dongyan Zhao · Tie-Yan Liu -
2021 Poster: Distributional Reinforcement Learning for Multi-Dimensional Reward Functions »
Pushi Zhang · Xiaoyu Chen · Li Zhao · Wei Xiong · Tao Qin · Tie-Yan Liu -
2021 Poster: Co-evolution Transformer for Protein Contact Prediction »
He Zhang · Fusong Ju · Jianwei Zhu · Liang He · Bin Shao · Nanning Zheng · Tie-Yan Liu -
2021 : WRENCH: A Comprehensive Benchmark for Weak Supervision »
Jieyu Zhang · Yue Yu · · Yujing Wang · Yaming Yang · Mao Yang · Alexander Ratner -
2021 Poster: Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding »
Shengjie Luo · Shanda Li · Tianle Cai · Di He · Dinglan Peng · Shuxin Zheng · Guolin Ke · Liwei Wang · Tie-Yan Liu -
2021 Poster: Learning Causal Semantic Representation for Out-of-Distribution Prediction »
Chang Liu · Xinwei Sun · Jindong Wang · Haoyue Tang · Tao Li · Tao Qin · Wei Chen · Tie-Yan Liu -
2021 Poster: Object-Aware Regularization for Addressing Causal Confusion in Imitation Learning »
Jongjin Park · Younggyo Seo · Chang Liu · Li Zhao · Tao Qin · Jinwoo Shin · Tie-Yan Liu -
2021 Poster: FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition »
Yichong Leng · Xu Tan · Linchen Zhu · Jin Xu · Renqian Luo · Linquan Liu · Tao Qin · Xiangyang Li · Edward Lin · Tie-Yan Liu -
2021 Poster: Do Transformers Really Perform Badly for Graph Representation? »
Chengxuan Ying · Tianle Cai · Shengjie Luo · Shuxin Zheng · Guolin Ke · Di He · Yanming Shen · Tie-Yan Liu -
2021 Poster: R-Drop: Regularized Dropout for Neural Networks »
xiaobo liang · Lijun Wu · Juntao Li · Yue Wang · Qi Meng · Tao Qin · Wei Chen · Min Zhang · Tie-Yan Liu -
2021 Poster: Recovering Latent Causal Factor for Generalization to Distributional Shifts »
Xinwei Sun · Botong Wu · Xiangyu Zheng · Chang Liu · Wei Chen · Tao Qin · Tie-Yan Liu -
2019 : Break / Poster Session 1 »
Antonia Marcu · Yao-Yuan Yang · Pascale Gourdeau · Chen Zhu · Thodoris Lykouris · Jianfeng Chi · Mark Kozdoba · Arjun Nitin Bhagoji · Xiaoxia Wu · Jay Nandy · Michael T Smith · Bingyang Wen · Yuege Xie · Konstantinos Pitas · Suprosanna Shit · Maksym Andriushchenko · Dingli Yu · GaĆ«l Letarte · Misha Khodak · Hussein Mozannar · Chara Podimata · James Foulds · Yizhen Wang · Huishuai Zhang · Ondrej Kuzelka · Alexander Levine · Nan Lu · Zakaria Mhammedi · Paul Viallard · Diana Cai · Lovedeep Gondara · James Lucas · Yasaman Mahdaviyeh · Aristide Baratin · Rishi Bommasani · Alessandro Barp · Andrew Ilyas · Kaiwen Wu · Jens Behrmann · Omar Rivasplata · Amir Nazemi · Aditi Raghunathan · Will Stephenson · Sahil Singla · Akhil Gupta · YooJung Choi · Yannic Kilcher · Clare Lyle · Edoardo Manino · Andrew Bennett · Zhi Xu · Niladri Chatterji · Emre Barut · Flavien Prost · Rodrigo Toro Icarte · Arno Blaas · Chulhee Yun · Sahin Lale · YiDing Jiang · Tharun Kumar Reddy Medini · Ashkan Rezaei · Alexander Meinke · Stephen Mell · Gary Kazantsev · Shivam Garg · Aradhana Sinha · Vishnu Lokhande · Geovani Rizk · Han Zhao · Aditya Kumar Akash · Jikai Hou · Ali Ghodsi · Matthias Hein · Tyler Sypherd · Yichen Yang · Anastasia Pentina · Pierre Gillot · Antoine Ledent · Guy Gur-Ari · Noah MacAulay · Tianzong Zhang -
2019 Poster: Neural Machine Translation with Soft Prototype »
Yiren Wang · Yingce Xia · Fei Tian · Fei Gao · Tao Qin · Cheng Xiang Zhai · Tie-Yan Liu -
2019 Poster: FastSpeech: Fast, Robust and Controllable Text to Speech »
Yi Ren · Yangjun Ruan · Xu Tan · Tao Qin · Sheng Zhao · Zhou Zhao · Tie-Yan Liu -
2019 Poster: Normalization Helps Training of Quantized LSTM »
Lu Hou · Jinhua Zhu · James Kwok · Fei Gao · Tao Qin · Tie-Yan Liu -
2018 Poster: On the Local Hessian in Back-propagation »
Huishuai Zhang · Wei Chen · Tie-Yan Liu -
2017 Poster: Finite sample analysis of the GTD Policy Evaluation Algorithms in Markov Setting »
Yue Wang · Wei Chen · Yuting Liu · Zhi-Ming Ma · Tie-Yan Liu -
2017 Poster: Deliberation Networks: Sequence Generation Beyond One-Pass Decoding »
Yingce Xia · Fei Tian · Lijun Wu · Jianxin Lin · Tao Qin · Nenghai Yu · Tie-Yan Liu -
2017 Poster: LightGBM: A Highly Efficient Gradient Boosting Decision Tree »
Guolin Ke · Qi Meng · Thomas Finley · Taifeng Wang · Wei Chen · Weidong Ma · Qiwei Ye · Tie-Yan Liu -
2013 Poster: Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising »
Min Xu · Tao Qin · Tie-Yan Liu