Hyper-parameters are ubiquitous and can significantly affect model performance, so hyper-parameter optimization is a central problem in machine learning. In this paper, we consider a sub-class of hyper-parameter optimization problems in which hyper-gradients are not available. Such problems frequently arise when the performance metric is non-differentiable or the hyper-parameters are not continuous. Existing algorithms for this setting, such as Bayesian optimization and reinforcement learning, often get trapped in poor local optima. To address this limitation, we propose to use cubic regularization to accelerate convergence and avoid saddle points. First, we adopt stochastic relaxation, which allows gradient and Hessian information to be obtained without hyper-gradients. Then, we exploit this rich curvature information through cubic regularization. Theoretically, we prove that the proposed method converges to approximate second-order stationary points, and that convergence is still guaranteed when the lower-level problem is solved only inexactly. Experiments on synthetic and real-world data demonstrate the effectiveness of the proposed method.
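To make the two-step recipe concrete, below is a minimal sketch (not the authors' code) of the idea: the hyper-parameters are relaxed to the mean of a Gaussian distribution N(mu, sigma^2 I), the gradient and Hessian of the relaxed objective are estimated from zeroth-order evaluations of a validation loss, and the update direction comes from a cubic-regularized Newton sub-problem solved approximately by gradient descent. The toy `val_loss`, the Gaussian relaxation, and constants such as `sigma` and `M` are illustrative assumptions; the paper's exact relaxation and sub-problem solver may differ.

```python
# Minimal sketch: stochastic relaxation + cubic-regularized Newton step for
# hyper-parameter optimization without hyper-gradients. All names and constants
# here (val_loss, sigma, M, step counts) are illustrative assumptions.
import numpy as np

def relaxed_grad_hess(val_loss, mu, sigma, n_samples, rng):
    """Estimate gradient/Hessian of J(mu) = E_{theta ~ N(mu, sigma^2 I)}[val_loss(theta)]
    via Gaussian-smoothing identities, using only zeroth-order loss evaluations."""
    d = mu.size
    f0 = val_loss(mu)                              # baseline to reduce variance
    g, H = np.zeros(d), np.zeros((d, d))
    for _ in range(n_samples):
        eps = rng.standard_normal(d)
        df = val_loss(mu + sigma * eps) - f0       # no hyper-gradient needed
        g += df * eps / sigma
        H += df * (np.outer(eps, eps) - np.eye(d)) / sigma ** 2
    return g / n_samples, H / n_samples

def cubic_step(g, H, M=20.0, n_iter=100, lr=0.05):
    """Approximately minimize the cubic model  g^T s + 0.5 s^T H s + (M/6)||s||^3,
    whose minimizer avoids saddle directions, by plain gradient descent."""
    s = np.zeros_like(g)
    for _ in range(n_iter):
        s -= lr * (g + H @ s + 0.5 * M * np.linalg.norm(s) * s)
    return s

# Toy usage: a stand-in, non-smooth validation metric over two hyper-parameters.
def val_loss(theta):
    return np.sum(np.abs(theta - 1.0))

rng = np.random.default_rng(0)
mu = np.zeros(2)                                   # mean of the relaxed distribution
for _ in range(30):
    g, H = relaxed_grad_hess(val_loss, mu, sigma=0.3, n_samples=100, rng=rng)
    mu = mu + cubic_step(g, H)
print("relaxed hyper-parameters:", mu)             # should drift towards 1.0
```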
Author Information
Zhenqian Shen (Tsinghua University)
Hansi Yang (The Hong Kong University of Science and Technology)
Yong Li (Tsinghua University)
James Kwok (Hong Kong University of Science and Technology)
Quanming Yao (Tsinghua University)
More from the Same Authors
- 2021 Spotlight: TOHAN: A One-step Approach towards Few-shot Hypothesis Adaptation
  Haoang Chi · Feng Liu · Wenjing Yang · Long Lan · Tongliang Liu · Bo Han · William Cheung · James Kwok
- 2023 Poster: Combating Bilateral Edge Noise for Robust Link Prediction
  Zhanke Zhou · Jiangchao Yao · Jiaxu Liu · Xiawei Guo · Quanming Yao · LI He · Liang Wang · Bo Zheng · Bo Han
- 2023 Poster: DeepACO: Neural-enhanced Ant Systems for Combinatorial Optimization
  Haoran Ye · Jiarui Wang · Zhiguang Cao · Helan Liang · Yong Li
- 2023 Poster: Nonparametric Teaching for Multiple Learners
  Chen Zhang · Xiaofeng Cao · Weiyang Liu · Ivor Tsang · James Kwok
- 2022 Poster: Multi-Objective Deep Learning with Adaptive Reference Vectors
  Weiyu Chen · James Kwok
- 2021 Poster: Progressive Feature Interaction Search for Deep Sparse Network
  Chen Gao · Yinfeng Li · Quanming Yao · Depeng Jin · Yong Li
- 2021 Poster: Effective Meta-Regularization by Kernelized Proximal Regularization
  Weisen Jiang · James Kwok · Yu Zhang
- 2021 Poster: Automorphic Equivalence-aware Graph Neural Network
  Fengli Xu · Quanming Yao · Pan Hui · Yong Li
- 2021 Poster: TOHAN: A One-step Approach towards Few-shot Hypothesis Adaptation
  Haoang Chi · Feng Liu · Wenjing Yang · Long Lan · Tongliang Liu · Bo Han · William Cheung · James Kwok
- 2020 Poster: Timeseries Anomaly Detection using Temporal Hierarchical One-Class Network
  Lifeng Shen · Zhuocong Li · James Kwok
- 2020 Poster: Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS
  Han Shi · Renjie Pi · Hang Xu · Zhenguo Li · James Kwok · Tong Zhang
- 2020 Poster: Simplify and Robustify Negative Sampling for Implicit Collaborative Filtering
  Jingtao Ding · Yuhan Quan · Quanming Yao · Yong Li · Depeng Jin
- 2019 Poster: Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback
  Shuai Zheng · Ziyue Huang · James Kwok
- 2019 Poster: Normalization Helps Training of Quantized LSTM
  Lu Hou · Jinhua Zhu · James Kwok · Fei Gao · Tao Qin · Tie-Yan Liu
- 2018 Poster: Scalable Robust Matrix Factorization with Nonconvex Loss
  Quanming Yao · James Kwok
- 2018 Poster: Co-teaching: Robust training of deep neural networks with extremely noisy labels
  Bo Han · Quanming Yao · Xingrui Yu · Gang Niu · Miao Xu · Weihua Hu · Ivor Tsang · Masashi Sugiyama
- 2015 Poster: Fast Second Order Stochastic Backpropagation for Variational Inference
  Kai Fan · Ziteng Wang · Jeff Beck · James Kwok · Katherine Heller
- 2012 Poster: Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification
  Wei Bi · James Kwok
- 2009 Poster: Accelerated Gradient Methods for Stochastic Optimization and Online Learning
  Chonghai Hu · James Kwok · Weike Pan