
Learning to Mutate with Hypergradient Guided Population
Zhiqiang Tao · Yaliang Li · Bolin Ding · Ce Zhang · Jingren Zhou · Yun Fu

Tue Dec 08 09:00 PM -- 11:00 PM (PST) @ Poster Session 2 #715

Computing the gradient of model hyperparameters, i.e., the hypergradient, offers a promising and natural way to solve the hyperparameter optimization task. However, gradient-based methods can converge to suboptimal solutions due to the non-convex nature of optimization in a complex hyperparameter space. In this study, we propose a hyperparameter mutation (HPM) algorithm that explicitly learns a trade-off between global and local search: a population of student models simultaneously explores the hyperparameter space guided by hypergradients, while a teacher model mutates the underperforming students by exploiting the top ones. The teacher model is implemented with an attention mechanism and learns a mutation schedule for different hyperparameters on the fly. Empirical evidence on synthetic functions shows that HPM significantly outperforms hypergradient-only search. Experiments on two benchmark datasets further validate the effectiveness of the proposed HPM algorithm for training deep neural networks compared with several strong baselines.
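The population-based idea in the abstract can be illustrated with a minimal toy sketch. All names below (`objective`, `hypergradient`, `hpm_search`), the synthetic loss, and the simple perturbation rule standing in for the attention-based teacher are assumptions for illustration, not the paper's actual method: each student follows a gradient step on its hyperparameter (local search), and underperforming students are periodically replaced with perturbed copies of top students (global search via mutation).

```python
import random


def objective(hp):
    """Synthetic loss over a single hyperparameter (minimum at hp = 2.0)."""
    return (hp - 2.0) ** 2


def hypergradient(hp):
    """Analytic gradient of the synthetic loss w.r.t. the hyperparameter."""
    return 2.0 * (hp - 2.0)


def hpm_search(pop_size=8, steps=50, lr=0.1, mutate_every=10, seed=0):
    rng = random.Random(seed)
    # Population of student hyperparameters, randomly initialized.
    population = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    for step in range(1, steps + 1):
        # Local search: every student takes a hypergradient descent step.
        population = [hp - lr * hypergradient(hp) for hp in population]
        # Global search: periodically mutate the bottom half of students
        # toward the top half (a fixed Gaussian perturbation stands in
        # for the learned, attention-based mutation schedule).
        if step % mutate_every == 0:
            population.sort(key=objective)
            top = population[: pop_size // 2]
            n_bottom = pop_size - len(top)
            population = top + [
                rng.choice(top) + rng.gauss(0.0, 0.5) for _ in range(n_bottom)
            ]
    # Return the best hyperparameter found by the population.
    return min(population, key=objective)


best = hpm_search()
```

On this toy quadratic, gradient steps alone already converge, so the mutation step mainly shows the mechanics; the paper's motivation is non-convex landscapes where mutation helps escape poor local solutions.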

Author Information

Zhiqiang Tao (Santa Clara University)
Yaliang Li (Alibaba Group)
Bolin Ding (Data Analytics and Intelligence Lab, Alibaba Group)
Ce Zhang (ETH Zurich)
Jingren Zhou (Alibaba Group)
Yun Fu (Northeastern University)
