
Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks
Xiaodong Cui · Wei Zhang · Zoltán Tüske · Michael Picheny

Tue Dec 04 07:45 AM -- 09:45 AM (PST) @ Room 517 AB #120

We propose a population-based Evolutionary Stochastic Gradient Descent (ESGD) framework for optimizing deep neural networks. ESGD combines SGD with gradient-free evolutionary algorithms as complementary paradigms in a single framework, in which optimization alternates between an SGD step and an evolution step to improve the average fitness of the population. With a back-off strategy in the SGD step and an elitist strategy in the evolution step, ESGD guarantees that the best fitness in the population never degrades. In addition, individuals in the population, each optimized in the SGD step by a different SGD-based optimizer with distinct hyper-parameters, are treated as competing species in a coevolution setting, so that the complementarity of the optimizers is also exploited. The effectiveness of ESGD is demonstrated across multiple applications, including speech recognition, image recognition, and language modeling, using networks with a variety of deep architectures.
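The alternation described in the abstract can be illustrated with a toy sketch. The code below is not the paper's implementation: it minimizes a scalar quadratic instead of training a network, and the function names (`sgd_step`, `evolution_step`, `esgd`) and hyper-parameter choices are illustrative assumptions. It shows the three ingredients the abstract names: an SGD phase with a back-off (revert if fitness degrades), an elitist evolution phase (the best individuals always survive), and a population whose individuals carry distinct optimizer hyper-parameters (here, different learning rates).

```python
import random

def fitness(w):
    # Toy objective: minimize (w - 3)^2; lower fitness is better.
    return (w - 3.0) ** 2

def sgd_step(w, lr):
    # Gradient step with a back-off: revert if fitness degrades,
    # so an individual's fitness never gets worse in the SGD phase.
    g = 2.0 * (w - 3.0)               # gradient of the toy objective
    w_new = w - lr * g
    return w_new if fitness(w_new) <= fitness(w) else w

def evolution_step(pop, n_elite=2):
    # Elitist strategy: the best individuals survive unchanged; the rest
    # of the population is refilled with mutated copies of the elites.
    pop.sort(key=lambda ind: fitness(ind[0]))
    elites = pop[:n_elite]
    offspring = [(w + random.gauss(0.0, 0.5), lr)
                 for w, lr in random.choices(elites, k=len(pop) - n_elite)]
    return elites + offspring

def esgd(pop_size=8, generations=25):
    random.seed(0)
    # Each individual carries its own learning rate, standing in for
    # the population of distinct SGD-based optimizers in the paper.
    pop = [(random.uniform(-10.0, 10.0), random.choice([0.02, 0.1, 0.3]))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop = [(sgd_step(w, lr), lr) for w, lr in pop]   # SGD phase
        pop = evolution_step(pop)                        # evolution phase
    return min(pop, key=lambda ind: fitness(ind[0]))

best_w, best_lr = esgd()
print(best_w, best_lr)
```

Because elites are copied forward unchanged and the SGD back-off rejects degrading steps, the best fitness in the population is monotonically non-increasing across generations, mirroring the guarantee stated in the abstract.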

Author Information

Xiaodong Cui (IBM T. J. Watson Research Center)
Wei Zhang (IBM T. J. Watson Research Center)

BE, Beijing University of Technology, 2005; MSc, Technical University of Denmark, 2008; PhD, University of Wisconsin–Madison, 2013, all in computer science. Has published papers in ASPLOS, OOPSLA, OSDI, PLDI, IJCAI, ICDM, and NIPS.

Zoltán Tüske (IBM Research AI)
Michael Picheny (IBM T. J. Watson Research Center)
