BML: A High-performance, Low-cost Gradient Synchronization Algorithm for DML Training
Songtao Wang · Dan Li · Yang Cheng · Jinkun Geng · Yanshu Wang · Shuai Wang · Shu-Tao Xia · Jianping Wu

Tue Dec 04 07:45 AM -- 09:45 AM (PST) @ Room 210 #63

In distributed machine learning (DML), the network performance between machines significantly impacts the speed of iterative training. In this paper we propose BML, a new gradient synchronization algorithm with higher network performance and lower network cost than the current practice. BML runs on a BCube network instead of the traditional Fat-Tree topology. The BML algorithm is designed so that, compared to the parameter server (PS) algorithm on a Fat-Tree network connecting the same number of server machines, BML theoretically achieves 1/k of the gradient synchronization time with k/5 of the switches (k is typically 2∼4). Experiments with the LeNet-5 and VGG-19 benchmarks on a testbed of 9 dual-GPU servers show that BML reduces the job completion time of DML training by up to 56.4%.
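To make the stated ratios concrete, the following minimal sketch tabulates the theoretical claims from the abstract (1/k of the synchronization time, k/5 of the switches) for the typical values of k. The function name bml_vs_ps_ratios is hypothetical and introduced only for illustration; the sketch restates the abstract's arithmetic and does not implement the BML algorithm itself.

```python
from fractions import Fraction


def bml_vs_ps_ratios(k):
    """Return (sync_time_ratio, switch_ratio) for BML on BCube relative to
    PS on Fat-Tree, using only the theoretical figures quoted in the abstract.

    Hypothetical helper for illustration; k is the parameter the abstract
    calls k (typically 2 to 4).
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    sync_time_ratio = Fraction(1, k)  # abstract: 1/k of the synchronization time
    switch_ratio = Fraction(k, 5)     # abstract: k/5 of the switches
    return sync_time_ratio, switch_ratio


if __name__ == "__main__":
    for k in (2, 3, 4):
        time_ratio, switch_ratio = bml_vs_ps_ratios(k)
        print(f"k={k}: sync time x{float(time_ratio):.2f}, "
              f"switches x{float(switch_ratio):.2f}")
```

Under these figures, a larger k shortens the theoretical synchronization time but raises the switch count relative to Fat-Tree, so the two benefits trade off against each other within the typical range.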

Author Information

Songtao Wang (Tsinghua University)
Dan Li (Tsinghua University)
Yang Cheng (Tsinghua University)
Jinkun Geng (Tsinghua University)
Yanshu Wang (Tsinghua University)
Shuai Wang (Tsinghua University)
Shu-Tao Xia (Tsinghua University)
Jianping Wu (Tsinghua University)

More from the Same Authors