AC-GC: Lossy Activation Compression with Guaranteed Convergence
R David Evans · Tor Aamodt

Tue Dec 07 04:30 PM -- 06:00 PM (PST) @ Virtual #None

Parallel hardware devices (e.g., graphics processing units) have limited high-bandwidth memory capacity. This negatively impacts the training of deep neural networks (DNNs) by increasing runtime and/or decreasing accuracy when model and/or batch size is reduced to fit within this capacity. Lossy compression is a promising approach to tackling memory capacity constraints, but prior approaches rely on hyperparameter search to achieve a suitable trade-off between convergence and compression, negating runtime benefits. In this paper we build upon recent developments in Stochastic Gradient Descent convergence to prove an upper bound on the expected loss increase when training with compressed activation storage. We then express activation compression error in terms of this bound, allowing the compression rate to adapt to training conditions automatically. The advantage of our approach, called AC-GC, over existing lossy compression frameworks is that, given a preset allowable increase in loss, significant compression can be achieved with a single training run. When combined with error-bounded methods, AC-GC achieves 15.1x compression with an average accuracy change of 0.1% on text and image datasets. AC-GC functions on any model composed of the layers analyzed and, by avoiding compression rate search, reduces overall training time by 4.6x over SuccessiveHalving.
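The core idea of adapting a compression rate to a per-tensor error bound can be illustrated with a minimal sketch. The code below is not the authors' implementation: the uniform quantizer, the RMS error criterion, and the function names (`quantize_activations`, `adaptive_bits`) are illustrative assumptions standing in for AC-GC's analytically derived bound.

```python
import numpy as np

def quantize_activations(x, bits):
    """Uniformly quantize a tensor to the given bit-width and return
    the dequantized reconstruction (illustrative compressor)."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (2 ** bits - 1) if hi > lo else 1.0
    q = np.round((x - lo) / scale)
    return q * scale + lo

def adaptive_bits(x, error_bound, max_bits=16):
    """Pick the smallest bit-width whose RMS reconstruction error stays
    under the bound -- a hypothetical stand-in for choosing the
    compression rate from a convergence-derived error budget."""
    for bits in range(2, max_bits + 1):
        xq = quantize_activations(x, bits)
        rms = np.sqrt(np.mean((x - xq) ** 2))
        if rms <= error_bound:
            return bits, xq
    return max_bits, quantize_activations(x, max_bits)
```

With a tighter error budget the search settles on more bits (less compression), and with a looser budget on fewer; AC-GC's contribution is deriving that budget from an SGD convergence bound rather than tuning it by hand.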

Author Information

R David Evans (University of British Columbia)

I'm a PhD candidate in Computer Architecture examining architectures for efficient training. My current focus is using hardware accelerators to reduce the memory overheads associated with activation storage.

Tor Aamodt (University of British Columbia)