(Originally a Sunday EXPO talk) Long training times are the single biggest factor constraining innovation in deep learning. To speed up training, significant progress has been made both in improving performance at the single-device level (via optimized kernels for common models) and in scaling out training jobs (via new methods for large-batch training). But the former makes it harder to explore innovative research ideas, because those optimized kernels are rigid: performance becomes brittle to even minor changes in the model. And the latter requires extensive model tuning from machine learning researchers and is hard to reproduce.
In this talk we will discuss the Cerebras approach to speeding up training and reducing time to solution with the Cerebras Wafer Scale Engine, the largest chip in the world. It provides cluster-scale resources on a single chip, with full utilization for tensors of any shape (fat, square, or thin; dense or sparse), enabling researchers to explore novel network architectures and optimization techniques at any batch size.