Poster

Towards Exact Gradient-based Training on Analog In-memory Computing

Zhaoxian Wu · Tayfun Gokmen · Malte Rasch · Tianyi Chen

West Ballroom A-D #5907
[ Project Page ]
Wed 11 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

Analog in-memory accelerators present a promising solution for energy-efficient training and inference of large vision or language models. While inference on analog accelerators has been studied recently, analog training remains under-explored. Recent studies have shown that the vanilla analog stochastic gradient descent (Analog SGD) algorithm converges inexactly and thus performs poorly when applied to model training on non-ideal devices. To tackle this issue, various analog-friendly gradient-based algorithms have been proposed, such as Tiki-Taka and its variants. Even though Tiki-Taka exhibits superior empirical performance compared to Analog SGD, it is a heuristic algorithm that lacks theoretical underpinnings. This paper puts forth a theoretical foundation for gradient-based training on analog devices. We begin by characterizing the non-convergence issue of Analog SGD, which is caused by the asymptotic error arising from asymmetric updates and gradient noise. Further, we provide a convergence analysis of Tiki-Taka, which shows that it converges exactly to a critical point and hence eliminates the asymptotic error. Simulations verify the correctness of the analyses.
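
To make the non-convergence claim concrete, the following minimal Python sketch compares plain SGD with an Analog SGD-style update on a one-dimensional quadratic. The state-dependent, asymmetric device response (with an assumed asymmetry scale tau) and the noise level sigma are illustrative assumptions rather than the exact device model analyzed in the paper; the sketch only visualizes the kind of asymptotic bias the abstract describes.

import numpy as np

# Toy illustration: on f(w) = 0.5 * (w - w_star)^2, plain SGD converges to
# w_star up to the usual noise floor, while an "Analog SGD" update whose
# magnitude depends asymmetrically on the pulse sign and the current weight
# settles at a biased point. The soft-bound-style device response below is
# an assumed, illustrative model, not the paper's exact device model.

rng = np.random.default_rng(0)
w_star = 0.5      # minimizer of the objective
tau = 1.0         # assumed asymmetry scale of the analog device
lr = 0.1          # learning rate
sigma = 0.2       # gradient noise level
steps = 20_000

def noisy_grad(w):
    return (w - w_star) + sigma * rng.standard_normal()

w_ideal = w_analog = 0.0
avg_ideal = avg_analog = 0.0
for t in range(steps):
    # Ideal (digital) SGD: symmetric update
    w_ideal -= lr * noisy_grad(w_ideal)
    # Analog SGD: positive and negative pulses move the weight by
    # different amounts depending on its current state
    dw = -lr * noisy_grad(w_analog)
    w_analog += dw * (1.0 - np.sign(dw) * w_analog / tau)
    if t >= steps // 2:             # time-average the second half
        avg_ideal += w_ideal / (steps // 2)
        avg_analog += w_analog / (steps // 2)

print(f"w_star = {w_star}")
print(f"ideal  SGD time-average ~ {avg_ideal:.3f}")   # close to w_star
print(f"analog SGD time-average ~ {avg_analog:.3f}")  # biased below w_star

In this toy setting the gradient noise interacts with the asymmetric update to drag the analog weight toward the device's symmetry point, producing the asymptotic error the abstract attributes to Analog SGD; Tiki-Taka's auxiliary-array scheme (not sketched here) is what the paper shows removes this error.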
