Running nonlinear RNNs for T steps takes O(T) time. Our construction, called LDStack, approximately runs them in O(log T) parallel time, and obtains arbitrarily low error via repetition. First, we show nonlinear RNNs can be approximated by a stack of multiple-input, multiple-output (MIMO) LDS. This replaces nonlinearity across time with nonlinearity along depth. Next, we show that MIMO LDS can be approximated by an average or a concatenation of single-input, multiple-output (SIMO) LDS. Finally, we present an algorithm for running (and differentiating) SIMO LDS in O(log T) parallel time. On long sequences, LDStack is much faster than traditional RNNs, yet it achieves similar accuracy in our experiments. Furthermore, LDStack is amenable to linear systems theory. Therefore, it improves not only speed, but also interpretability and mathematical tractability.
Shiva Kaul (Carnegie Mellon University)
Related Events (a corresponding poster, oral, or spotlight)
2020 Spotlight: Linear Dynamical Systems as a Core Computational Primitive »
Wed Dec 9th 03:10 -- 03:20 PM Room Orals & Spotlights: Continual/Meta/Misc Learning