Poster
Diagonal State Spaces are as Effective as Structured State Spaces
Ankit Gupta · Albert Gu · Jonathan Berant
Modeling long-range dependencies in sequential data is a fundamental step towards attaining human-level performance in many modalities such as text, vision, audio and video. While attention-based models are a popular and effective choice for modeling short-range interactions, their performance on tasks requiring long-range reasoning has been largely inadequate. In an exciting result, Gu et al. (ICLR 2022) proposed the $\textit{Structured State Space}$ (S4) architecture, delivering large gains over state-of-the-art models on several long-range tasks across various modalities. The core proposition of S4 is the parameterization of state matrices via a diagonal plus low-rank structure, allowing efficient computation. In this work, we show that one can match the performance of S4 even without the low-rank correction, that is, by assuming the state matrices to be diagonal. Our $\textit{Diagonal State Space}$ (DSS) model matches the performance of S4 on Long Range Arena tasks and on speech classification on the Speech Commands dataset, while being conceptually simpler and more straightforward to implement.
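To illustrate why a diagonal state matrix is simple to work with, here is a minimal sketch of a single-input, single-output diagonal state space layer. It is not the authors' implementation or their exact parameterization: it assumes a continuous-time system x' = diag(lam) x + b u, y = Re(c · x), discretized with a zero-order hold, and the names `dss_kernel`, `run_recurrence`, `causal_conv`, `lam`, `b`, `c` and `step` are hypothetical. With a diagonal state matrix, each kernel entry is just a sum of scalar exponentials, and convolving the input with that kernel should reproduce the step-by-step recurrence.

```python
import cmath


def discretize(lam, b, step):
    """Zero-order-hold discretization of a diagonal system (assumes lam_i != 0)."""
    a_bar = [cmath.exp(l * step) for l in lam]               # per-state transition
    b_bar = [(a - 1) / l * bi for a, l, bi in zip(a_bar, lam, b)]
    return a_bar, b_bar


def dss_kernel(lam, b, c, step, length):
    """Kernel K[k] = Re(sum_i c_i * b_bar_i * a_bar_i ** k) for k < length."""
    a_bar, b_bar = discretize(lam, b, step)
    return [
        sum(ci * bb * a ** k for ci, bb, a in zip(c, b_bar, a_bar)).real
        for k in range(length)
    ]


def run_recurrence(lam, b, c, step, u):
    """Reference output from the recurrence x_k = a_bar * x_{k-1} + b_bar * u_k."""
    a_bar, b_bar = discretize(lam, b, step)
    x = [0j] * len(lam)
    ys = []
    for u_k in u:
        x = [a * xi + bb * u_k for a, xi, bb in zip(a_bar, x, b_bar)]
        ys.append(sum(ci * xi for ci, xi in zip(c, x)).real)
    return ys


def causal_conv(kernel, u):
    """Causal convolution y_k = sum_{j<=k} kernel[j] * u[k-j]."""
    return [sum(kernel[j] * u[k - j] for j in range(k + 1)) for k in range(len(u))]


# Illustrative values only; they are not taken from the paper.
lam = [-0.5 + 1.0j, -0.1 - 2.0j]   # diagonal of the state matrix
b = [1.0, 1.0]
c = [0.5, 0.25 - 0.1j]
u = [1.0, 0.0, -2.0, 0.5]

y_conv = causal_conv(dss_kernel(lam, b, c, step=0.1, length=len(u)), u)
y_rec = run_recurrence(lam, b, c, step=0.1, u=u)
```

In this sketch `y_conv` and `y_rec` agree up to floating-point error, which is the equivalence that lets such a layer be trained as a convolution and run as a recurrence. A practical implementation would vectorize the kernel and use an FFT for the convolution; the nested Python loops here are only for readability.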
Author Information
Ankit Gupta (Tel Aviv University)
Albert Gu (Stanford)
Jonathan Berant (Tel Aviv University)
More from the Same Authors
2021 : CommonsenseQA 2.0: Exposing the Limits of AI through Gamification »
Alon Talmor · Ori Yoran · Ronan Le Bras · Chandra Bhagavatula · Yoav Goldberg · Yejin Choi · Jonathan Berant -
2021 : Combining Recurrent, Convolutional, and Continuous-Time Models with Structured Learnable Linear State-Space Layers »
Isys Johnson · Albert Gu · Karan Goel · Khaled Saab · Tri Dao · Atri Rudra · Christopher Ré -
2023 Poster: Structured State Space Models for In-Context Reinforcement Learning »
Chris Lu · Yannick Schroecker · Albert Gu · Emilio Parisotto · Jakob Foerster · Satinder Singh · Feryal Behbahani -
2023 Poster: From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces »
Peter Shaw · Mandar Joshi · James Cohan · Jonathan Berant · Panupong Pasupat · Hexiang Hu · Urvashi Khandelwal · Kenton Lee · Kristina N Toutanova -
2022 Spotlight: Lightning Talks 2A-4 »
Sarthak Mittal · Richard Grumitt · Zuoyu Yan · Lihao Wang · Dongsheng Wang · Alexander Korotin · Jiangxin Sun · Ankit Gupta · Vage Egiazarian · Tengfei Ma · Yi Zhou · Yishi Xu · Albert Gu · Biwei Dai · Chunyu Wang · Yoshua Bengio · Uros Seljak · Miaoge Li · Guillaume Lajoie · Yiqun Wang · Liangcai Gao · Lingxiao Li · Jonathan Berant · Huang Hu · Xiaoqing Zheng · Zhibin Duan · Hanjiang Lai · Evgeny Burnaev · Zhi Tang · Zhi Jin · Xuanjing Huang · Chaojie Wang · Yusu Wang · Jian-Fang Hu · Bo Chen · Chao Chen · Hao Zhou · Mingyuan Zhou -
2022 Spotlight: Diagonal State Spaces are as Effective as Structured State Spaces »
Ankit Gupta · Albert Gu · Jonathan Berant -
2022 : Panel Discussion: Opportunities and Challenges »
Kenneth Norman · Janice Chen · Samuel J Gershman · Albert Gu · Sepp Hochreiter · Ida Momennejad · Hava Siegelmann · Sainbayar Sukhbaatar -
2022 Poster: On the Parameterization and Initialization of Diagonal State Space Models »
Albert Gu · Karan Goel · Ankit Gupta · Christopher Ré -
2022 Poster: S4ND: Modeling Images and Videos as Multidimensional Signals with State Spaces »
Eric Nguyen · Karan Goel · Albert Gu · Gordon Downs · Preey Shah · Tri Dao · Stephen Baccus · Christopher Ré -
2021 Poster: Combining Recurrent, Convolutional, and Continuous-time Models with Linear State Space Layers »
Albert Gu · Isys Johnson · Karan Goel · Khaled Saab · Tri Dao · Atri Rudra · Christopher Ré -
2020 Poster: HiPPO: Recurrent Memory with Optimal Polynomial Projections »
Albert Gu · Tri Dao · Stefano Ermon · Atri Rudra · Christopher Ré -
2020 Poster: Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge »
Alon Talmor · Oyvind Tafjord · Peter Clark · Yoav Goldberg · Jonathan Berant -
2020 Spotlight: Leap-Of-Thought: Teaching Pre-Trained Models to Systematically Reason Over Implicit Knowledge »
Alon Talmor · Oyvind Tafjord · Peter Clark · Yoav Goldberg · Jonathan Berant -
2020 Spotlight: HiPPO: Recurrent Memory with Optimal Polynomial Projections »
Albert Gu · Tri Dao · Stefano Ermon · Atri Rudra · Christopher Ré -
2020 Poster: From Trees to Continuous Embeddings and Back: Hyperbolic Hierarchical Clustering »
Ines Chami · Albert Gu · Vaggos Chatziafratis · Christopher Ré -
2020 Poster: No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems »
Nimit Sohoni · Jared Dunnmon · Geoffrey Angus · Albert Gu · Christopher Ré -
2018 Poster: Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction »
Roei Herzig · Moshiko Raboh · Gal Chechik · Jonathan Berant · Amir Globerson -
2018 Poster: Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing »
Chen Liang · Mohammad Norouzi · Jonathan Berant · Quoc V Le · Ni Lao -
2018 Spotlight: Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing »
Chen Liang · Mohammad Norouzi · Jonathan Berant · Quoc V Le · Ni Lao -
2018 Poster: Learning Compressed Transforms with Low Displacement Rank »
Anna Thomas · Albert Gu · Tri Dao · Atri Rudra · Christopher Ré