SSMs
Materials:
Date: Thursday, 12-December-2024, 1.30pm, IST.
Pre-work:
In-class:
- slides Efficiently Modeling Long Sequences with Structured State Spaces by Albert Gu
- slides by Alan Milligan, UBC MLRG Summer 2024
- The Annotated S4 blog | slides Generating Extremely Long Sequences in JAX, by Sasha Rush and Sidd Karamcheti | talk
- blog Introduction to State Space Models by Loick Bourdois
- blog A Visual Guide to Mamba and State Space Models
- Chapter 7 of Aaron R. Voelker's thesis
Lab:
- Mamba-Minimal walkthrough, a minimal implementation of Mamba
Post-class:
- Albert Gu's blog posts on S4: Part 1, Part 2, and Part 3
- talk Stanford MLSys Seminar #46 by Albert Gu
- The Annotated S4 blog | code
- paper Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention, Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
- paper
- code Official implementation of S4
- code another minimal implementation of Mamba
- blog History of SSMs in 2022
- blog A Mamba No. 5, with explanations of hardware accelerations
- youtube a great explanation of Mamba
Additional references:
- LMU Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks, Aaron R. Voelker et al., NeurIPS 2019.
- thesis Dynamical Systems in Spiking Neuromorphic Hardware, Aaron R. Voelker. Chapter 7, on Delay Networks, is very useful for understanding how the HiPPO matrix is derived.
- paper Efficiently Modeling Long Sequences with Structured State Spaces, Albert Gu, Karan Goel, Christopher Ré, 2021
- paper Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers, Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, Christopher Ré, 2021
- paper
- paper How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections, Albert Gu, Isys Johnson, Aman Timalsina, Atri Rudra, Christopher Ré
- paper Diagonal State Spaces are as Effective as Structured State Spaces, Ankit Gupta, Albert Gu, Jonathan Berant
- paper On the Parameterization and Initialization of Diagonal State Space Models, Albert Gu, Ankit Gupta, Karan Goel, Christopher Ré
- paper S4ND: Modeling Images and Videos as Multidimensional Signals Using State Spaces, Eric Nguyen, Karan Goel, Albert Gu, Gordon W. Downs, Preey Shah, Tri Dao, Stephen A. Baccus, Christopher Ré
- paper Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality, Tri Dao, Albert Gu
- paper Efficient Parallelization of a Ubiquitous Sequential Computation, Franz A. Heinsen | code