SSMs (State Space Models)

Materials:

Date: Thursday, 12-December-2024, 1.30pm, IST.

Pre-work:

In-class:

  • slides Efficiently Modeling Long Sequences with Structured State Spaces by Albert Gu
  • slides by Alan Milligan, UBC MLRG Summer 2024
  • The Annotated S4 blog | slides Generating Extremely Long Sequences in JAX, by Sasha Rush and Sidd Karamcheti | talk
  • blog Introduction to State Space Models by Loick Bourdois
  • blog A Visual Guide to Mamba and State Space Models
  • Chapter 7 of Aaron R. Voelker’s thesis

Lab:

Post-class:

  • Albert Gu’s blog series on S4: Part 1, Part 2, and Part 3
  • #46 MLSys @ Stanford Talk by Albert Gu
  • The Annotated S4 blog | code Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention, Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
  • paper
  • code Official implementation of S4
  • code another minimal implementation of Mamba
  • blog History of SSMs in 2022
  • blog A Mamba No. 5, with explanations of hardware accelerations
  • youtube a great explanation of Mamba

Additional References:

  • LMU Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks, Aaron R. Voelker et al., NeurIPS 2019.
  • thesis Dynamical Systems in Spiking Neuromorphic Hardware, Aaron R. Voelker. Chapter 7, on Delay Networks, is very useful for understanding how the HiPPO matrix is derived.
  • paper Efficiently Modeling Long Sequences with Structured State Spaces, Albert Gu, Karan Goel, Christopher Ré, 2021
  • paper Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers, Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, Christopher Ré, NeurIPS 2021
  • paper
  • paper How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections, Albert Gu, Isys Johnson, Aman Timalsina, Atri Rudra, Christopher Ré
  • paper Diagonal State Spaces are as Effective as Structured State Spaces, Ankit Gupta, Albert Gu, Jonathan Berant
  • paper On the Parameterization and Initialization of Diagonal State Space Models, Albert Gu, Ankit Gupta, Karan Goel, Christopher Ré
  • paper S4ND: Modeling Images and Videos as Multidimensional Signals Using State Spaces, Eric Nguyen, Karan Goel, Albert Gu, Gordon W. Downs, Preey Shah, Tri Dao, Stephen A. Baccus, Christopher Ré
  • paper Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality, Tri Dao, Albert Gu
  • paper Efficient Parallelization of a Ubiquitous Sequential Computation, Franz A. Heinsen | code