Title: Spectron: Logarithmic Attention with Spectral Filtering
Contributors: Hazan, Elad; Dao Phuc Quang, Tri; Nguyen, Windsor
Dates: 2025-08-06; 2025-08-06; 2025-04-10
URI: https://theses-dissertations.princeton.edu/handle/88435/dsp01df65vc30q
Language: en-US
Collection: Princeton University Senior Theses

Abstract: Causal self-attention has been the primary driving force behind contemporary machine learning advances in the last decade, but it suffers from quadratic time complexity in the sequence dimension, becoming prohibitively expensive for tasks involving extremely long sequences. Several "linear" attention variants have been proposed as a remedy, but they often fall short in expressivity. In this work, we propose Spectron, a novel architecture that couples an associative scan with spectral filtering to approximate vanilla softmax attention in logarithmic time. Spectron outperforms all other state-of-the-art linear attention variants and unlocks a new class of algorithms built on associative scan operators, which can potentially make linear attention methods far more expressive. An unfinished thesis, to be banished to the darkest depths of the Seeley G. Mudd Manuscript Library.
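The abstract's claim of logarithmic time rests on the associative scan primitive: any associative combine operator over n elements can be evaluated as a prefix scan in O(log n) parallel steps. Below is a minimal sketch of the Hillis-Steele doubling scheme illustrating that pattern; it is a generic scan over a user-supplied operator, not Spectron's actual spectral-filtering operator, and the function name is ours.

```python
def associative_scan(op, xs):
    """Inclusive prefix scan via logarithmic-depth doubling.

    `op` must be associative. With n elements the outer loop runs
    ceil(log2(n)) times; every comprehension step combines elements
    independently, so on parallel hardware each step costs O(1) depth.
    """
    xs = list(xs)
    n = len(xs)
    step = 1
    while step < n:
        # Element i >= step absorbs the partial result `step` positions back.
        xs = [xs[i] if i < step else op(xs[i - step], xs[i]) for i in range(n)]
        step *= 2
    return xs


# With addition as the operator, the scan reduces to prefix sums,
# computed in 3 doubling steps for 8 elements instead of 7 serial ones.
print(associative_scan(lambda a, b: a + b, [1, 2, 3, 4, 5, 6, 7, 8]))
```

Linear-attention-style recurrences become parallelizable precisely when their state update can be phrased as such an associative operator.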