Publication: Spectron: Logarithmic Attention with Spectral Filtering
Abstract
Causal self-attention has been the primary driving force behind machine learning advances over the last decade, but it suffers from quadratic time complexity in the sequence dimension, becoming prohibitively expensive for tasks involving extremely long sequences. Several "linear" attention variants have been proposed as a remedy but often fall short in expressivity. In this work, we propose Spectron, a novel architecture that couples an associative scan with spectral filtering to approximate vanilla softmax attention in logarithmic time. Spectron outperforms all other state-of-the-art linear attention variants and unlocks a new class of associative-scan operators that can potentially endow linear attention methods with far greater expressivity. An unfinished thesis, to be banished to the darkest depths of the Seeley G. Mudd Manuscript Library.
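The associative-scan primitive the abstract refers to can be illustrated with a generic Hillis–Steele (recursive-doubling) scan. This is a hypothetical sketch of the primitive in plain Python, not the thesis's implementation; the function name and serial execution are illustrative assumptions, and the logarithmic cost only materializes on parallel hardware:

```python
import operator

def associative_scan(op, xs):
    """Inclusive all-prefix scan of xs under the associative operator op.

    Uses the recursive-doubling pattern: each round combines elements at
    stride 1, 2, 4, ..., so only O(log n) rounds are needed. On parallel
    hardware each round runs in constant depth; here it is executed serially.
    """
    ys = list(xs)
    n = len(ys)
    stride = 1
    while stride < n:
        # One round: combine each element with the one `stride` positions back.
        ys = [op(ys[i - stride], y) if i >= stride else y
              for i, y in enumerate(ys)]
        stride *= 2
    return ys

# Prefix sums of 1..8 computed in ceil(log2(8)) = 3 combining rounds.
print(associative_scan(operator.add, range(1, 9)))
# → [1, 3, 6, 10, 15, 21, 28, 36]
```

Any associative operator works in place of addition, which is what lets scan-based attention variants fold more expressive state updates into the same logarithmic-depth schedule.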