Publication:

Spectron: Logarithmic Attention with Spectral Filtering

Files

mn4560_written_final_report.pdf (1.34 MB)

Date

2025-04-10

Abstract

Causal self-attention has been the primary driving force behind contemporary machine learning advances over the last decade, but it suffers from quadratic time complexity in the sequence dimension, becoming prohibitively expensive for tasks involving extremely long sequences. Several "linear" attention variants have been proposed as a remedy, but they often fall short in expressivity. In this work, we propose Spectron, a novel architecture that couples an associative scan with spectral filtering to approximate vanilla softmax attention in logarithmic time. Spectron outperforms other state-of-the-art linear attention variants and opens up a class of associative-scan-based operators that could endow linear attention methods with substantially greater expressivity. An unfinished thesis, to be banished to the darkest depths of the Seeley G. Mudd Manuscript Library.
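The abstract's core idea, computing causal attention through an associative scan, can be illustrated with standard linear attention (not the Spectron architecture itself, whose spectral-filtering details are in the thesis). Because the running key/value statistics are combined with an associative operator (here, addition), an inclusive parallel scan evaluates all prefixes in O(log n) depth on parallel hardware. The sketch below uses a Hillis–Steele scan and the common elu(x)+1 feature map; all function names are illustrative, not from the thesis.

```python
import numpy as np

def prefix_sum_scan(x):
    """Inclusive prefix sum via a Hillis-Steele scan.

    Runs in O(log n) parallel steps: on step s, each position adds the
    value s slots behind it. Sequentially this is just a cumulative sum,
    but the scan structure is what enables logarithmic-depth execution.
    """
    out = x.copy()
    shift = 1
    while shift < out.shape[0]:
        # NumPy evaluates the whole right-hand side before assigning,
        # so each step reads the previous step's values, as required.
        out[shift:] = out[shift:] + out[:-shift]
        shift *= 2
    return out

def linear_attention(q, k, v):
    """Causal linear attention computed with prefix scans.

    q, k, v: arrays of shape (seq_len, d). The feature map elu(x)+1
    keeps features positive (a common linear-attention choice).
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    qf, kf = phi(q), phi(k)
    kv = np.einsum('nd,ne->nde', kf, v)   # per-step outer products k_t v_t^T
    S = prefix_sum_scan(kv)               # running sum of k_s v_s^T for s <= t
    z = prefix_sum_scan(kf)               # running sum of k_s for s <= t
    num = np.einsum('nd,nde->ne', qf, S)  # q_t^T S_t
    den = np.einsum('nd,nd->n', qf, z)    # q_t^T z_t (normalizer)
    return num / den[:, None]
```

Each output position attends only to its prefix, so the result matches a naive causal loop over positions while exposing the scan structure the abstract refers to.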
