Attention at Constant Cost per Token via Symmetry-Aware Taylor Approximation
The most widely used artificial intelligence (AI) models today are Transformers employing self-attention. In its standard form, self-attention incurs costs that increase with context length...
https://arxiv.org/abs/2602.00294