A new state-space hybrid matches softmax attention quality with O(n) memory at long context.
Researchers at Google DeepMind have introduced 'Hawk-2', a hybrid recurrent-attention architecture that matches dense transformers on perplexity and reasoning benchmarks at 32K context. The paper claims a 6x throughput improvement at long context with no quality regression, and code plus 8B-parameter weights have been released on GitHub.
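To make the "hybrid recurrent-attention" idea concrete, below is a minimal sketch of one such block, assuming the general recipe of interleaving a gated linear recurrence with local (sliding-window) attention. The class names, dimensions, gating form, and window size are illustrative assumptions, not details from the Hawk-2 paper.

```python
# Sketch of a hybrid recurrent-attention block (assumed design, not the
# published Hawk-2 implementation). The recurrent layer carries a fixed-size
# state; the attention layer is restricted to a sliding window, so neither
# component's memory grows quadratically with sequence length.
import torch
import torch.nn as nn


class GatedLinearRecurrence(nn.Module):
    """Diagonal linear recurrence h_t = a_t * h_{t-1} + (1 - a_t) * x_t
    with input-dependent gates a_t. The state h_t has a fixed size, so
    decode-time memory for this layer is constant in sequence length."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(dim, dim)
        self.proj_in = nn.Linear(dim, dim)
        self.proj_out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, dim)
        a = torch.sigmoid(self.gate(x))      # per-step decay in (0, 1)
        v = self.proj_in(x)
        h = torch.zeros_like(x[:, 0])        # fixed-size recurrent state
        outs = []
        for t in range(x.shape[1]):          # sequential loop for clarity;
            h = a[:, t] * h + (1 - a[:, t]) * v[:, t]  # real kernels use a parallel scan
            outs.append(h)
        return self.proj_out(torch.stack(outs, dim=1))


class HybridBlock(nn.Module):
    """Recurrent mixing followed by causal sliding-window attention."""

    def __init__(self, dim: int, heads: int = 4, window: int = 256):
        super().__init__()
        self.window = window
        self.recurrent = GatedLinearRecurrence(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.recurrent(self.norm1(x))
        # Boolean mask: True = blocked. Blocks future positions and anything
        # farther back than `window`, keeping attention cost linear in length.
        n = x.shape[1]
        i = torch.arange(n, device=x.device)
        mask = (i[None, :] > i[:, None]) | (i[:, None] - i[None, :] >= self.window)
        h = self.norm2(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        return x + attn_out
```

Under these assumptions, the memory advantage comes from the split of responsibilities: the recurrence summarizes arbitrarily long history in a constant-size state, while the windowed attention handles precise local retrieval without materializing a full n-by-n attention pattern.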