SubQuadratic launches SubQ: first LLM with a 12M-token context at a claimed 1/20th the cost of Opus

SubQuadratic has launched SubQ, which it describes as the first frontier LLM built on a fully sub-quadratic sparse-attention architecture, with support for a 12-million-token context window. According to the company, the model is 52x faster than FlashAttention at 1M tokens and costs less than 5% of what Anthropic's Opus does, while remaining competitive on benchmarks such as SWE-Bench Verified (81.8%). The quoted prices are ~$1.50 versus $15 per million tokens, which on their own work out to a ratio of about 10% rather than 5%. SubQ uses content-dependent sparse attention: each token attends only to the token relationships judged relevant, which the company says reduces attention compute by nearly 1,000x compared with standard transformers. If the claims hold up, this could change LLM economics by making long-context processing practical and affordable for enterprise applications.
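
The announcement does not describe how SubQ selects which token relationships count as relevant, so the following is only a minimal sketch of one generic form of content-dependent sparse attention (top-k selection), not SubQ's actual method. The function names `topk_sparse_attention` and `attention_pairs` and the parameter `k` are hypothetical, introduced here for illustration. This toy version still scores every key before choosing the top `k`, so it shows the sparsity pattern only; a genuinely sub-quadratic design would need a cheaper selection step.

```python
import math


def topk_sparse_attention(queries, keys, values, k):
    """Illustrative top-k attention: each query attends only to its k best keys.

    Assumption: top-k selection stands in for "content-dependent" sparsity.
    The source does not say this is what SubQ does.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        # Scaled dot-product score against every key (quadratic overall in this toy).
        scores = [
            sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(dim)
            for key in keys
        ]
        # Indices of the k highest-scoring keys for this query.
        kept = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
        # Numerically stable softmax restricted to the kept keys.
        peak = max(scores[i] for i in kept)
        weights = [math.exp(scores[i] - peak) for i in kept]
        total = sum(weights)
        # Weighted average of the kept value vectors.
        out = [
            sum(w * values[i][d] for w, i in zip(weights, kept)) / total
            for d in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


def attention_pairs(n_tokens, k=None):
    """Query-key pairs entering the softmax: n*n if dense, n*k if top-k sparse."""
    return n_tokens * (n_tokens if k is None else min(k, n_tokens))
```

With a fixed `k`, the number of query-key pairs entering the softmax grows linearly with context length instead of quadratically; comparing `attention_pairs(n)` with `attention_pairs(n, k)` gives a ratio of `n / k`. Whether a ratio like this is where the quoted "nearly 1,000x" figure comes from is not stated in the announcement.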

Impact: 44
