Attention Scaling Lab

Measure how self-attention latency explodes as you stretch the context window. Run the vanilla O(N²) kernel, then enable optimizations such as FlashAttention-inspired tiling or sparse attention patterns to see how much of that cost they remove.
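The quadratic cost comes from the (N, N) score matrix that vanilla attention materializes. A minimal numpy sketch of that kernel (illustrative only, not the lab's actual implementation):

```python
import numpy as np

def naive_attention(q, k, v):
    """Vanilla self-attention: the (N, N) score matrix makes both time
    and memory scale quadratically with sequence length N."""
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (N, N): the O(N^2) term
    scores -= scores.max(axis=-1, keepdims=True)   # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                             # (N, d)

rng = np.random.default_rng(0)
n, d = 512, 64
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = naive_attention(q, k, v)
print(out.shape)  # (512, 64)
```

Doubling N quadruples both the score matrix and the work to fill it, which is exactly the blow-up the benchmark visualizes.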

Context length is on a log scale (2^8 to 2^28). The simulation benchmarks a smaller window of lengths and extrapolates the rest.
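One way to do that extrapolation (a sketch of the idea, not necessarily what the lab implements) is to fit a power law to the sampled points in log-log space, where an O(N²) kernel shows up as a slope of 2:

```python
import numpy as np

# Hypothetical latencies (ms) sampled at small context lengths;
# synthetic data following latency = c * N^2 for a dense kernel.
lengths = np.array([2**8, 2**9, 2**10, 2**11, 2**12], dtype=float)
latencies = 1e-6 * lengths**2

# Fit log(latency) = a * log(N) + b, then extrapolate to 2^28.
a, b = np.polyfit(np.log(lengths), np.log(latencies), 1)
predicted = np.exp(a * np.log(2**28) + b)
print(f"fitted exponent: {a:.2f}")  # ~2 for a dense kernel
```

The fitted exponent also makes a good sanity check: a tiled or sparse kernel should come out with a visibly smaller slope.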
Select kernels to benchmark:
Optimized kernels simulate reduced inner loops so comparisons stay apples-to-apples.
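"Reduced inner loops" can be modeled with a toy operation count: vanilla attention touches every (query, key) pair, while a sliding-window sparse pattern caps the keys per query. A hedged sketch (the window size and the functions here are illustrative assumptions, not the lab's code):

```python
def dense_ops(n: int) -> int:
    """Vanilla attention visits every (query, key) pair: N^2 inner steps."""
    return n * n

def sparse_ops(n: int, window: int = 256) -> int:
    """Sliding-window sparsity: each query attends to at most `window`
    keys, so the inner loop is O(N * window) instead of O(N^2)."""
    return n * min(window, n)

# The speedup ratio grows linearly with context length.
for n in (2**10, 2**14, 2**18):
    print(n, dense_ops(n) / sparse_ops(n))
```

Note that FlashAttention-style tiling reduces memory traffic rather than arithmetic, so a pure operation count like this understates its benefit; the simulation's "reduced inner loops" framing folds both effects into one number.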

Heaviest kernel
Peak latency: awaiting benchmark
Memory at max length: awaiting benchmark
Benchmark log ready. Press “Run Benchmarks” to begin.

Note: simplified models for illustration; not hardware-accurate.