Attention Scaling Lab

Measure how self-attention latency explodes as you stretch the context window. Run the vanilla O(N²) kernel, then switch on optimizations such as FlashAttention-inspired tiling or sparse attention patterns to see how much of the cost they shave off.
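To make the comparison concrete, here is a minimal Python/NumPy sketch of the two kernel families the lab contrasts: a vanilla kernel that materializes the full N×N score matrix, and a sliding-window sparse variant. The function names and the window size are illustrative, not the lab's actual implementation (which runs in the browser).

```python
import numpy as np

def naive_attention(q, k, v):
    """Vanilla kernel: builds the full N x N score matrix,
    so time and memory both grow as O(N^2) in sequence length N."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def windowed_attention(q, k, v, window=64):
    """Sparse variant: each query attends only to keys within `window`
    positions, cutting per-query work from O(N) to O(window)."""
    n, d = q.shape
    out = np.empty_like(v)
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        s = q[i] @ k[lo:hi].T / np.sqrt(d)
        s -= s.max()
        w = np.exp(s)
        out[i] = (w / w.sum()) @ v[lo:hi]
    return out
```

When the window covers the whole sequence, the sparse variant reproduces the vanilla result exactly; shrinking the window trades a little accuracy for a large drop in work per query.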

Larger sequence lengths dramatically increase computation and memory. Browser execution is chunked to stay responsive.
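The chunking idea is simple: split the long benchmark loop into slices and yield control between them. A hedged sketch in Python (the lab itself would do this in JavaScript, interleaving slices with `setTimeout` or `requestAnimationFrame`; `chunked_rows` is a hypothetical helper):

```python
def chunked_rows(n_rows, chunk=256):
    # Split the attention row loop into slices; a browser version would
    # hand control back to the event loop between slices so the page
    # never freezes during a long benchmark.
    for start in range(0, n_rows, chunk):
        yield start, min(start + chunk, n_rows)
```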
Select kernels to benchmark:
Optimized kernels simulate their reduced inner loops on the same random inputs, so results are directly comparable.
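One way to guarantee that comparability is to seed the input generator, so every kernel is timed on identical Q, K, V tensors. A minimal sketch, assuming a NumPy harness with hypothetical helper names (`make_inputs`, `benchmark`):

```python
import time
import numpy as np

def make_inputs(n, d=64, seed=0):
    # A fixed seed means every kernel sees identical Q, K, V,
    # so latency differences come from the kernel, not the data.
    rng = np.random.default_rng(seed)
    return tuple(rng.standard_normal((n, d)) for _ in range(3))

def benchmark(kernel, n):
    q, k, v = make_inputs(n)
    t0 = time.perf_counter()
    kernel(q, k, v)
    return time.perf_counter() - t0

# Stand-in "kernel": full score matrix followed by a value mix (O(n^2) work).
full_kernel = lambda q, k, v: (q @ k.T) @ v
```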

Heaviest kernel

Peak latency

Awaiting benchmark

Memory at max length

Awaiting benchmark
Benchmark log ready. Press “Run Benchmarks” to begin.

Note: This lab uses simplified models for illustrative purposes and does not reflect exact hardware benchmarks.