Attention Scaling Lab

Measure how self-attention latency explodes as you stretch the context window. Run the vanilla O(N²) kernel, then enable optimizations such as FlashAttention-inspired tiling or sparse attention patterns to see how much of that cost they remove.
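The quadratic cost comes from the (N, N) score matrix that vanilla attention materializes. A minimal numpy sketch of that kernel (illustrative only, not the lab's actual implementation):

```python
import numpy as np

def naive_attention(q, k, v):
    """Vanilla self-attention: the (N, N) score matrix makes both time
    and memory scale quadratically with sequence length N."""
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (N, N): the O(N^2) term
    scores -= scores.max(axis=-1, keepdims=True)   # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                             # (N, d)

rng = np.random.default_rng(0)
n, d = 512, 64
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = naive_attention(q, k, v)
print(out.shape)  # (512, 64)
```

Doubling N quadruples both the score matrix and the work to fill it, which is exactly the blow-up the benchmark visualizes.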

Context length is on a log scale (2^8 to 2^28). The simulation benchmarks a smaller window of lengths and extrapolates the rest.
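One way to do that extrapolation (a sketch of the idea, not necessarily what the lab implements) is to fit a power law to the sampled points in log-log space, where an O(N²) kernel shows up as a slope of 2:

```python
import numpy as np

# Hypothetical latencies (ms) sampled at small context lengths;
# synthetic data following latency = c * N^2 for a dense kernel.
lengths = np.array([2**8, 2**9, 2**10, 2**11, 2**12], dtype=float)
latencies = 1e-6 * lengths**2

# Fit log(latency) = a * log(N) + b, then extrapolate to 2^28.
a, b = np.polyfit(np.log(lengths), np.log(latencies), 1)
predicted = np.exp(a * np.log(2**28) + b)
print(f"fitted exponent: {a:.2f}")  # ~2 for a dense kernel
```

The fitted exponent also makes a good sanity check: a tiled or sparse kernel should come out with a visibly smaller slope.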
Select kernels to benchmark:
Optimized kernels simulate reduced inner loops so comparisons stay apples-to-apples.
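"Reduced inner loops" can be modeled with a toy operation count: vanilla attention touches every (query, key) pair, while a sliding-window sparse pattern caps the keys per query. A hedged sketch (the window size and the functions here are illustrative assumptions, not the lab's code):

```python
def dense_ops(n: int) -> int:
    """Vanilla attention visits every (query, key) pair: N^2 inner steps."""
    return n * n

def sparse_ops(n: int, window: int = 256) -> int:
    """Sliding-window sparsity: each query attends to at most `window`
    keys, so the inner loop is O(N * window) instead of O(N^2)."""
    return n * min(window, n)

# The speedup ratio grows linearly with context length.
for n in (2**10, 2**14, 2**18):
    print(n, dense_ops(n) / sparse_ops(n))
```

Note that FlashAttention-style tiling reduces memory traffic rather than arithmetic, so a pure operation count like this understates its benefit; the simulation's "reduced inner loops" framing folds both effects into one number.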

Heaviest kernel
Peak latency: awaiting benchmark
Memory at max length: awaiting benchmark
Benchmark log ready. Press “Run Benchmarks” to begin.

Note: simplified models for illustration; not hardware-accurate.