KV Cache Stress Test

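Explore how autoregressive decode throughput falls as the context window grows: each generated token has to read the entire KV cache from memory, so longer contexts cost more bandwidth per step. Adjust model size, retrieval (RAG) pruning, and batch size to see the effect on memory bandwidth and latency.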

Context: 8,192 tokens
Batch: 1 (interactive)
Views: Throughput, Bandwidth, KV Cache

Outputs (no values until inputs are set):

Decode throughput (tokens/s)
HBM bandwidth (GB/s)
KV cache footprint (GB)

Note: This lab uses simplified models for illustrative purposes. RAG reduction is a conceptual approximation.
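
For readers who want to reproduce the three readouts by hand, here is a minimal Python sketch of one common simplified model. It is an assumption, not the lab's actual formulas: decode is treated as purely memory-bandwidth-bound, so each step reads all weights once plus the whole KV cache. The names ModelConfig, kv_cache_bytes, decode_estimate, and keep_fraction are hypothetical, and the default model shape is illustrative only.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Hypothetical model shape; defaults are illustrative, not taken from the lab."""
    n_params: float = 7e9      # total number of weights
    n_layers: int = 32
    n_kv_heads: int = 32
    head_dim: int = 128
    bytes_per_value: int = 2   # fp16 / bf16 storage


def kv_cache_bytes(cfg: ModelConfig, context_tokens: int, batch_size: int) -> int:
    """Size of the KV cache in bytes for a given context length and batch size."""
    # Factor 2: one key tensor plus one value tensor per layer.
    per_token = 2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim * cfg.bytes_per_value
    return per_token * context_tokens * batch_size


def decode_estimate(cfg: ModelConfig, context_tokens: int, batch_size: int,
                    hbm_gb_per_s: float, keep_fraction: float = 1.0) -> dict:
    """Bandwidth-bound estimate of one decode step.

    keep_fraction is an assumed stand-in for retrieval pruning: the share
    of the original context that is still fed to the model.
    """
    effective_context = int(context_tokens * keep_fraction)
    kv_bytes = kv_cache_bytes(cfg, effective_context, batch_size)
    weight_bytes = cfg.n_params * cfg.bytes_per_value
    # Assumption: step time = bytes moved / peak bandwidth (compute is ignored).
    step_seconds = (weight_bytes + kv_bytes) / (hbm_gb_per_s * 1e9)
    return {
        "kv_cache_gb": kv_bytes / 1e9,
        "step_latency_ms": step_seconds * 1e3,
        "tokens_per_s": batch_size / step_seconds,
    }


if __name__ == "__main__":
    cfg = ModelConfig()
    for tokens in (1_024, 8_192, 131_072):
        print(tokens, decode_estimate(cfg, tokens, batch_size=1, hbm_gb_per_s=2000.0))
```

In this sketch, KV cache size grows linearly with both context length and batch size, while the weight traffic per step is fixed. Once the cache outweighs the weights, throughput per sequence drops roughly in proportion to context length. Lowering keep_fraction shortens the effective context, which is the only way this toy model represents pruning.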