Explore how autoregressive decoding throughput falls as the context window grows. Adjust model size, retrieval pruning, and batching to see the resulting memory-bandwidth and latency penalties.
Note: This lab uses simplified models for illustrative purposes. The RAG reduction (retrieval pruning) is a conceptual approximation.
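For readers who want to see the kind of arithmetic involved, here is a minimal sketch of a bandwidth-bound decoding model. It is not the lab's actual implementation. It assumes each decode step must read all model weights once plus the key/value cache of every sequence in the batch, and that memory bandwidth is the only bottleneck. The names `DecodeConfig`, `rag_keep`, and all default values (7B parameters, 32 layers, hidden size 4096, fp16, 1000 GB/s) are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DecodeConfig:
    # All defaults are illustrative assumptions, not measurements.
    params_billion: float = 7.0     # model size in billions of parameters
    n_layers: int = 32              # transformer layers
    d_model: int = 4096             # hidden size (assumes plain multi-head attention)
    bytes_per_value: int = 2        # fp16 weights and cache entries
    bandwidth_gb_s: float = 1000.0  # hypothetical accelerator memory bandwidth


def kv_bytes_per_token(cfg: DecodeConfig) -> int:
    """Bytes of KV cache stored per context token (one key and one value per layer)."""
    return 2 * cfg.n_layers * cfg.d_model * cfg.bytes_per_value


def step_latency_s(cfg: DecodeConfig, context_tokens: int,
                   batch_size: int = 1, rag_keep: float = 1.0) -> float:
    """Seconds per decode step if the step is purely memory-bandwidth bound.

    rag_keep is the fraction of the context kept after retrieval pruning
    (1.0 means no pruning). This is a conceptual stand-in, not a RAG model.
    """
    effective_tokens = context_tokens * rag_keep
    weight_bytes = cfg.params_billion * 1e9 * cfg.bytes_per_value
    cache_bytes = batch_size * effective_tokens * kv_bytes_per_token(cfg)
    return (weight_bytes + cache_bytes) / (cfg.bandwidth_gb_s * 1e9)


def tokens_per_second(cfg: DecodeConfig, context_tokens: int,
                      batch_size: int = 1, rag_keep: float = 1.0) -> float:
    """Aggregate tokens generated per second across the whole batch."""
    return batch_size / step_latency_s(cfg, context_tokens, batch_size, rag_keep)


cfg = DecodeConfig()
for ctx in (1_000, 8_000, 32_000, 128_000):
    full = tokens_per_second(cfg, ctx, batch_size=8)
    pruned = tokens_per_second(cfg, ctx, batch_size=8, rag_keep=0.25)
    print(f"context={ctx:>7}  full={full:8.1f} tok/s  pruned={pruned:8.1f} tok/s")
```

Under these assumptions, batching amortizes the weight reads across sequences, but the cache term grows with both batch size and context length, so long contexts eventually dominate the step time. Pruning the context shrinks only the cache term, which is why its benefit is largest at long contexts.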