Attention Thrashing: Interactive Test
This Needle-in-a-Haystack test demonstrates the "Lost in the Middle" phenomenon. We embed specific quotes (needles) at the 40%, 50%, and 60% positions of the context. At short context lengths the model retrieves them perfectly; at long lengths retrieval fails even though the model technically "sees" everything.
Real needles (✅) are embedded at exactly 40%, 50%, and 60%. Fake needles (❌) use similar words but are guaranteed NOT to appear anywhere in the generated text.
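A minimal sketch of how such a test can be generated, using whitespace-delimited words as a rough token proxy (a simplification; a real generator would measure actual tokens). The names `buildHaystack`, `REAL_NEEDLES`, and `FAKE_NEEDLES` are illustrative, not the page's actual implementation:

```ts
// Sketch of a needle-in-a-haystack generator. Word count stands in for
// token count here; a real generator would use an actual tokenizer.
const REAL_NEEDLES = [
  "The future has not been written yet.",          // placed at ~40%
  "No fate but what we make for ourselves.",       // placed at ~50%
  "There is no destiny except the one we create.", // placed at ~60%
];

const FAKE_NEEDLES = [
  "The destiny we create has not been written.",
  "No future except what we make for ourselves.",
];

function buildHaystack(totalWords: number): string {
  // Neutral filler that shares no full sentence with any needle.
  const filler = "The committee reviewed the quarterly logistics report in detail.";
  const words: string[] = [];
  while (words.length < totalWords) words.push(...filler.split(" "));
  words.length = totalWords;

  // Splice needles in from the end so earlier insertion points don't shift.
  const placements: Array<[number, string]> = [
    [0.4, REAL_NEEDLES[0]],
    [0.5, REAL_NEEDLES[1]],
    [0.6, REAL_NEEDLES[2]],
  ];
  for (const [p, needle] of [...placements].reverse()) {
    words.splice(Math.floor(totalWords * p), 0, needle);
  }
  const text = words.join(" ");

  // Guarantee the fake needles never appear anywhere in the output.
  for (const fake of FAKE_NEEDLES) {
    if (text.includes(fake)) throw new Error(`fake needle leaked: ${fake}`);
  }
  return text;
}
```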
Control Test: Tokens
Proves the model can retrieve the needles when the context is short. Expect 100% accuracy.
Thrashing Test: Tokens
Same task, but the long context overwhelms attention. Accuracy collapses.
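With a generator like the sketch above, the two tests differ only in length. The token counts below are assumptions for illustration; the page lets you pick your own:

```ts
// Hypothetical sizes: a short control context and a long thrashing context.
const control = buildHaystack(2_000);     // expect perfect retrieval
const thrashing = buildHaystack(120_000); // expect degraded retrieval
console.log(control.length, thrashing.length);
```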
How to Test:
- Generate a test → Copy → Paste into ai.dev (or any LLM)
- Ask this question:
"Does this text contain the following five phrases? For each phrase, respond with YES and the approximate percentage position (e.g., 'YES at ~40%') or NO if not found: (1) 'The future has not been written yet.', (2) 'The destiny we create has not been written.', (3) 'No fate but what we make for ourselves.', (4) 'No future except what we make for ourselves.', (5) 'There is no destiny except the one we create.'" - Expected:
- ✅ Control: Perfect accuracy: 1) YES ~40% | 2) NO | 3) YES ~50% | 4) NO | 5) YES ~60%
- ❌ Thrashing: MAY miss needles, report wrong positions, or falsely detect fakes (#2, #4) (see the grading sketch below)
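If you want to grade replies automatically rather than by eye, a sketch like the following can score the model's answer, assuming it follows the requested "YES at ~40%" / "NO" format (`gradeReply` is an illustrative name, not part of the page):

```ts
// Ground truth: phrase number (1-based) -> expected position, or null for fakes.
const EXPECTED: Record<number, number | null> = {
  1: 40, 2: null, 3: 50, 4: null, 5: 60,
};

function gradeReply(reply: string): number {
  let correct = 0;
  for (const [idx, expected] of Object.entries(EXPECTED)) {
    // Find a line mentioning this phrase number, e.g. "(3) YES at ~50%".
    const m = reply.match(
      new RegExp(`\\(?${idx}\\)?[^\\n]*?\\b(YES|NO)\\b(?:[^\\n]*?(\\d+)\\s*%)?`, "i")
    );
    if (!m) continue;
    const saidYes = m[1].toUpperCase() === "YES";
    const pos = m[2] ? Number(m[2]) : null;
    if (expected === null) {
      if (!saidYes) correct++; // correctly rejected a fake needle
    } else if (saidYes && pos !== null && Math.abs(pos - expected) <= 10) {
      correct++; // found the real needle near the right position
    }
  }
  return correct; // out of 5
}
```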
💡 The old-fashioned way: Click "Show Preview" and use Ctrl+F to search for any needle in the generated text.
What You're Testing
As context grows, transformers exhibit attention thrashing: wasting compute on irrelevant tokens while losing mid-context retrieval accuracy. Models "see" everything but focus on nothing.
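One back-of-the-envelope way to see the dilution: softmax attention weights sum to 1 per query, so under a crude uniform-attention assumption (a simplification, not how trained models actually behave) the mass landing on a fixed-size needle shrinks linearly with context length:

```ts
// Fraction of attention mass a ~10-token needle gets if attention
// were spread uniformly over the context (illustrative simplification).
const needleTokens = 10;
for (const contextTokens of [2_000, 32_000, 128_000]) {
  const share = needleTokens / contextTokens;
  console.log(`${contextTokens} tokens: ${(share * 100).toFixed(3)}% on the needle`);
}
// 2000 -> 0.500%, 32000 -> 0.031%, 128000 -> 0.008%
```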
Notice: Long-context responses are noticeably slower. With standard self-attention, prefill compute grows quadratically with input length, so processing 128K tokens takes multiple seconds versus sub-second at short contexts.
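The quadratic claim is easy to sanity-check: the attention term of prefill does work proportional to n² per layer, so the relative cost between two context lengths is the squared ratio (ignoring linear terms and implementation constant factors; `relativePrefillCost` is an illustrative helper):

```ts
// Relative prefill cost under the O(n^2) attention term alone.
function relativePrefillCost(nShort: number, nLong: number): number {
  return (nLong / nShort) ** 2;
}
console.log(relativePrefillCost(2_000, 128_000)); // 4096x more attention work
```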