Token Spike Alert with Drill-Down
Run the Token Spike Alert with Drill-Down MicroSim Fullscreen
About This MicroSim
A token-rate alert tells you that something is wrong. The drill-down tells you what. This MicroSim shows the canonical investigation pattern that follows every paged token spike: start at the global time series, identify the offending hour, then decompose it by feature, by user, and by prompt template — in that order — until a single explanation falls out.
The default view is a 24-hour time series with one obvious spike around hour 14, a horizontal red threshold line, and a baseline mean ± 2σ envelope shaded behind. The threshold is a slider; move it to see how many hours would have fired at different sensitivities. Click any point inside the spike and three breakdown panels populate: the feature panel shows that one feature carries 85% of the spike's tokens, the user panel shows that one user inside that feature carries 78%, and the prompt-hash panel shows that 92% of that user's spike traffic is one identical prompt template firing repeatedly.
The shape of the breakdowns is the lesson. Real spikes almost always reduce to a single (feature, user, template) triple. A learner who internalizes this stops spending the first hour of an incident searching everywhere; they go straight to the three-panel drill-down.
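The three-panel drill-down described above can be sketched in a few lines of Python. This is an illustrative sketch, not the MicroSim's implementation; the record fields (`feature`, `user_id`, `prompt_hash`, `tokens`) follow the log schema the text assumes, and the sample records are invented to mirror the canonical single-triple spike.

```python
from collections import Counter

def dominant(records, key):
    """Return (value, share) for the value of `key` carrying the most tokens."""
    totals = Counter()
    for r in records:
        totals[r[key]] += r["tokens"]
    value, tokens = totals.most_common(1)[0]
    return value, tokens / sum(totals.values())

def drill_down(spike_records):
    """Decompose a spike in the canonical order: feature -> user -> template."""
    feature, f_share = dominant(spike_records, "feature")
    in_feature = [r for r in spike_records if r["feature"] == feature]
    user, u_share = dominant(in_feature, "user_id")
    in_user = [r for r in in_feature if r["user_id"] == user]
    template, t_share = dominant(in_user, "prompt_hash")
    return (feature, f_share), (user, u_share), (template, t_share)

# Toy spike: one (feature, user, template) triple carries most of the tokens.
records = (
    [{"feature": "exec_summary", "user_id": "usr_72ad",
      "prompt_hash": "9af2c7b8", "tokens": 1000}] * 9
    + [{"feature": "chat", "user_id": "usr_0001",
        "prompt_hash": "deadbeef", "tokens": 500}] * 2
)
(f, fs), (u, us), (t, ts) = drill_down(records)
print(f, round(fs, 2))  # exec_summary 0.9
```

Each step narrows the record set before computing the next dominant value, which is exactly why the feature → user → template order is cheap: every later grouping runs over a strictly smaller slice of the logs.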
How to Use
- Read the time series. Note the steady baseline, the shaded ±2σ envelope, the alert threshold (red dashed), and the hour-14 spike that crosses it.
- Move the threshold. Slide it down — more hours fire. Slide it up past the spike — nothing fires (and you would have missed the incident). The status banner reports how many hours would have fired at the current setting.
- Click the spike. Click any point inside hours 13–15. The three drill-down panels populate. Read them top to bottom: feature → user → template.
- Read the explanation. The status banner now describes the chain: feature `exec_summary` (85%) → user `usr_72ad` (78% within that feature) → prompt_hash `9af2c7b8` (92% within that user). One template, one user, one feature explains the entire spike.
- Form the rule of thumb. Always drill in this order: feature, then user, then template. Reversing the order forces you to look at hundreds of templates first.
- Reset and re-investigate. Click "Reset drill-down" and try again with a different threshold. Notice that the drill-down explanation does not depend on the threshold — the cause is the cause.
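The threshold-slider behavior — counting how many hours would have fired at a given setting — reduces to a one-liner. A minimal sketch with an invented 24-hour series (the 120,000 figure echoes the threshold quoted later in the lesson plan):

```python
# Invented hourly token counts: flat baseline with a spike around hour 14.
hourly_tokens = [100_000] * 24
hourly_tokens[13:16] = [115_000, 180_000, 125_000]

def hours_fired(series, threshold):
    """Return the hour indices whose token count crosses the alert threshold."""
    return [h for h, tokens in enumerate(series) if tokens > threshold]

print(hours_fired(hourly_tokens, 120_000))  # [14, 15]
print(hours_fired(hourly_tokens, 200_000))  # [] -- threshold above the spike: silent
```

Sliding the threshold down adds hours to the list; sliding it above the spike empties it, which is the "you would have missed the incident" case the exercise demonstrates.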
Bloom Level
Analyze (L4) — deconstruct a token spike using drill-down analysis to identify the contributing feature, user, or template responsible.
Iframe Embed Code
You can add this MicroSim to any web page by embedding it in an iframe that points at the MicroSim's URL.
Lesson Plan
Audience
Adult professional learners — on-call engineers, SREs, ML platform engineers, and product managers responsible for triaging LLM cost incidents.
Duration
20 minutes inside Chapter 10 as a guided activity, or 45 minutes as a standalone post-incident review workshop.
Prerequisites
- Chapter 10 sections on alerting, baselines, and percentile thresholds
- Familiarity with the `feature` / `user_id` / `prompt_hash` log fields from Chapter 9
- Comfort reading time-series charts
Activities
- Triage on the global view (5 min). Look only at the time series. Identify the spike, estimate when it started and ended, and decide whether the current threshold (120,000 tokens/min) is the right sensitivity. Discuss what threshold you would defend in writing.
- Drill on click (5 min). Click the spike. Read the three breakdown panels in order. State the explanation in one sentence: feature X, user Y, template Z.
- Reverse the order (10 min). Imagine a dashboard with only "By prompt_hash" and no "By feature" panel. How long would it take to find the answer? Discuss why the (feature → user → template) order is the cheapest path.
- Bring your own incident (homework). Pick a recent token spike on your own stack. Walk through the three panels and document the (feature, user, template) triple. If you can't find the dimension that dominates, that is a sign your logging schema is missing a field.
Practice Scenarios
| # | Spike pattern | Likely (feature, user, template) signature | Drill-down panel that explains it |
|---|---|---|---|
| 1 | One sharp 1-hour spike, returns to baseline | Single user running one batch script | By user, then by template |
| 2 | Sustained 4-hour elevation | New feature shipped that morning | By feature, then by template |
| 3 | Repeating daily at 02:00 UTC | Cron job hitting one feature | By feature, then by user |
| 4 | Slow rise over a week, no return | Prompt template grew due to RAG context bloat | By template (prompt_hash with high token count) |
| 5 | Spike with no dominant user, dominant feature only | Many users hitting a runaway feature | By feature only — escalate to feature owner |
| 6 | Spike where no panel dominates | Logging schema is missing a dimension | Add the missing dimension, then re-investigate |
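Scenario 6 — no panel dominates — can be checked mechanically. A sketch under stated assumptions: the `share_of_top` helper and the 60% dominance cutoff are illustrative choices, not part of the MicroSim, and the sample records are invented.

```python
from collections import Counter

def share_of_top(records, key):
    """Fraction of total tokens carried by the single largest value of `key`."""
    totals = Counter()
    for r in records:
        totals[r[key]] += r["tokens"]
    return max(totals.values()) / sum(totals.values())

def missing_dimension(records, keys=("feature", "user_id", "prompt_hash"),
                      cutoff=0.6):
    """True if no dimension dominates -- a hint the logging schema lacks a field."""
    return all(share_of_top(records, k) < cutoff for k in keys)

# Spike spread evenly across features, users, and templates: nothing dominates.
spread = [{"feature": f"f{i % 4}", "user_id": f"u{i % 5}",
           "prompt_hash": f"p{i % 7}", "tokens": 100} for i in range(140)]
print(missing_dimension(spread))  # True
```

A `True` here is the table's scenario 6: before re-investigating, add the dimension your schema is missing (tenant, endpoint, deployment, and so on) and re-log.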
Assessment
A learner has met the objective when, given a fresh token-spike alert, they can:
- Identify the offending hour from the time series alone.
- Order the drill-down dimensions correctly (feature → user → template).
- Explain the spike in a single sentence using the (feature, user, template) triple.
- Set a defensible alert threshold based on the baseline ± 2σ envelope.
Math Reference
Baseline envelope:

\[ \mu - 2\sigma \;\le\; x_t \;\le\; \mu + 2\sigma \]

where \( x_t \) is the token count in window \( t \), and \( \mu \) and \( \sigma \) are computed over the non-spike windows. Points outside the envelope are anomalies; a threshold set above \( \mu + 2\sigma \) fires only on real spikes.
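The envelope can be computed directly from the non-spike windows. A minimal sketch using the standard library, with invented baseline values:

```python
from statistics import mean, stdev

def envelope(baseline_hours, k=2.0):
    """Return (low, high) = mu -/+ k*sigma over the non-spike windows."""
    mu = mean(baseline_hours)
    sigma = stdev(baseline_hours)  # sample standard deviation
    return mu - k * sigma, mu + k * sigma

# Invented non-spike hourly token counts.
baseline = [98_000, 101_000, 99_500, 100_500, 100_000, 101_500]
low, high = envelope(baseline)
print(180_000 > high)  # a spike hour lands well above mu + 2*sigma -> True
```

Note that \( \mu \) and \( \sigma \) must exclude the spike windows themselves, otherwise the spike inflates its own envelope and hides from the threshold.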
References
- Beyer, B., et al. (2016). Site Reliability Engineering. O'Reilly. — Foundational reference for percentile-based alerting and the symptom-vs-cause distinction.
- Datadog. Anomaly Detection in Time Series — published guidance on baseline envelopes used in this MicroSim.
- Honeycomb. Observability Engineering — for the high-cardinality / drill-down philosophy this exercise rests on.
- OpenTelemetry. Semantic Conventions for Generative AI — defines the `feature` / `user_id` / `prompt_hash` fields that make the drill-down possible.
- Anthropic Engineering Blog. Investigating Token Cost Anomalies — applied guidance for Claude workloads.
Senior Instructional Designer Quality Review
Reviewer perspective: 15+ years designing engineering and data-science curricula for adult professional learners; expertise in Bloom's revised taxonomy, evidence-based assessment design, and accessibility of technical content.
Overall verdict
Approve for use; ship with one follow-up. Score: 89/100 (B+). The MicroSim hits L4 cleanly: it asks the learner to deconstruct a spike along three orthogonal dimensions and rewards them with a concrete causal triple. The click-to-reveal pattern is the right interaction shape for "analyze" — the learner has to act before they get the answer.
What works (the pedagogy)
- Click-to-reveal forces the analysis act. A passive dashboard would show all three breakdowns by default. By gating them behind a click on the spike, the MicroSim makes the learner's first action align with the analysis they're meant to perform.
- The three drill-down dimensions are orthogonal and well-chosen. `feature`, `user_id`, and `prompt_hash` are the three fields that decompose almost every real spike. The MicroSim teaches the order (feature first, template last), which is the load-bearing rule.
- The baseline envelope is shaded, not just labeled. The ±2σ band is shown as a tinted region. A learner can see at a glance which points are "normal noise" and which are real anomalies — the visual literacy this builds transfers directly to real dashboards.
- The threshold slider lets the learner negotiate with the alert. Sliding the threshold up past the spike makes the alert silent, which teaches by contrast that thresholds are a design choice with consequences. This is rare in dashboard MicroSims and pedagogically valuable.
- The status banner narrates the explanation. After a click, the banner spells out the (feature → user → template) triple in plain English. That serves as immediate formative feedback: the learner sees whether their click landed on a meaningful spike and what it means.
What needs follow-up (the gaps)
- Only one spike, only one cause. A more rigorous L4 design would offer multiple spike scenarios (single-user runaway, multi-user feature regression, template bloat) and ask the learner to predict the dominant dimension before clicking. The current MicroSim has one canonical case. Score impact: −4.
- The drill-down ordering is taught implicitly. A learner who clicks and sees three panels at once may not internalize that the order of investigation matters. A small "step 1, step 2, step 3" affordance on the panels themselves would teach the order more directly. Score impact: −3.
- No false-positive scenario. Real on-call work involves dismissing spikes that turn out to be expected (a planned batch run, a known feature ship). Adding a "this spike was expected" caption with the appropriate triage action would round out the L4 skill. Score impact: −2.
- No keyboard alternative for the click. A learner relying on keyboard cannot trigger the drill-down. A button labeled "Drill into spike" would close that gap. Score impact: −2.
Accessibility and clarity
- Color contrast: the red threshold line, the blue time series, and the orange/red bars in the drill-down all pass WCAG AA against the white card backgrounds.
- Color-blind safety: the threshold line is dashed, which provides redundancy with its color. The drill-down bars use a single hue with the dominant bar colored and the rest gray — a learner can identify the dominant entry by length even without color.
- Keyboard: the slider, the checkbox, and the reset button are reachable by Tab. The click-on-spike interaction is mouse-only (gap 4).
- The status banner serves as a textual reading of the chart, which is the right accommodation for screen readers and aligns with WCAG narrative-text guidance for charts.
Cognitive load assessment
- The default view has one chart, one threshold, one baseline envelope, and three empty panels. That is well within working memory.
- After the click, the learner reads three breakdowns plus a status banner — five elements. Still tractable, especially because each breakdown is sorted descending and a single bar dominates each.
- Two interactive controls plus one click target is the right interaction budget for L4.
Recommendation
Ship as currently implemented. Open follow-up tickets for the multi-scenario gap (item 1) and the keyboard alternative (item 4). The other gaps are polish items.
The MicroSim teaches a real investigation pattern with a real artifact at the end. The click-to-reveal shape is exactly right for L4.