List of MicroSims for Token Efficiency

This course has 44 interactive MicroSims to help students explore token-efficient LLM system design through hands-on simulations, dashboards, and pipeline diagrams.

  • A/B Test Outcome Decision Matrix

    Interactive decision tree for judging whether to ship an LLM A/B test treatment based on primary-metric improvement, guardrail-metric regression, and effect size.

  • Agent Budget Policy with Multiple Limits

    Multi-meter dashboard showing token, tool-call, iteration, and wall-clock budgets firing under healthy and pathological agent scenarios.

  • Anthropic Prompt Caching Lifecycle

    Visualize per-request token breakdown and cumulative cost across a sequence of requests with and without Anthropic's prompt caching feature.

  • Batch Job Lifecycle

    State diagram of an LLM batch job from submission through download, with idempotency, retry, and webhook vs polling notification paths.

  • BPE Tokenization Pipeline

    Step through the four stages a string passes through to become a token sequence: Unicode normalization, pre-tokenization, byte initialization, and merging.

  • Budget Hierarchy Roll-Up

    Pyramid diagram showing how individual session costs roll up through PR, engineer, repo, and organization budgets, with at-risk highlighting at the 75% and 90% thresholds.

  • Burn Rate Monthly Forecast

    Daily and cumulative LLM spend with linear forecast extrapolation against a budget line, surfacing whether current spend is on track or over budget.

  • Cache Hit Rate Health

    Four representative cache hit-rate patterns (healthy, sudden drop, slow erosion, sawtooth) with diagnostic captions and remediation hints.

  • Cheap-First Cascade with Escalation

    Flowchart MicroSim showing the cheap-first cascade with quality-gate escalation to Sonnet and Opus, with live expected-cost calculation.

  • Context Window Budget Allocation Over a Long Session

    Stacked area chart showing how the context window is allocated across components over 200 turns, with compaction events visible as drops in conversation-history allocation.

  • Continuous Cost Operating Model

    Concentric rings (Daily / Weekly / Monthly / Quarterly) showing the activities, roles, and artifacts that sustain a long-term LLM cost-optimization program.

  • Conversation Message Structure

    Watch a multi-turn dialogue accumulate input tokens turn by turn, and see why the system-prompt prefix is the prime caching target.

  • Cost Attribution Rollup

    A 30-request sample rolled up by Request, Feature, User, and Outcome: the same data seen through four lenses.

  • Cost Optimization Loop

    Circular workflow showing the full cost-optimization cycle from baseline measurement through reporting and back to the next baseline, with an optional failure-path overlay.

  • Cost-Quality Pareto Frontier

    Scatter plot of model configurations on cost vs quality with the Pareto frontier highlighted; sliders apply quality and cost constraints to find the survivor set.

  • Cross-Vendor Caching Comparison

    Multi-line cumulative cost chart comparing Anthropic, OpenAI, and Gemini caching mechanics across 50 requests, with sliders for prefix size, cache lifetime, and request frequency.

  • Cross-Vendor Token Count Drift

    Side-by-side comparison of how Anthropic, OpenAI, and Gemini tokenize the same input, with a live per-vendor cost calculation that highlights the cheapest option.

  • Embedding Space Concept

    Visualize how semantically related words cluster in 2D, and use a "find nearest" tool to see why nearness in embedding space means semantic similarity.

  • Function Calling Loop with Tool Choice

    Interactive sequence diagram showing the OpenAI function-calling round-trip and how each tool_choice setting (auto, none, required, specific) changes the loop shape and token cost.

  • Interactive Tokenizer Explorer

    Type any string and watch a tokenizer break it into chips, with live character / word / token counts and a tokenizer-family toggle.

  • Learning Graph Viewer

    Interactive viewer for exploring the course learning graph with search, category filtering, pan/zoom navigation, and live node/edge statistics.

  • LLM Logging Pipeline with Privacy Filters

    Interactive Mermaid flowchart that walks an LLM call from application code through PII detection, redaction, cost computation, retention, and storage with a live JSONL log preview.

  • LLM Ops Dashboard Layout

    Interactive p5.js wireframe of a six-panel LLM observability dashboard with healthy and incident states, hoverable panel rationales, and an alert overlay that highlights firing panels.

  • Long-Term and Short-Term Memory Architecture

    Three-column diagram showing how short-term conversation turns flow through compaction into long-term memory files, with a per-turn input flow at the bottom.

  • OpenAI Token Usage Object Anatomy

    Hover-labeled anatomy of an OpenAI Chat Completions response showing how each usage field maps to a billing category, including reasoning_tokens for o-series models.

  • Output Control Settings

    Five-histogram comparison of output-token distributions under five configurations: baseline, max_tokens, stop sequence, concise instruction, and all combined.

  • Pareto Analysis of Per-Feature Cost

    Interactive Chart.js Pareto chart showing 25 features sorted by monthly LLM cost with overlaid cumulative-share line and adjustable Pareto target threshold.

  • Precision/Recall Tradeoff for K Selection

    Plot retrieval precision and recall as functions of K with per-query cost overlaid, so learners can justify the K that balances quality and cost.

  • Privacy Compliance Pipeline

    Vertical pipeline showing the layered privacy and compliance controls a single LLM request flows through, with framework annotations toggleable per regulation.

  • Prompt Anatomy Budget

    Stacked bar of prompt components against a budget overlay; sliders adjust component sizes, and auto-actions trim aggressively.

  • Prompt Trim Before/After

    Grouped horizontal bars comparing token counts per prompt section before and after prompt-engineering techniques, with monthly savings projection.

  • RAG Pipeline Cost Annotations

    Horizontal RAG pipeline with live cost annotations at each stage; sliders adjust top-K and reranker N to surface the cost-leverage points.

  • Sample Size Calculator for LLM A/B Tests

    Adjust effect size, baseline variance, statistical power, and significance level to see the required sample size update live, with optional CUPED variance reduction.

  • Sampling Parameter Explorer

    Adjust temperature and top-p, watch the candidate-token distribution reshape, then sample once or 100 times to see empirical vs theoretical frequencies.

  • Serial vs Parallel Tradeoff

    Dual-axis bar chart of wall-clock time and total token cost as parallelism grows from 1 to 16, surfacing the parallel token penalty.

  • Session Token Accumulation

    Stacked area chart showing how harness sessions accumulate tokens turn by turn, surfacing how resending the growing conversation history makes cumulative token usage grow quadratically.

  • Skill Refactoring Before/After

    Per-step token cost of a Skill before and after script-delegation refactor; toggle each step to see the cumulative monthly-savings projection.

  • Stable Prefix / Volatile Suffix

    Drag the cache boundary on a segmented prompt; toggle each segment between stable and volatile; see cache eligibility update in real time.

  • Sync, Async, and Batch API Flow

    Three side-by-side sequence diagrams comparing synchronous, asynchronous, and batch LLM API modes with cost and latency annotations.

  • Task to Skill Binding Flow

    Diagram of how a harness decomposes a user request into tasks and binds each to a Skill, with token-cost annotations comparing lazy loading vs eager loading.

  • Token Distribution Long Tail

    Histogram of per-request input tokens with P50/P95/P99 markers plus cost-share-by-percentile-band, surfacing whether a workload is body-heavy or tail-heavy.

  • Token Lifecycle from Input to Output

    Step through how raw text becomes input tokens, gets processed once by the model, and is emitted as output tokens one at a time: the asymmetry that drives pricing.

  • Token Spike Alert with Drill-Down

    Interactive Chart.js time series of tokens-per-minute over 24 hours with a click-to-reveal drill-down by feature, user, and prompt template that explains the cause of a spike.

  • Tool Use Loop with Cost Annotations

    Multi-turn Anthropic tool-use sequence diagram with cumulative token annotations showing why the system prompt and tool definitions are the highest-value cache targets.
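The quadratic growth surfaced by the Session Token Accumulation MicroSim follows from resending the full conversation history on every turn. The Python sketch below is illustrative only: it assumes, hypothetically, that every user message and every assistant reply has a fixed size, it ignores compaction, and it counts raw input tokens without any prompt-caching discount. The function names and the 2,000 / 100 / 200 token sizes are made up for the example.

```python
# Sketch: cumulative input tokens across a multi-turn session.
# Hypothetical assumptions: fixed-size user messages and replies,
# no compaction, and every request resends the whole history.


def input_tokens_for_turn(turn: int, system_tokens: int,
                          user_tokens: int, reply_tokens: int) -> int:
    """Input tokens sent on 1-indexed `turn`: the system prompt, every
    earlier user/assistant exchange, and the new user message."""
    prior_exchanges = turn - 1
    return system_tokens + prior_exchanges * (user_tokens + reply_tokens) + user_tokens


def cumulative_input_tokens(turns: int, system_tokens: int,
                            user_tokens: int, reply_tokens: int) -> int:
    """Total input tokens across `turns` requests, summed turn by turn."""
    return sum(
        input_tokens_for_turn(t, system_tokens, user_tokens, reply_tokens)
        for t in range(1, turns + 1)
    )


def closed_form(turns: int, system_tokens: int,
                user_tokens: int, reply_tokens: int) -> int:
    """Same total in closed form: T*(S + u) + (u + r)*T*(T - 1)/2,
    which is quadratic in the number of turns T."""
    return (turns * (system_tokens + user_tokens)
            + (user_tokens + reply_tokens) * turns * (turns - 1) // 2)


if __name__ == "__main__":
    S, U, R = 2_000, 100, 200  # hypothetical token sizes
    for T in (10, 20, 40):
        total = cumulative_input_tokens(T, S, U, R)
        assert total == closed_form(T, S, U, R)  # loop and formula agree
        print(f"{T:>2} turns -> {total:,} cumulative input tokens")
```

With these made-up sizes, doubling the session from 20 to 40 turns raises cumulative input tokens from 99,000 to 318,000, more than a threefold increase, which is why compaction and prefix caching matter most in long sessions.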
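The expected-cost calculation behind the Cheap-First Cascade with Escalation MicroSim can be approximated with a short Python sketch. It assumes each tier has a fixed average cost per request and a probability of passing its quality gate given that a request reaches it, that an escalated request still pays for every earlier failed attempt, that the final tier's output is always accepted, and that running the gate itself is free. The `Tier` class, the unnamed cheap first tier, and every price and pass rate are hypothetical placeholders, not real model pricing.

```python
from dataclasses import dataclass


@dataclass
class Tier:
    name: str
    cost_per_request: float  # hypothetical average $ per request at this tier
    pass_rate: float         # hypothetical P(gate accepts output | request reached this tier)


def expected_cascade_cost(tiers: list[Tier]) -> float:
    """Expected $ per request for a cheap-first cascade.

    Every request pays for the first tier. A request reaches a later tier
    only if all earlier tiers failed their gates, and it still pays for
    those failed attempts. The last tier's pass_rate is never used because
    its output is accepted unconditionally.
    """
    expected = 0.0
    reach_prob = 1.0  # probability that a request reaches the current tier
    for tier in tiers:
        expected += reach_prob * tier.cost_per_request
        reach_prob *= 1.0 - tier.pass_rate
    return expected


if __name__ == "__main__":
    cascade = [
        Tier("cheap", 0.001, 0.80),   # placeholder numbers
        Tier("Sonnet", 0.010, 0.90),
        Tier("Opus", 0.050, 1.00),
    ]
    print(f"cascade: ${expected_cascade_cost(cascade):.4f} per request")
    print(f"top tier only: ${cascade[-1].cost_per_request:.4f} per request")
```

With these placeholder numbers the cascade's expected cost is about $0.004 per request, versus $0.05 if every request went straight to the top tier. The advantage shrinks as the cheap tier's pass rate falls, because each escalated request pays for both attempts.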
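The Sample Size Calculator for LLM A/B Tests rests on the standard normal-approximation formula for comparing two means. The sketch below computes it with Python's standard library; it assumes a two-sided test, equal-sized arms, and a continuous metric with known standard deviation, and it models CUPED as scaling the variance by (1 - rho^2), where rho is the correlation between the metric and its pre-experiment covariate. The function name and parameters are illustrative, not the MicroSim's actual code.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(effect_size: float, std_dev: float, alpha: float = 0.05,
                        power: float = 0.80, cuped_correlation: float = 0.0) -> int:
    """Per-arm sample size for a two-sided, two-sample test of means.

    n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2,
    with sigma^2 scaled by (1 - rho^2) when a CUPED covariate is used.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for significance level
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    variance = std_dev ** 2 * (1 - cuped_correlation ** 2)
    n = 2 * (z_alpha + z_beta) ** 2 * variance / effect_size ** 2
    return math.ceil(n)  # round up to a whole number of samples


if __name__ == "__main__":
    # Hypothetical metric: standard deviation 1.0, minimum detectable effect 0.1.
    print(sample_size_per_arm(0.1, 1.0))
    print(sample_size_per_arm(0.1, 1.0, cuped_correlation=0.5))
```

For that hypothetical metric, at alpha = 0.05 and 80% power, the sketch asks for 1,570 samples per arm; a CUPED covariate correlated at rho = 0.5 lowers the requirement to 1,178.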