Serial vs Parallel Tradeoff
About This MicroSim
A dual-axis bar chart comparing wall-clock time (left axis, blue) and total token cost (right axis, orange) as parallelism grows from serial (1) to ×16. The parallel token penalty surfaces immediately: time goes down, cost goes up — and the rate at which cost grows depends critically on whether caching is on. Caching makes the parallel penalty milder; without caching, every parallel agent re-pays the full system prompt.
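The chart's behavior can be approximated with a toy model. The constants and the 10% cached-read rate below are illustrative assumptions, not the MicroSim's actual formulas; the point is only that time falls with parallelism while the prompt cost grows at a slope set by caching.

```python
import math

# Assumed constants mirroring the MicroSim's defaults (illustrative only).
SYSTEM_PROMPT = 8_000    # tokens in the shared system prompt
TASK_TOKENS = 2_000      # task-specific input + output tokens (assumed)
N_TASKS = 8
T_PER_TASK = 10.0        # wall-clock seconds per subtask (assumed)
CACHE_READ_RATE = 0.10   # cached prefix billed at ~10% of full rate (assumed)

def simulate(parallelism: int, caching: bool) -> tuple[float, float]:
    """Return (wall_clock_seconds, total_tokens_billed) for one run."""
    # Each concurrent agent handles its share of the subtasks in sequence.
    tasks_per_agent = math.ceil(N_TASKS / parallelism)
    wall_clock = tasks_per_agent * T_PER_TASK
    if caching:
        # One agent writes the prefix cache; the others read it at a discount,
        # so the prompt penalty grows slowly with parallelism.
        prompt = SYSTEM_PROMPT + (parallelism - 1) * SYSTEM_PROMPT * CACHE_READ_RATE
    else:
        # Every parallel agent re-pays the full system prompt.
        prompt = parallelism * SYSTEM_PROMPT
    return wall_clock, prompt + N_TASKS * TASK_TOKENS

for p in (1, 2, 4, 8, 16):
    t_on, c_on = simulate(p, caching=True)
    _, c_off = simulate(p, caching=False)
    print(f"x{p:<2}  time {t_on:5.1f}s  cost {c_on:8.0f} (cached) {c_off:8.0f} (uncached)")
```

Running the loop shows the tradeoff in miniature: the time column shrinks as parallelism grows, the uncached cost column climbs linearly with it, and the cached column climbs far more gently.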
How to Use
- Read the default. 8 subtasks, 8K system prompt, caching on. Note time goes from a long bar at serial to a short one at ×16, while cost grows modestly.
- Disable caching. Now the cost bars grow much more steeply with parallelism — that's the penalty without prefix sharing.
- Bump system prompt to 20K. Cost penalty steepens further; time changes minimally.
- Find the knee. Where does adding more parallelism stop reducing time meaningfully?
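The knee can also be located numerically. A minimal sketch, assuming subtasks run in waves of fixed duration; the 20% threshold is an arbitrary cutoff for "stops reducing time meaningfully," not a value from the MicroSim:

```python
import math

def wall_clock(p: int, n_tasks: int = 8, t_task: float = 10.0) -> float:
    # Subtasks run in waves of up to p at a time; each wave takes one task-time.
    return math.ceil(n_tasks / p) * t_task

levels = [1, 2, 4, 8, 16]
times = [wall_clock(p) for p in levels]

# Knee: first level where stepping up to the next level saves < 20% more time.
knee = next(
    p for p, t_now, t_next in zip(levels, times, times[1:] + times[-1:])
    if (t_now - t_next) / t_now < 0.20
)
print(f"knee at x{knee}")  # beyond this, more parallelism buys little time
```

With 8 independent subtasks the knee lands at x8: going to x16 leaves half the slots idle, so time stops dropping while cost keeps rising.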
Bloom Level
Evaluate (L5) — judge whether a particular multi-task workload should run serially or in parallel based on cost, latency, and the parallel token penalty.
Iframe Embed Code

Lesson Plan
Audience
Engineers building agentic systems with subagent dispatch (Claude Code, Codex, Antigravity).
Duration
10–15 minutes inside Chapter 7.
Prerequisites
Chapter 7 sections on Subagent Pattern, Serial Execution, Parallel Execution, Parallel Token Penalty.
Activities
- Calibrate (3 min). Read serial cost vs ×16 cost at default settings. Note the ratio.
- Caching disabled (5 min). Compare same scenario, caching off. Note ratio change.
- Decision rule (5 min). Build a rule: "for this workload type, use parallelism factor X because Y."
Practice Scenarios
| # | Workload | Parallelism choice | Why |
|---|---|---|---|
| 1 | 30 independent code lints, latency-critical | ×8 or ×16 | trade cost for latency |
| 2 | 30 independent code lints, latency-tolerant | serial | minimize cost |
| 3 | 4 chained reasoning steps with dependencies | serial | parallelism inapplicable |
| 4 | 16 retrievals for one agent task | ×16 with caching on | minimal penalty |
| 5 | 16 retrievals, caching off | serial or ×4 | caching off makes ×16 wasteful |
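The table's pattern condenses into a decision-rule sketch. This is one heuristic reading of the scenarios above, not a normative policy; it treats blocking retrievals (scenario 4) as latency-critical:

```python
def recommend_parallelism(independent: bool, latency_critical: bool, caching: bool) -> str:
    """Heuristic condensing the practice scenarios (illustrative only)."""
    if not independent:
        return "serial"  # chained steps have dependencies; parallelism inapplicable
    if not latency_critical:
        return "serial"  # latency-tolerant: minimize cost
    if caching:
        return "x16"     # prefix sharing keeps the parallel penalty minimal
    return "x4"          # caching off: wide fan-out re-pays the full prompt

# Scenarios from the table above:
print(recommend_parallelism(True, True, True))    # latency-critical, cached
print(recommend_parallelism(True, False, True))   # latency-tolerant lints
print(recommend_parallelism(False, True, True))   # chained reasoning steps
print(recommend_parallelism(True, True, False))   # retrievals, caching off
```

The ordering of the checks matters: dependency structure rules first (parallelism inapplicable), latency tolerance second (serial is always cheapest), and caching only decides how wide the fan-out can affordably be.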
Assessment
Learner has met the objective when, given a workload description, they can recommend a parallelism factor and articulate the cost-vs-latency tradeoff.
References
- Anthropic Engineering — Subagent patterns.
- Chapter 7 — Subagent Pattern, Parallel Token Penalty.
- Designing Data-Intensive Applications (Kleppmann) — chapter on parallelism patterns.
Senior Instructional Designer Quality Review
Reviewer perspective: 15+ years designing engineering and distributed-systems curricula for adult professional learners.
Overall verdict
Approve as-is for Chapter 7. Score: 88/100 (B+). Dual-axis time-vs-cost is the right primitive for this comparison, and the caching toggle teaches the load-bearing variable.
What works
- Bloom alignment. L5 "judge" demands weighing competing dimensions; dual-axis chart externalizes both.
- Caching toggle as the load-bearing lever. Without it, students miss why parallelism is sometimes free and sometimes brutal.
- Knee-of-curve discoverable. As parallelism grows, time stops dropping but cost keeps rising.
Gaps
- No explicit "knee" annotation. Marking the parallelism factor where time-savings flatten would surface the L5 decision point. Score impact: −3.
- No latency-budget overlay. A "latency budget" line would let the learner read the minimum parallelism that meets a target. Score impact: −2.
- Single workload model. Independent subtasks only — chained workloads can't actually parallelize, and that's a relevant L5 decision. Score impact: −2.
Accessibility
Dual-axis colors (blue/orange) are color-blind safe. Status text reinforces the color encoding for readers who cannot distinguish the hues.
Cognitive load
5 parallelism levels × 2 series + 4 controls. Tractable.
Recommendation
Approve. Open follow-up for knee annotation (gap 1).