Welcome
Welcome to Token Efficiency — a practical, open-source intelligent textbook on measuring, analyzing, and reducing the cost of generative AI.
Every Token Counts — and Counting Is Fun
Hi, I'm Pemba — your guide. We're going to look at where the tokens in modern AI systems actually go, why your bill is what it is, and how to drive it down without sacrificing quality. Cheap systems are happy systems. Let's count some tokens.
About This Book
In many organizations today, the token usage costs of generative AI tools are becoming a dominant factor in operating expenses. A single poorly designed prompt, a verbose system message, an unbounded context window, or an over-eager agent loop can multiply costs ten- or a hundred-fold without producing better outcomes. Yet very few engineers and managers have a rigorous, end-to-end understanding of where tokens come from, how they are billed, and how to systematically drive them down without hurting quality.
This course closes that gap. It begins with a clear, practical mental model of how large language models consume input and output tokens, and builds up — through pricing economics, vendor ecosystems, agentic harnesses, structured logging, dashboards, A/B testing, prompt engineering, prompt caching, RAG tuning, context window management, model routing, and agent budget policies — to a complete, defensible token-efficiency operating model you can run on your own systems.
The book is deliberately vendor-pluralistic. It covers the three dominant ecosystems that engineering teams encounter today — Anthropic Claude, OpenAI, and Google Gemini — side by side, so you can compare them on cost-quality tradeoffs and design vendor-neutral abstractions where appropriate.
What You'll Find Inside
- 20 chapters organized in prerequisite order, from token mechanics through capstone projects
- 475 interconnected concepts in a validated learning graph that guarantees no chapter introduces a term before its prerequisites have been taught
- 40+ interactive MicroSims — browser-based simulations for the tokenizer pipeline, cache hit-rate dynamics, the Pareto frontier, agent budget meters, and more
- End-of-chapter quizzes that test understanding at the appropriate Bloom's Taxonomy level
- Cited references linking to vendor documentation and authoritative sources
- Pemba the Red Panda, the book's pedagogical mascot, who shows up at the moments you need her most
Who This Book Is For
Primary audience:
- Software engineers building features on top of LLM APIs
- Machine learning engineers and platform engineers responsible for generative AI infrastructure
- Technical leads and engineering managers who own the cost and performance of AI features in production
- FinOps practitioners translating LLM bills into per-feature, per-user, and per-outcome unit economics
Secondary audience:
- Graduate students in computer science, data science, or information systems who want practical exposure to the economics of large language models
Prerequisites: working knowledge of one programming language (Python preferred), familiarity with REST APIs and JSON, basic command-line and Git skills, and conceptual exposure to LLMs at the level of "I have used ChatGPT, Claude, or a similar tool."
How to Use This Book
Use the navigation menu on the left to explore:
- Chapters — main educational content, designed to be read in order
- MicroSims — interactive simulations referenced throughout the chapters
- Case Studies — applied examples and worked dashboards
- Learning Graph — interactive concept-dependency visualization across all 475 concepts
- About — author bio, citation formats, license, and the motivation behind the book
If you're new to LLM economics, read the chapters in order — each one builds on the prior. If you're already familiar with the basics, skim Chapters 1–3 and jump to whichever optimization technique matches your current backlog.
Getting Started
Start with Chapter 1: LLMs, Tokens, and Generation Basics to install the mental model the rest of the book depends on. By the end of Chapter 3 you'll be able to read any LLM API call and predict its cost. By the end of Chapter 14 you'll know how to cache it. By the end of Chapter 18 you'll know how to budget it.
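As a small taste of the kind of reasoning ahead, cost prediction for a single call comes down to multiplying token counts by per-token prices. The sketch below illustrates the arithmetic; the per-million-token prices are hypothetical placeholders, not any vendor's actual rates:

```python
# Minimal sketch: estimating the cost of one LLM API call.
# The prices below are HYPOTHETICAL placeholders (USD per million tokens);
# always look up your provider's current published pricing.

PRICE_PER_MTOK_INPUT = 3.00    # hypothetical input price
PRICE_PER_MTOK_OUTPUT = 15.00  # hypothetical output price

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD of a single call."""
    return (input_tokens * PRICE_PER_MTOK_INPUT
            + output_tokens * PRICE_PER_MTOK_OUTPUT) / 1_000_000

# A call with a 2,000-token prompt and a 500-token response:
print(f"${estimate_cost(2_000, 500):.4f}")  # -> $0.0135
```

Chapters 1–3 build on exactly this kind of back-of-the-envelope estimate, then replace the placeholder prices with real vendor rate cards.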
License
This work is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0). Source code, MicroSim implementations, and supporting scripts are available under the same terms in the project repository.
