References: Prompt Caching Patterns
- Cache (computing) - Wikipedia - Foundational coverage of caching, including hit/miss terminology, eviction policies, and TTLs, all of which translate directly to the LLM prompt caching covered in this chapter.
- Memoization - Wikipedia - The function-result caching pattern that is the conceptual ancestor of LLM prompt caching; explains why deterministic prefixes are essential.
- Cache replacement policies - Wikipedia - Coverage of LRU, LFU, and TTL-based eviction; relevant to understanding why cached LLM prefixes have a 5-minute or 1-hour lifetime.
- Designing Data-Intensive Applications - Martin Kleppmann - O'Reilly - The chapters on caching and replication establish the broader systems-engineering perspective that informs cache-key design and invalidation strategies.
- AI Engineering - Chip Huyen - O'Reilly - The chapters on inference optimization cover prompt caching across vendors with the same unit-economics framing this chapter uses.
- Anthropic Prompt Caching Documentation - Anthropic - The authoritative reference for cache_control, breakpoints, TTLs, and the cache_creation_input_tokens / cache_read_input_tokens response fields covered in this chapter (see the first sketch after this list).
- OpenAI Prompt Caching Guide - OpenAI - OpenAI's automatic prompt caching documentation, including the prefix-matching threshold and cached-token reporting in the usage object (see the second sketch after this list).
- Google Gemini Context Caching - Google - Reference for Gemini's explicit caching API, including minimum cache sizes, TTL configuration, and the cost calculation for cached versus uncached requests (see the third sketch after this list).
- Anthropic Engineering: Prompt Caching Announcement - Anthropic - The announcement post explaining the design rationale and walking through worked examples of cache cost savings; useful background for the strategic-decision material in this chapter.
- LangChain LLM Caching - LangChain - Reference for application-layer caching that complements vendor-provided prompt caching; relevant to the cache-aware-routing material later in this chapter (see the final sketch after this list).
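
For quick orientation alongside the Anthropic documentation entry, here is a minimal sketch of how cache_control and the two usage fields appear in a Messages API call. The model name, max_tokens value, and LONG_SYSTEM_PROMPT placeholder are illustrative assumptions, not values from this chapter.

```python
# Minimal sketch: mark a stable system prompt as a cache breakpoint and read
# back the cache-accounting fields from the response.
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY in the environment

LONG_SYSTEM_PROMPT = "..."  # placeholder: a large, stable prefix worth caching

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # illustrative model choice
    max_tokens=512,
    system=[
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # cache breakpoint
        }
    ],
    messages=[{"role": "user", "content": "Summarize the policy."}],
)

# The first call writes the cache; later calls within the TTL read from it.
print(response.usage.cache_creation_input_tokens)  # tokens written to cache
print(response.usage.cache_read_input_tokens)      # tokens served from cache
```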
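A companion sketch of where OpenAI surfaces cached tokens. Caching is automatic once the prompt prefix crosses the documented 1,024-token threshold; the model name and placeholder prompt are assumptions.

```python
# Minimal sketch: OpenAI caches prompt prefixes automatically above the token
# threshold; cache hits are reported under usage.prompt_tokens_details.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY in the environment

LONG_SYSTEM_PROMPT = "..."  # placeholder: stable prefix above the threshold

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model choice
    messages=[
        {"role": "system", "content": LONG_SYSTEM_PROMPT},
        {"role": "user", "content": "What changed in v2?"},
    ],
)

# On a cache hit, cached_tokens counts the prefix served from the cache.
print(response.usage.prompt_tokens_details.cached_tokens)
```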
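A third sketch showing the shape of Gemini's explicit caching flow with the google-genai SDK: create a named cache with a TTL, then reference it by name on each request. The model name, TTL value, and placeholder prompt are assumptions; the prompt must also meet the model's documented minimum cache size.

```python
# Minimal sketch: Gemini's explicit caching creates a named cache with a TTL,
# then references it by name on subsequent generate_content calls.
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY in the environment

LONG_SYSTEM_PROMPT = "..."  # placeholder: must meet the minimum cache size

cache = client.caches.create(
    model="gemini-2.0-flash-001",  # illustrative model choice
    config=types.CreateCachedContentConfig(
        system_instruction=LONG_SYSTEM_PROMPT,
        ttl="3600s",  # illustrative one-hour lifetime
    ),
)

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents="Summarize the corpus.",
    config=types.GenerateContentConfig(cached_content=cache.name),
)

# Tokens billed at the cached rate are reported separately in usage metadata.
print(response.usage_metadata.cached_content_token_count)
```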
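Finally, a sketch of the application-layer caching the LangChain entry refers to. Unlike vendor prefix caching, this caches entire responses keyed by the exact prompt and model parameters; the in-memory backend shown here is the simplest variant, and a persistent backend would be needed for cross-process reuse.

```python
# Minimal sketch: a process-wide exact-match response cache. Identical
# (prompt, model-parameter) pairs are answered from memory instead of
# triggering a second API call.
from langchain_core.globals import set_llm_cache
from langchain_core.caches import InMemoryCache

set_llm_cache(InMemoryCache())
# Any LangChain chat model invoked after this point consults the cache
# before calling the underlying provider.
```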