References: Retrieval-Augmented Generation Optimization

Retrieval-augmented generation - Wikipedia - Comprehensive overview of RAG architecture, retrieval techniques, and known failure modes that frames the cost-optimization treatment in this chapter.
Vector database - Wikipedia - Coverage of approximate nearest-neighbor search, indexing techniques, and the database category that powers RAG retrieval.
Information retrieval - Wikipedia - Covers the broader field, including the precision, recall, and reranking concepts that the chapter applies to RAG cost-quality tuning.
Hands-On Large Language Models - Jay Alammar and Maarten Grootendorst - O'Reilly - The retrieval and RAG chapters provide implementation-focused treatment that pairs with this chapter's cost-optimization framing.
AI Engineering - Chip Huyen - O'Reilly - The retrieval chapters address evaluation, chunking, and reranking with the production-systems perspective that informs this chapter's recommendations.
Pinecone Learning Center - Pinecone - Curated tutorials on chunking, embedding selection, and reranking from the team behind one of the leading vector databases; particularly strong on retrieval-quality tradeoffs.
LangChain RAG Documentation - LangChain - Practical tutorials on building RAG pipelines, including the chunking, retrieval, and context-injection stages this chapter optimizes.
LlamaIndex Documentation - LlamaIndex - Reference for the LLM data framework used in this chapter's examples for chunking strategies and retrieval evaluation.
Anthropic RAG Cookbook - Anthropic GitHub - Working notebooks demonstrating RAG patterns specifically tuned for Claude, including contextual retrieval and reranking.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks - Lewis et al. (arXiv) - The 2020 paper that named and defined RAG; short, and worth reading once for primary-source grounding before optimizing.