References: Retrieval-Augmented Generation Optimization
- Retrieval-augmented generation - Wikipedia - Comprehensive overview of RAG architecture, retrieval techniques, and known failure modes that frames the cost-optimization treatment in this chapter.
- Vector database - Wikipedia - Coverage of approximate nearest-neighbor search, indexing techniques, and the database category that powers RAG retrieval.
- Information retrieval - Wikipedia - The broader field including precision, recall, and reranking concepts that the chapter applies to RAG cost-quality tuning.
- Hands-On Large Language Models - Jay Alammar and Maarten Grootendorst - O'Reilly - The retrieval and RAG chapters provide implementation-focused treatment that pairs with this chapter's cost-optimization framing.
- AI Engineering - Chip Huyen - O'Reilly - The retrieval chapters address evaluation, chunking, and reranking with the production-systems perspective that informs this chapter's recommendations.
- Pinecone Learning Center - Pinecone - Curated tutorials on chunking, embedding selection, and reranking from the team behind one of the leading vector databases; particularly strong on retrieval-quality tradeoffs.
- LangChain RAG Documentation - LangChain - Practical tutorials on building RAG pipelines including the chunking, retrieval, and context-injection stages this chapter optimizes.
- LlamaIndex Documentation - LlamaIndex - Reference for the data-framework-for-LLMs library used in this chapter's examples for chunking strategies and retrieval evaluation.
- Anthropic RAG Cookbook - Anthropic GitHub - Working notebooks demonstrating RAG patterns specifically tuned for Claude including contextual retrieval and reranking.
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks - Lewis et al. (arXiv) - The 2020 paper that named and defined RAG; short and worth reading once for primary-source grounding before optimizing.