References: Retrieval-Augmented Generation Optimization

  1. Retrieval-augmented generation - Wikipedia - Comprehensive overview of RAG architecture, retrieval techniques, and known failure modes that frames the cost-optimization treatment in this chapter.

  2. Vector database - Wikipedia - Coverage of approximate nearest-neighbor search, indexing techniques, and the database category that powers RAG retrieval.

  3. Information retrieval - Wikipedia - The broader field, including the precision, recall, and reranking concepts that the chapter applies to RAG cost-quality tuning.

  4. Hands-On Large Language Models - Jay Alammar and Maarten Grootendorst - O'Reilly - The retrieval and RAG chapters provide implementation-focused treatment that pairs with this chapter's cost-optimization framing.

  5. AI Engineering - Chip Huyen - O'Reilly - The retrieval chapters address evaluation, chunking, and reranking with the production-systems perspective that informs this chapter's recommendations.

  6. Pinecone Learning Center - Pinecone - Curated tutorials on chunking, embedding selection, and reranking from the team behind one of the leading vector databases; particularly strong on retrieval-quality tradeoffs.

  7. LangChain RAG Documentation - LangChain - Practical tutorials on building RAG pipelines, including the chunking, retrieval, and context-injection stages this chapter optimizes.

  8. LlamaIndex Documentation - LlamaIndex - Reference for the LLM data framework used in this chapter's examples of chunking strategies and retrieval evaluation.

  9. Anthropic RAG Cookbook - Anthropic GitHub - Working notebooks demonstrating RAG patterns tuned for Claude, including contextual retrieval and reranking.

  10. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks - Lewis et al. (arXiv) - The 2020 paper that named and defined RAG; short and worth reading once for primary-source grounding before optimizing.