References: Sampling, Tokenization, and Embeddings
- Byte pair encoding - Wikipedia - Detailed explanation of the BPE algorithm including merge rules, vocabulary construction, and worked examples. The single most important Wikipedia article for understanding why English and code tokenize so differently.
- Word embedding - Wikipedia - Coverage of dense vector representations, training methods, and properties of the embedding space; foundation for the RAG and semantic-search material in Chapter 15.
- Lexical analysis - Wikipedia - Broader CS context for tokenization including pre-tokenization, special tokens, and Unicode handling; useful for understanding the per-vendor tokenizer drift discussed in this chapter.
- Natural Language Processing with Transformers (Revised Edition) - Lewis Tunstall, Leandro von Werra, and Thomas Wolf - O'Reilly - Chapter 4 on tokenizers gives an end-to-end implementation perspective; the book is co-authored by Hugging Face engineers who built the tools used in this chapter's examples.
- Hands-On Large Language Models - Jay Alammar and Maarten Grootendorst - O'Reilly - Chapter 2 covers tokenization with side-by-side comparisons across vendors; Chapter 4 covers embeddings with the geometric intuition needed for RAG.
- tiktoken GitHub Repository - OpenAI - The official OpenAI tokenizer library used throughout this chapter's code examples; the README explains encoding models and gives count-tokens recipes.
- Hugging Face Tokenizers Documentation - Hugging Face - Reference for the fast Rust-backed tokenizer library; useful for engineers comparing tokenizer behavior across model families.
- The Illustrated Word2vec - Jay Alammar - Visual blog post explaining how words become vectors; the most accessible introduction to the embedding-space concept used in this chapter and Chapter 15.
- OpenAI Embeddings Guide - OpenAI - Authoritative reference for embedding model APIs, dimensions, and use cases; the canonical source for the embedding cost-and-dimension numbers used in this chapter.
- Neural Machine Translation of Rare Words with Subword Units - Sennrich, Haddow, and Birch (arXiv) - The original 2016 paper introducing BPE for NLP; short (10 pages) and worth reading once for engineers who want the primary-source treatment of the algorithm.
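The merge-rule construction that the BPE references above describe (the Wikipedia article and the Sennrich et al. paper) can be sketched in a few lines of plain Python. This is a minimal illustration on a toy frequency-counted corpus, not a production tokenizer; real implementations such as tiktoken add pre-tokenization, byte-level fallback, and special-token handling on top of this core loop.

```python
from collections import Counter

def get_pair_counts(words):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(words, pair):
    """Rewrite every word, replacing each occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

def learn_bpe(corpus, num_merges):
    """Learn BPE merge rules: repeatedly merge the most frequent adjacent pair."""
    words = {tuple(w): f for w, f in corpus.items()}  # start from characters
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(words)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        words = merge_pair(words, best)
    return merges

# Toy corpus in the style of the Sennrich et al. worked example.
corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
print(learn_bpe(corpus, 3))  # → [('e', 's'), ('es', 't'), ('l', 'o')]
```

The greedy most-frequent-pair loop is the whole algorithm; everything else in a production tokenizer is bookkeeping around it, which is why the vocabulary (and therefore token counts) differs whenever vendors train merges on different corpora.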