How Word2Vec, GloVe, and FastText Learn Embeddings
Word2Vec
Skip-gram: predicts context from a target word
Input word: fox (one-hot encoded)
↓
Hidden / Embedding layer (300 dimensions)
↓
Predict context words (quick, brown, jumps)
↓
Objective: maximize P(context | target)
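
The skip-gram steps above can be sketched in a few lines of plain Python. This is a toy illustration, not the reference implementation: the helper names (`skipgram_pairs`, `context_probs`), the example sentence, the window size of 2, and the 8-dimensional vectors are all assumptions made for the example, and the vectors are random rather than trained. It also uses a full softmax for clarity, whereas practical Word2Vec training usually approximates it with negative sampling or hierarchical softmax.

```python
import math
import random

def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs within a symmetric window."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

def context_probs(target, vocab, w_in, w_out):
    """Full-softmax estimate of P(context | target) over the vocabulary."""
    v = w_in[target]  # embedding lookup (what the one-hot input selects)
    scores = [sum(a * b for a, b in zip(v, w_out[w])) for w in vocab]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {w: e / total for w, e in zip(vocab, exps)}

tokens = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(tokens))

rng = random.Random(0)
dim = 8  # toy size (assumption); the diagram uses 300 dimensions
w_in = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
w_out = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}

pairs = skipgram_pairs(tokens, window=2)
fox_contexts = [c for t, c in pairs if t == "fox"]
probs = context_probs("fox", vocab, w_in, w_out)

# Training would adjust w_in and w_out to lower this loss over all pairs.
loss = -sum(math.log(probs[c]) for c in fox_contexts)
```

With a window of 2, the target `fox` is paired with the two words on each side of it. Training raises the probability assigned to those observed context words, and the rows of `w_in` become the word embeddings.
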
GloVe
Factorizes global co-occurrence statistics
Co-occurrence matrix (word-pair frequencies)
↓
Matrix factorization (decompose statistics)
↓
Word + context vectors (the embeddings)
↓
Objective: dot product (plus bias terms) ≈ log co-occurrence count
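
A rough sketch of the GloVe pipeline follows, assuming a toy sentence and untrained random vectors. The function names (`cooccurrence`, `weight`, `glove_loss`) are illustrative, not part of any library. The loss is the weighted least-squares form from the GloVe paper, with the commonly cited defaults `x_max=100` and `alpha=0.75`. An optimizer that actually minimizes the loss is left out.

```python
import math
import random
from collections import Counter

def cooccurrence(tokens, window=2):
    """Count word pairs within a window, weighting each pair by 1/distance."""
    counts = Counter()
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(word, tokens[j])] += 1.0 / abs(i - j)
    return counts

def weight(x, x_max=100.0, alpha=0.75):
    """Down-weight rare pairs and cap the influence of very frequent ones."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_loss(counts, w, w_ctx, b, b_ctx):
    """Sum of f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2 over seen pairs."""
    total = 0.0
    for (i, j), x in counts.items():
        dot = sum(a * c for a, c in zip(w[i], w_ctx[j]))
        err = dot + b[i] + b_ctx[j] - math.log(x)
        total += weight(x) * err * err
    return total

tokens = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(tokens))
counts = cooccurrence(tokens, window=2)

rng = random.Random(0)
dim = 8  # toy size (assumption)
w = {t: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for t in vocab}
w_ctx = {t: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for t in vocab}
b = {t: 0.0 for t in vocab}
b_ctx = {t: 0.0 for t in vocab}

loss = glove_loss(counts, w, w_ctx, b, b_ctx)
```

Only pairs that actually co-occur contribute to the loss, so training iterates over the nonzero entries of the matrix rather than the full vocabulary-by-vocabulary grid.
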
FastText
Combines subword (character n-gram) information
Input word: running (split into char n-grams)
↓
N-gram embeddings (run, unn, nni, ing, running)
↓
Final embedding (average of n-gram vectors)
↓
Handles OOV words (out-of-vocabulary support)
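
The subword idea can be illustrated with a short sketch. It assumes the conventions described in the FastText paper: the word is wrapped in `<` and `>` boundary markers, n-grams of length 3 to 6 are extracted, and the whole marked word is kept as an extra unit. The helper names, the bucket count, the vector size, and the use of `zlib.crc32` as the hash are assumptions for this example (the real library uses its own hash function), and the bucket vectors are random stand-ins for trained parameters.

```python
import random
import zlib

DIM = 8         # toy vector size (assumption)
BUCKETS = 1000  # toy hash-table size (assumption)

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of the boundary-marked word, plus the whole word."""
    marked = f"<{word}>"
    grams = [
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    ]
    if marked not in grams:
        grams.append(marked)
    return grams

def bucket_vector(bucket):
    """Stand-in for a trained n-gram vector: deterministic random values."""
    rng = random.Random(bucket)
    return [rng.uniform(-0.5, 0.5) for _ in range(DIM)]

def word_vector(word):
    """Average the vectors of all n-grams; works for unseen words too."""
    grams = char_ngrams(word)
    vecs = [
        bucket_vector(zlib.crc32(g.encode("utf-8")) % BUCKETS)
        for g in grams
    ]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

trigrams = char_ngrams("running", n_min=3, n_max=3)
known = word_vector("running")
unseen = word_vector("runnning")  # misspelling, never seen in training
```

Because `runnning` shares most of its n-grams with `running`, a trained model would place the two vectors close together, which is how FastText produces embeddings for out-of-vocabulary words.
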
| Feature | Word2Vec | GloVe | FastText |
|---|---|---|---|
| Training paradigm | Local context prediction | Global statistics | Subword local context |
| OOV handling | No | No | Yes |
| Training speed | Fast | Medium | Fast |
| Memory efficiency | High | Medium (large matrix) | Medium (n-grams) |