How Word2Vec, GloVe, and FastText Learn Embeddings

Word2Vec

Skip-gram: predicts context from a target word
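Input word: fox (one-hot encoded)
Hidden / embedding layer: 300 dimensions (a common choice); these weights become the word vectors
Predict context words: neighbors within the window, e.g. quick, brown, jumps
Objective: maximize P(context | target) over all (target, context) pairs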
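The skip-gram steps above can be sketched in plain Python. This is a toy under stated assumptions, not the word2vec tool itself: it uses a full softmax over an 8-word vocabulary and 8-dimensional vectors, whereas word2vec normally uses negative sampling or hierarchical softmax to avoid summing over the whole vocabulary. All helper names (skipgram_pairs, context_probs, sgd_step) are hypothetical. Looking up W_in[target] plays the role of multiplying the one-hot input by the embedding matrix.

```python
import math
import random

def skipgram_pairs(tokens, window=2):
    """(target, context) pairs: every word within `window` positions of the target."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def context_probs(target, W_in, W_out, vocab):
    """Full-softmax P(w | target) for every w in vocab (max-subtracted for stability)."""
    scores = {w: dot(W_in[target], W_out[w]) for w in vocab}
    m = max(scores.values())
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}

def log_likelihood(pairs, W_in, W_out, vocab):
    """The skip-gram objective: sum of log P(context | target) over all pairs."""
    return sum(math.log(context_probs(t, W_in, W_out, vocab)[c]) for t, c in pairs)

def sgd_step(target, context, W_in, W_out, vocab, lr=0.1):
    """One gradient-ascent step on log P(context | target)."""
    v = W_in[target]  # the one-hot input simply selects this row
    p = context_probs(target, W_in, W_out, vocab)
    # d/dv = u_context - sum_w p(w) * u_w  (computed before any vector changes)
    grad_v = [W_out[context][k] - sum(p[w] * W_out[w][k] for w in vocab)
              for k in range(len(v))]
    # d/du_w = (1[w == context] - p(w)) * v
    for w in vocab:
        coef = (1.0 if w == context else 0.0) - p[w]
        W_out[w] = [u + lr * coef * x for u, x in zip(W_out[w], v)]
    W_in[target] = [x + lr * g for x, g in zip(v, grad_v)]

tokens = "the quick brown fox jumps over the lazy dog".split()
pairs = skipgram_pairs(tokens, window=2)
print([c for t, c in pairs if t == "fox"])  # ['quick', 'brown', 'jumps', 'over']

rng = random.Random(0)
vocab = sorted(set(tokens))
dim = 8  # toy size; a 300-dimensional layer works the same way
W_in = {w: [rng.gauss(0, 0.1) for _ in range(dim)] for w in vocab}
W_out = {w: [rng.gauss(0, 0.1) for _ in range(dim)] for w in vocab}

ll_before = log_likelihood(pairs, W_in, W_out, vocab)
for _ in range(50):
    for t, c in pairs:
        sgd_step(t, c, W_in, W_out, vocab)
ll_after = log_likelihood(pairs, W_in, W_out, vocab)
embedding_fox = W_in["fox"]  # after training, input vectors serve as embeddings
```

Training raises the log-likelihood; the window size, learning rate, and epoch count here are illustrative values, not recommended settings.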

GloVe

Factorizes global co-occurrence statistics
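Co-occurrence matrix: X_ij counts how often word j appears in the context window of word i
Matrix factorization: fit low-dimensional vectors to the log of those counts
Word + context vectors: each word gets a word vector w_i and a context vector w̃_i; their sum is typically used as the final embedding
Objective: w_i · w̃_j + b_i + b̃_j ≈ log X_ij, fit by weighted least squares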
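A minimal sketch of the GloVe pipeline, under these assumptions: a symmetric window of 2 where a pair d words apart adds 1/d to the count (as described in the GloVe paper), the paper's weighting f(x) = (x / x_max)^0.75 capped at 1 with x_max = 100, and plain SGD in place of the AdaGrad optimizer the paper uses. Function names are hypothetical.

```python
import math
import random
from collections import defaultdict

def cooccurrence(tokens, window=2):
    """Symmetric co-occurrence counts; a pair d words apart contributes 1/d."""
    X = defaultdict(float)
    for i, w in enumerate(tokens):
        for d in range(1, window + 1):
            if i + d < len(tokens):
                c = tokens[i + d]
                X[(w, c)] += 1.0 / d
                X[(c, w)] += 1.0 / d
    return dict(X)

def weight(x, x_max=100.0, alpha=0.75):
    """f(x): down-weights rare pairs and caps frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_loss(X, W, Wc, b, bc):
    """J = sum over nonzero X_ij of f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2."""
    total = 0.0
    for (i, j), x in X.items():
        pred = sum(p * q for p, q in zip(W[i], Wc[j])) + b[i] + bc[j]
        total += weight(x) * (pred - math.log(x)) ** 2
    return total

def train(X, vocab, dim=8, epochs=100, lr=0.05, seed=0):
    rng = random.Random(seed)
    W = {w: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for w in vocab}
    Wc = {w: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for w in vocab}
    b = {w: 0.0 for w in vocab}
    bc = {w: 0.0 for w in vocab}
    loss_before = glove_loss(X, W, Wc, b, bc)
    for _ in range(epochs):
        for (i, j), x in sorted(X.items()):
            wi, wj = W[i], Wc[j]
            diff = sum(p * q for p, q in zip(wi, wj)) + b[i] + bc[j] - math.log(x)
            g = weight(x) * diff  # shared gradient factor (the constant 2 is folded into lr)
            W[i] = [p - lr * g * q for p, q in zip(wi, wj)]
            Wc[j] = [q - lr * g * p for p, q in zip(wi, wj)]
            b[i] -= lr * g
            bc[j] -= lr * g
    return W, Wc, b, bc, loss_before

tokens = "the quick brown fox jumps over the lazy dog".split()
X = cooccurrence(tokens)
vocab = sorted(set(tokens))
W, Wc, b, bc, loss_before = train(X, vocab)
loss_after = glove_loss(X, W, Wc, b, bc)
# Final embedding: word vector + context vector.
embeddings = {w: [p + q for p, q in zip(W[w], Wc[w])] for w in vocab}
print(X[("quick", "brown")], X[("quick", "fox")])  # 1.0 0.5
```

Unlike skip-gram, the counts are built once in a single pass over the corpus, and training then iterates only over the nonzero entries of X rather than over every window in the text.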

FastText

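Extends skip-gram with subword (character n-gram) information
Input word: running, split into character n-grams (with < and > marking word boundaries)
N-gram embeddings: <ru, run, unn, nni, nin, ing, ng> (trigrams), plus longer n-grams and the whole word <running>
Final embedding: average of the n-gram vectors (the original paper describes a sum; averaging only rescales it)
Handles OOV words: vectors for out-of-vocabulary words are built from their n-grams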
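The subword steps can be sketched as follows. Assumptions: n-grams of length 3 to 6 (the fastText library's default minn/maxn), a 32-bit FNV-1a hash into a toy number of buckets (the library's hashing differs in detail and defaults to 2,000,000 buckets), untrained random bucket vectors, and the whole word hashed like any other n-gram (the library gives in-vocabulary words their own row). The class and function names are hypothetical.

```python
import math
import random

def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of '<word>' (boundary markers added), plus the whole token."""
    token = f"<{word}>"
    grams = [token[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(token) - n + 1)]
    if token not in grams:
        grams.append(token)  # the full word is kept as its own feature
    return grams

def fnv1a(s):
    """32-bit FNV-1a hash, used here to map n-grams to buckets."""
    h = 2166136261
    for byte in s.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) % 2**32
    return h

class SubwordEmbeddings:
    """Toy subword model: one (untrained) vector per hash bucket."""

    def __init__(self, dim=100, buckets=100_000):
        self.dim, self.buckets, self.table = dim, buckets, {}

    def bucket_vector(self, b):
        # Created lazily and seeded by bucket id so results are repeatable.
        if b not in self.table:
            rng = random.Random(b)
            self.table[b] = [rng.gauss(0, 0.1) for _ in range(self.dim)]
        return self.table[b]

    def word_vector(self, word):
        """Average of the word's n-gram vectors; works for words never seen before."""
        vecs = [self.bucket_vector(fnv1a(g) % self.buckets) for g in char_ngrams(word)]
        return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

print(char_ngrams("running", 3, 3))
# ['<ru', 'run', 'unn', 'nni', 'nin', 'ing', 'ng>', '<running>']

model = SubwordEmbeddings()
sim_oov = cosine(model.word_vector("running"), model.word_vector("runnings"))
sim_unrelated = cosine(model.word_vector("running"), model.word_vector("xyzzy"))
```

Even with random vectors, running and the unseen word runnings share 18 of their 3- to 6-character n-grams, so their averaged vectors point in a similar direction while xyzzy shares none; training is what makes those shared components meaningful.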
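Feature | Word2Vec | GloVe | FastText
Training paradigm | Local context prediction | Global co-occurrence statistics | Local context prediction with subwords
OOV handling | No | No | Yes (built from n-grams)
Training speed | Fast | Medium | Fast
Memory efficiency | High | Medium (large co-occurrence matrix) | Medium (n-gram vectors)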