NLP Pipeline Architecture
This MicroSim shows the layered architecture of a complete natural language processing pipeline. Raw user text enters at the top and is progressively enriched as it passes through four processing layers, emerging at the bottom as a structured representation ready for intent recognition and query execution.
About This Diagram
The diagram is organized as horizontal swim lanes, one per processing layer. A blue gradient from light to dark signals increasing linguistic sophistication, while orange arrows mark the data handed from one layer to the next. Hover over any stage to read what it contributes.
Interactive Demo
To embed this MicroSim in your own page, use an iframe whose src attribute points to this MicroSim's URL.
How It Works
Each layer adds a different kind of structure:
- Layer 1 - Text Preprocessing (Character Level): normalization and tokenization turn a messy string into clean tokens.
- Layer 2 - Morphological Analysis (Word Level): stemming and lemmatization reduce inflected words to their root forms (stems or lemmas).
- Layer 3 - Syntactic Analysis (Word Level): POS tagging and dependency parsing recover grammatical structure.
- Layer 4 - Semantic Analysis (Sentence Level): named entity recognition and coreference resolution capture meaning and entity relationships.
The label on each arrow shows what the previous layer hands to the next: normalized tokens, root forms, grammatical tags, and entity relationships.
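The four layers can be sketched as a chain of functions that each add one kind of annotation to a shared record. This is a minimal illustration, not the MicroSim's implementation: the names `Annotated`, `run_pipeline`, and the three lookup tables are hypothetical, and the tiny hand-written lexicons stand in for the trained models a real pipeline would use. Dependency parsing and coreference resolution are left out to keep the sketch short.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy lexicons (assumptions for illustration only).
LEMMAS = {"sales": "sale", "quarters": "quarter", "showed": "show"}
POS_TAGS = {"show": "VERB", "me": "PRON", "last": "ADJ",
            "quarter": "NOUN", "sale": "NOUN"}
DATE_PHRASES = {("last", "quarter"), ("this", "year")}


@dataclass
class Annotated:
    """Record that accumulates annotations as it moves down the layers."""
    text: str
    tokens: list = field(default_factory=list)
    lemmas: list = field(default_factory=list)
    tags: list = field(default_factory=list)
    entities: list = field(default_factory=list)


def preprocess(doc):
    # Layer 1: lowercase, drop punctuation, split on whitespace.
    cleaned = re.sub(r"[^\w\s]", "", doc.text.lower())
    doc.tokens = cleaned.split()
    return doc


def morphology(doc):
    # Layer 2: map each token to a root form, falling back to the token.
    doc.lemmas = [LEMMAS.get(tok, tok) for tok in doc.tokens]
    return doc


def syntax(doc):
    # Layer 3: look up a part-of-speech tag; "X" marks unknown words.
    doc.tags = [POS_TAGS.get(lemma, "X") for lemma in doc.lemmas]
    return doc


def semantics(doc):
    # Layer 4: label adjacent lemma pairs that match a known date phrase.
    for i in range(len(doc.lemmas) - 1):
        pair = (doc.lemmas[i], doc.lemmas[i + 1])
        if pair in DATE_PHRASES:
            doc.entities.append({"span": (i, i + 2), "label": "DATE"})
    return doc


def run_pipeline(text):
    # Apply the layers in the same top-to-bottom order as the diagram.
    doc = Annotated(text)
    for layer in (preprocess, morphology, syntax, semantics):
        doc = layer(doc)
    return doc
```

Calling `run_pipeline("Show me Last Quarter sales!")` yields the tokens `show, me, last, quarter, sales`, the root forms `show, me, last, quarter, sale`, the tags `VERB, PRON, ADJ, NOUN, NOUN`, and one `DATE` entity covering token positions 2 to 4. Each function reads only what earlier layers wrote, which is why the order of the layers matters.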
Lesson Plan
- Order the layers. Ask students why morphology must precede syntax, and why syntax must precede most semantic analysis.
- Trace the example. Follow "last quarter sales" through each layer and predict what annotation gets added at each step.
- Map the levels. Connect the "Character / Word / Sentence" labels to the unit of analysis at each layer.
- Discuss tradeoffs. Compare stemming vs. lemmatization and when each is preferred in a production chatbot.
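For the tradeoffs discussion, a toy comparison can make the difference concrete. The sketch below is an assumption-laden illustration: `crude_stem` is a hypothetical suffix stripper, far simpler than a real stemming algorithm, and `LEMMA_TABLE` is a hand-written stand-in for the dictionary a real lemmatizer would consult.

```python
# Suffixes tried in order; longer ones come before the bare "s".
SUFFIXES = ("ing", "ies", "es", "ed", "s")

# Hypothetical lookup table standing in for a real lemma dictionary.
LEMMA_TABLE = {"sales": "sale", "studies": "study",
               "was": "be", "better": "good"}


def crude_stem(word):
    # Rule-based: strip the first matching suffix, but only if at
    # least three characters remain. No dictionary is consulted.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lookup_lemma(word):
    # Dictionary-based: return the listed lemma, else the word itself.
    return LEMMA_TABLE.get(word, word)


for word in ("sales", "studies", "was", "better"):
    print(word, crude_stem(word), lookup_lemma(word))
# sales sal sale
# studies stud study
# was was be
# better better good
```

The stemmer is fast and needs no vocabulary, but it produces non-words such as "sal" and misses irregular forms such as "was". The lookup returns real dictionary forms but only for words it already knows. Students can weigh that against what a production chatbot needs: matching speed, or root forms that can be shown to a user or joined against a database.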