Chapter 11 Quiz: NLP Pipelines and Processing
Test your understanding of NLP pipelines and text processing concepts covered in this chapter.
Question 1
What is an NLP pipeline?
- A) A physical pipe for data transfer
- B) A sequence of processing steps that transform raw text into structured information
- C) A database query optimizer
- D) A network routing protocol
The correct answer is B.
An NLP pipeline is a sequence of processing steps that transform raw text into structured information. Each step performs a specific task like tokenization, part-of-speech tagging, or entity recognition. Option A describes physical infrastructure, option C describes database optimization, and option D describes networking.
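As a rough illustration, a pipeline can be modeled as an ordered list of functions, where each function consumes the previous step's output. The sketch below assumes three simplified, hypothetical steps (`normalize`, `tokenize`, `count_tokens`) rather than any particular library's components. The last step turns raw text into structured information, here a table of token counts.

```python
import re
from collections import Counter
from typing import Any, Callable, List

# Hypothetical, simplified steps; real pipelines use far more careful components.

def normalize(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text.lower()).strip()

def tokenize(text: str) -> List[str]:
    """Extract runs of lowercase letters, digits, and apostrophes as word tokens."""
    return re.findall(r"[a-z0-9']+", text)

def count_tokens(tokens: List[str]) -> Counter:
    """Turn the token list into structured data: a count per distinct token."""
    return Counter(tokens)

def run_pipeline(text: str, steps: List[Callable[[Any], Any]]) -> Any:
    """Apply each step in order, passing each step's output to the next one."""
    result: Any = text
    for step in steps:
        result = step(result)
    return result

pipeline = [normalize, tokenize, count_tokens]
counts = run_pipeline("The cat sat.  The cat   slept.", pipeline)
print(counts)  # Counter({'the': 2, 'cat': 2, 'sat': 1, 'slept': 1})
```

Keeping each step as a separate function makes it easy to reorder, replace, or test steps independently.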
Question 2
What is text preprocessing?
- A) Writing text before processing
- B) The initial cleaning and normalization of raw text before analysis
- C) Printing text on paper
- D) Encrypting text data
The correct answer is B.
Text preprocessing is the initial step of cleaning and normalizing raw text before analysis. This includes removing unwanted characters, converting to lowercase, handling whitespace, and other normalization tasks to prepare text for further NLP processing. Options A, C, and D describe different activities unrelated to text preparation.
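A minimal preprocessing function might chain several of these operations. The sketch below assumes one particular set of choices (Unicode NFKC normalization, lowercasing, replacing symbols other than basic punctuation with spaces, and collapsing whitespace); which steps are appropriate depends on the downstream task.

```python
import re
import unicodedata

def preprocess(text: str) -> str:
    # Unicode compatibility normalization, e.g. a non-breaking space becomes a plain space.
    text = unicodedata.normalize("NFKC", text)
    # Case normalization.
    text = text.lower()
    # Replace anything that is not a letter, digit, underscore, whitespace,
    # or one of . , ! ? ' with a space.
    text = re.sub(r"[^\w\s.,!?']", " ", text)
    # Collapse runs of whitespace (spaces, tabs, newlines) and trim the ends.
    text = re.sub(r"\s+", " ", text).strip()
    return text

print(preprocess("  Hello,\tWORLD!!  NLP\u00a0rocks :) "))  # hello, world!! nlp rocks
```

Each step is a trade-off: the symbol filter above also removes emoticons such as ":)", which a sentiment model might have found useful.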
Question 3
What is tokenization in NLP?
- A) User authentication with tokens
- B) The process of breaking text into smaller units like words or sentences
- C) Cryptocurrency transactions
- D) Database indexing
The correct answer is B.
Tokenization in NLP is the process of breaking text into smaller units (tokens) such as words, sentences, or subwords. It typically follows preprocessing and comes before the other steps in an NLP pipeline. Option A describes authentication tokens, option C describes cryptocurrency, and option D describes database indexing.
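The following sketch shows naive sentence and word tokenization with regular expressions. The splitting rules are simplifying assumptions, not how production tokenizers work; for instance, the sentence rule would wrongly split after an abbreviation such as "Dr.".

```python
import re
from typing import List

def sentence_tokenize(text: str) -> List[str]:
    """Split after '.', '!' or '?' when followed by whitespace (a naive rule)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def word_tokenize(sentence: str) -> List[str]:
    """Return words (keeping internal apostrophes) and punctuation marks as tokens."""
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", sentence)

text = "NLP is fun. Isn't it? Let's tokenize!"
for sentence in sentence_tokenize(text):
    print(word_tokenize(sentence))
```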
Question 4
What is stemming?
- A) The process of reducing words to their root or base form
- B) Creating flowcharts
- C) Organizing files in folders
- D) Compressing images
The correct answer is A.
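Stemming is the process of reducing words to their root or base form (the stem) by removing suffixes with simple rules. For example, "running" and "runs" might both be reduced to "run." Because stemming only applies string rules, it cannot map irregular forms such as "ran" to "run," and the stem is not always a real word (the Porter stemmer turns "studies" into "studi"). It still helps treat different forms of the same word as equivalent. Options B, C, and D describe unrelated activities.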
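The sketch below is a deliberately crude, rule-based stemmer. Its suffix rules are a small, hypothetical set loosely inspired by the first step of the Porter algorithm; it is not the Porter stemmer itself. It shows both the benefit ("running" and "runs" collapse to "run") and the limits ("ran" is untouched and "studies" becomes the non-word "studi").

```python
# Hypothetical, simplified suffix rules loosely modeled on the first step of
# the Porter stemmer. Rules are tried in order and the first match wins.
SUFFIX_RULES = [
    ("sses", "ss"),  # caresses -> caress
    ("ies", "i"),    # studies  -> studi
    ("ss", "ss"),    # class    -> class (keeps "ss" away from the "s" rule)
    ("s", ""),       # runs     -> run
    ("ing", ""),     # running  -> runn (undoubled below)
    ("ed", ""),      # jumped   -> jump
]

def crude_stem(word: str) -> str:
    """Strip one suffix using SUFFIX_RULES; irregular forms pass through unchanged."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        # Require at least two characters of stem to remain after stripping.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            stem = word[: -len(suffix)] + replacement
            # Undouble a final consonant exposed by "-ing"/"-ed": runn -> run.
            if (
                suffix in ("ing", "ed")
                and len(stem) >= 2
                and stem[-1] == stem[-2]
                and stem[-1] not in "aeioulsz"
            ):
                stem = stem[:-1]
            return stem
    return word

for w in ["running", "runs", "ran", "studies", "caresses"]:
    print(w, "->", crude_stem(w))
# running -> run, runs -> run, ran -> ran, studies -> studi, caresses -> caress
```

For real work, established stemmers such as NLTK's `PorterStemmer` implement the full rule set.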
Question 5
What is lemmatization?
- A) Creating lists of items
- B) The process of reducing words to their dictionary base form using linguistic rules
- C) Sorting data alphabetically
- D) Backing up databases
The correct answer is B.
Lemmatization is the process of reducing words to their dictionary base form (lemma) using vocabulary and linguistic rules. Unlike stemming, lemmatization produces actual words. For example, "better" used as an adjective would be lemmatized to "good." Options A, C, and D describe different operations.
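At its core, lemmatization maps a word and its part of speech to a dictionary entry. The sketch below assumes a tiny, hypothetical lookup table; real lemmatizers rely on large lexicons such as WordNet together with morphological rules for regular forms. The part of speech matters: "better" maps to "good" only as an adjective.

```python
from typing import Dict, Tuple

# A tiny, hypothetical lemma dictionary keyed by (word, part of speech).
LEMMA_TABLE: Dict[Tuple[str, str], str] = {
    ("better", "ADJ"): "good",
    ("best", "ADJ"): "good",
    ("ran", "VERB"): "run",
    ("running", "VERB"): "run",
    ("mice", "NOUN"): "mouse",
}

def lemmatize(word: str, pos: str) -> str:
    """Look up the lemma of `word` used as `pos`; fall back to the lowercased word."""
    return LEMMA_TABLE.get((word.lower(), pos), word.lower())

print(lemmatize("better", "ADJ"))   # good
print(lemmatize("better", "VERB"))  # better (as in "to better oneself")
print(lemmatize("ran", "VERB"))     # run
```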
Question 6
What is Part-of-Speech (POS) tagging?
- A) Marking posts in a social media feed
- B) The process of identifying the grammatical role of each word in a sentence
- C) Tagging images with metadata
- D) Creating hashtags for content
The correct answer is B.
Part-of-Speech tagging is the process of identifying the grammatical role of each word in a sentence (noun, verb, adjective, etc.). This linguistic information is valuable for many NLP tasks like parsing and entity recognition. Option A describes social media, option C describes image metadata, and option D describes hashtags.
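A very simple tagger can look words up in a lexicon and fall back to suffix heuristics for unknown words. The lexicon and rules below are hypothetical and use Universal Dependencies tag names (DET, NOUN, VERB, ADJ, ADV). Because this approach ignores context, it cannot tell that "runs" is a verb in "she runs" but a noun in "three runs"; practical taggers use statistical or neural models trained on annotated corpora to resolve such ambiguity.

```python
from typing import Dict, List, Tuple

# Hypothetical mini-lexicon using Universal Dependencies tag names.
LEXICON: Dict[str, str] = {
    "the": "DET",
    "a": "DET",
    "dog": "NOUN",
    "ball": "NOUN",
    "chased": "VERB",
    "red": "ADJ",
}

def guess_tag(word: str) -> str:
    """Guess a tag for an unknown word from common English suffixes."""
    if word.endswith("ly"):
        return "ADV"
    if word.endswith(("ed", "ing")):
        return "VERB"
    return "NOUN"  # default for anything else

def pos_tag(tokens: List[str]) -> List[Tuple[str, str]]:
    """Tag each token by lexicon lookup, falling back to suffix heuristics."""
    tagged = []
    for tok in tokens:
        key = tok.lower()
        tagged.append((tok, LEXICON[key] if key in LEXICON else guess_tag(key)))
    return tagged

print(pos_tag(["The", "dog", "happily", "chased", "a", "red", "ball"]))
```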
Question 7
What is the main difference between stemming and lemmatization?
- A) Stemming is faster but less accurate; lemmatization uses linguistic knowledge for better results
- B) They are exactly the same
- C) Stemming only works with English
- D) Lemmatization is always faster
The correct answer is A.
Stemming is typically faster but cruder, using simple rules to chop off word endings. Lemmatization uses vocabulary and morphological analysis to produce actual dictionary words, making it more accurate but more computationally expensive. Option B is false, option C is incorrect (stemming works with many languages), and option D is backwards.
Question 8
Which preprocessing step would convert "The QUICK Brown Fox" to "the quick brown fox"?
- A) Tokenization
- B) Stemming
- C) Case normalization (lowercasing)
- D) Lemmatization
The correct answer is C.
Case normalization, specifically lowercasing, is a text preprocessing step that converts all text to lowercase, making "The QUICK Brown Fox" become "the quick brown fox." This helps treat the same words in different cases as identical. Tokenization (option A) splits text, stemming (option B) reduces to stems, and lemmatization (option D) reduces to lemmas.
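In Python, lowercasing is a single string method. For caseless matching across languages, `str.casefold()` is a more aggressive alternative that also folds characters such as the German "ß" to "ss".

```python
text = "The QUICK Brown Fox"
lowered = text.lower()
print(lowered)  # the quick brown fox

# casefold() applies Unicode case folding, which goes further than lower().
word = "Straße"
print(word.lower())     # straße
print(word.casefold())  # strasse
```

One trade-off to keep in mind: lowercasing discards information, so "US" (the country) and "us" (the pronoun) become the same token.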
Question 9
Why is tokenization an important first step in NLP pipelines?
- A) It encrypts the data
- B) It breaks text into manageable units that can be processed individually
- C) It translates text to another language
- D) It compresses the text
The correct answer is B.
Tokenization is crucial because it breaks text into manageable units (tokens) that can be processed individually by subsequent steps in the NLP pipeline. Most NLP algorithms operate on tokens rather than raw text. Option A describes encryption, option C describes translation, and option D describes compression.
Question 10
In an NLP pipeline for a chatbot, which processing step would typically come first?
- A) Part-of-speech tagging
- B) Entity recognition
- C) Text preprocessing and tokenization
- D) Sentiment analysis
The correct answer is C.
Text preprocessing and tokenization typically come first in an NLP pipeline, as they prepare and structure the raw text for subsequent analysis. Part-of-speech tagging (option A), entity recognition (option B), and sentiment analysis (option D) all depend on having preprocessed and tokenized text.