Chapter 11 Quiz: NLP Pipelines and Processing

Test your understanding of NLP pipelines and text processing concepts covered in this chapter.

Question 1

What is an NLP pipeline?

A. A physical pipe for data transfer
B. A sequence of processing steps that transform raw text into structured information
C. A database query optimizer
D. A network routing protocol

The correct answer is B.

An NLP pipeline is a sequence of processing steps that transform raw text into structured information. Each step performs a specific task like tokenization, part-of-speech tagging, or entity recognition. Option A describes physical infrastructure, option C describes database optimization, and option D describes networking.
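A pipeline can be modeled as a chain of steps in a few lines of Python. This is a minimal illustration. It assumes each step is a plain function whose output becomes the next step's input. The step names `normalize`, `tokenize`, and `count_tokens` are hypothetical and are not part of any library.

```python
import re
from typing import Any, Callable

# A step takes the previous step's output and returns a new value.
Step = Callable[[Any], Any]

def run_pipeline(text: str, steps: list[Step]) -> Any:
    """Apply each step in order, feeding each result into the next step."""
    result: Any = text
    for step in steps:
        result = step(result)
    return result

def normalize(text: str) -> str:
    # Lowercase and collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text.lower()).strip()

def tokenize(text: str) -> list[str]:
    # Keep runs of lowercase letters, digits, and apostrophes as tokens.
    return re.findall(r"[a-z0-9']+", text)

def count_tokens(tokens: list[str]) -> dict[str, int]:
    # Turn the token list into structured data: a frequency table.
    counts: dict[str, int] = {}
    for tok in tokens:
        counts[tok] = counts.get(tok, 0) + 1
    return counts

result = run_pipeline("The cat   saw the DOG.", [normalize, tokenize, count_tokens])
print(result)  # {'the': 2, 'cat': 1, 'saw': 1, 'dog': 1}
```

To add, remove, or reorder steps in this sketch, you only change the list passed to `run_pipeline`.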

Question 2

What is text preprocessing?

A. Writing text before processing
B. The initial cleaning and normalization of raw text before analysis
C. Printing text on paper
D. Encrypting text data

The correct answer is B.

Text preprocessing is the initial step of cleaning and normalizing raw text before analysis. This includes removing unwanted characters, converting to lowercase, handling whitespace, and other normalization tasks to prepare text for further NLP processing. Options A, C, and D describe different activities unrelated to text preparation.
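The sketch below shows what a preprocessing step might look like. The cleanup choices are assumptions made for illustration: NFKC Unicode normalization, stripping HTML-like tags, and keeping only word characters and basic punctuation. What counts as "unwanted" depends on the application.

```python
import re
import unicodedata

def preprocess(text: str) -> str:
    """Illustrative cleanup of raw text before tokenization."""
    # Fold Unicode compatibility characters (e.g. full-width letters) into standard forms.
    text = unicodedata.normalize("NFKC", text)
    text = text.lower()
    # Drop HTML-like tags such as <b> or </b>.
    text = re.sub(r"<[^>]+>", " ", text)
    # Replace anything other than word characters, whitespace, or . , ! ? ' with a space.
    text = re.sub(r"[^\w\s.,!?']", " ", text)
    # Collapse runs of spaces, tabs, and newlines into single spaces.
    return re.sub(r"\s+", " ", text).strip()

print(preprocess("  Hello,\tWORLD!!  <b>NLP</b> rocks\n"))  # hello, world!! nlp rocks
```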

Question 3

What is tokenization in NLP?

A. User authentication with tokens
B. The process of breaking text into smaller units like words or sentences
C. Cryptocurrency transactions
D. Database indexing

The correct answer is B.

Tokenization in NLP is the process of breaking text into smaller units (tokens) such as words, sentences, or subwords. This is typically the first step in an NLP pipeline after preprocessing. Option A describes security tokens, option C describes blockchain, and option D describes database optimization.
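Regular expressions can approximate sentence and word tokenization. This sketch assumes two simple rules. A sentence ends at `.`, `!`, or `?` followed by whitespace. A token is either a word (optionally with an internal apostrophe) or a single punctuation mark. Because the rules are this simple, the sketch mis-splits cases such as "Dr. Smith", which dedicated tokenizers handle.

```python
import re

def sent_tokenize(text: str) -> list[str]:
    # Naive rule: split on whitespace that follows ., ! or ?.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def word_tokenize(sentence: str) -> list[str]:
    # A word (allowing one internal apostrophe, as in "isn't") or one punctuation mark.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)

text = "NLP is fun. Isn't it? Let's tokenize!"
for sent in sent_tokenize(text):
    print(word_tokenize(sent))
# ['NLP', 'is', 'fun', '.']
# ["Isn't", 'it', '?']
# ["Let's", 'tokenize', '!']
```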

Question 4

What is stemming?

A. The process of reducing words to their root or base form
B. Creating flowcharts
C. Organizing files in folders
D. Compressing images

The correct answer is A.

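Stemming reduces words to a root form, called a stem, by stripping affixes (usually suffixes) with simple rules. For example, "running" and "runs" might both be reduced to "run." Stemmers apply rules rather than consulting a dictionary. As a result, the stem is not always a real word (the Porter stemmer reduces "studies" to "studi"), and irregular forms such as "ran" are left unchanged. Stemming helps treat different forms of the same word as equivalent. Options B, C, and D describe unrelated activities.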
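Here is a minimal Python sketch of rule-based suffix stripping. The function `simple_stem` and its rule list are hypothetical. They are only loosely modeled on the first steps of the Porter algorithm. Production code would normally use a complete implementation, such as NLTK's `PorterStemmer`.

```python
def simple_stem(word: str) -> str:
    """Toy suffix-stripping stemmer, loosely modeled on the first steps of
    the Porter algorithm. It is NOT a full Porter implementation."""
    w = word.lower()
    # Plural endings, tried in order; the first matching rule wins.
    for suffix, repl in (("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")):
        if w.endswith(suffix):
            # Only strip when at least 3 letters of stem remain ("was" stays "was").
            if len(w) - len(suffix) >= 3:
                w = w[: -len(suffix)] + repl
            break
    # Verb endings: strip -ing / -ed when at least 3 letters of stem remain.
    for suffix in ("ing", "ed"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            # Undouble a final consonant ("runn" -> "run"), except l, s, z.
            if len(w) >= 2 and w[-1] == w[-2] and w[-1] not in "aeioulsz":
                w = w[:-1]
            break
    return w

for word in ["running", "runs", "ran", "studies", "class"]:
    print(word, "->", simple_stem(word))
# running -> run
# runs -> run
# ran -> ran
# studies -> studi
# class -> class
```

Note that "ran" passes through untouched. No suffix rule can connect an irregular past tense to its base form.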

Question 5

What is lemmatization?

A. Creating lists of items
B. The process of reducing words to their dictionary base form using linguistic rules
C. Sorting data alphabetically
D. Backing up databases

The correct answer is B.

Lemmatization reduces words to their dictionary base form (lemma) using vocabulary and linguistic rules. Unlike stemming, lemmatization produces actual words. For example, "better" used as an adjective is lemmatized to "good." This mapping requires both a dictionary lookup and knowledge of the word's part of speech. Options A, C, and D describe different operations.
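The sketch below illustrates the two ingredients of lemmatization: a dictionary for irregular forms and morphological rules for regular ones. The tiny `IRREGULAR` table and the noun-plural rules are hypothetical stand-ins. Real lemmatizers rely on large lexicons such as WordNet. The part-of-speech tag is an input here, so "better" maps to "good" only as an adjective.

```python
# Hypothetical mini-lexicon of irregular forms, keyed by (word, part of speech).
IRREGULAR = {
    ("better", "ADJ"): "good",
    ("better", "ADV"): "well",
    ("ran", "VERB"): "run",
    ("mice", "NOUN"): "mouse",
}

def lemmatize(word: str, pos: str) -> str:
    """Toy lemmatizer: dictionary lookup first, then simple noun-plural rules."""
    w = word.lower()
    # 1. Irregular forms can only be resolved by looking them up.
    if (w, pos) in IRREGULAR:
        return IRREGULAR[(w, pos)]
    # 2. Regular noun plurals: "studies" -> "study", "cats" -> "cat".
    if pos == "NOUN":
        if w.endswith("ies") and len(w) > 4:
            return w[:-3] + "y"
        if w.endswith("s") and not w.endswith("ss"):
            return w[:-1]
    # 3. Anything else is assumed to be its own lemma.
    return w

print(lemmatize("better", "ADJ"))    # good
print(lemmatize("better", "ADV"))    # well
print(lemmatize("studies", "NOUN"))  # study
print(lemmatize("mice", "NOUN"))     # mouse
```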

Question 6

What is Part-of-Speech (POS) tagging?

A. Marking posts in a social media feed
B. The process of identifying the grammatical role of each word in a sentence
C. Tagging images with metadata
D. Creating hashtags for content

The correct answer is B.

Part-of-Speech tagging is the process of identifying the grammatical role of each word in a sentence (noun, verb, adjective, etc.). This linguistic information is valuable for many NLP tasks like parsing and entity recognition. Option A describes social media, option C describes image metadata, and option D describes hashtags.

Question 7

What is the main difference between stemming and lemmatization?

A. Stemming is faster but less accurate; lemmatization uses linguistic knowledge for better results
B. They are exactly the same
C. Stemming only works with English
D. Lemmatization is always faster

The correct answer is A.

Stemming is typically faster but cruder, using simple rules to chop off word endings. Lemmatization uses vocabulary and morphological analysis, often including the word's part of speech, to produce actual dictionary words. This makes it more accurate but also more computationally expensive. Option B is false. Option C is incorrect because stemmers exist for many languages. Option D reverses the actual relationship.
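A side-by-side comparison makes the difference concrete. In this sketch, `crude_stem` only inspects the word's ending, while `dict_lemmatize` needs both a part-of-speech tag and a dictionary. Both functions and the three-entry `LEXICON` are hypothetical simplifications.

```python
def crude_stem(word: str) -> str:
    """Rule-only stemmer: chops endings without consulting a dictionary."""
    w = word.lower()
    if w.endswith("ies"):
        return w[:-3] + "i"
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w

# Hypothetical dictionary the lemmatizer consults, keyed by (word, POS).
LEXICON = {
    ("studies", "NOUN"): "study",
    ("better", "ADJ"): "good",
    ("mice", "NOUN"): "mouse",
}

def dict_lemmatize(word: str, pos: str) -> str:
    """Dictionary lookup that requires a POS tag; unknown words pass through."""
    w = word.lower()
    return LEXICON.get((w, pos), w)

for word, pos in [("studies", "NOUN"), ("better", "ADJ"), ("mice", "NOUN")]:
    print(f"{word:8} stem={crude_stem(word):8} lemma={dict_lemmatize(word, pos)}")
# studies  stem=studi    lemma=study
# better   stem=better   lemma=good
# mice     stem=mice     lemma=mouse
```

Most of the cost comes from the lemmatizer's extra inputs. A real lemmatizer usually has to run a POS tagger first and then consult a large lexicon, whereas a stemmer applies a handful of string rules.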

Question 8

Which preprocessing step would convert "The QUICK Brown Fox" to "the quick brown fox"?

A. Tokenization
B. Stemming
C. Case normalization (lowercasing)
D. Lemmatization

The correct answer is C.

Case normalization, specifically lowercasing, is a text preprocessing step that converts all text to lowercase, making "The QUICK Brown Fox" become "the quick brown fox." This helps treat the same words in different cases as identical. Tokenization (option A) splits text, stemming (option B) reduces to stems, and lemmatization (option D) reduces to lemmas.
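In Python, lowercasing is a single built-in string method, so this sketch needs only the standard library. It also shows `str.casefold()`, a stricter variant intended for caseless matching.

```python
text = "The QUICK Brown Fox"
normalized = text.lower()
print(normalized)  # the quick brown fox

# str.casefold() is a more aggressive, Unicode-aware variant meant for
# caseless matching; for example, it maps German "ß" to "ss".
print("Straße".lower())     # straße
print("Straße".casefold())  # strasse
```

Lowercasing loses information. For example, "US" (the country) and "us" (the pronoun) become the same token. Whether that matters depends on the later steps in the pipeline.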

Question 9

Why is tokenization an important first step in NLP pipelines?

A. It encrypts the data
B. It breaks text into manageable units that can be processed individually
C. It translates text to another language
D. It compresses the text

The correct answer is B.

Tokenization is crucial because it breaks text into manageable units (tokens) that can be processed individually by subsequent steps in the NLP pipeline. Most NLP algorithms operate on tokens rather than raw text. Option A describes encryption, option C describes translation, and option D describes compression.
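The difference between raw text and tokens appears as soon as you try to count words. This sketch uses only the Python standard library (`re` and `collections.Counter`), with a simple `\w+` pattern as its tokenizer.

```python
import re
from collections import Counter

text = "the cat sat on the mat"

# Iterating over a raw string yields characters, not words.
char_counts = Counter(text)
print(char_counts.most_common(2))

# After tokenization, each word is a unit that later steps can count, tag, or look up.
tokens = re.findall(r"\w+", text)
word_counts = Counter(tokens)
print(word_counts.most_common(2))  # [('the', 2), ('cat', 1)]
```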

Question 10

In an NLP pipeline for a chatbot, which processing step would typically come first?

A. Part-of-speech tagging
B. Entity recognition
C. Text preprocessing and tokenization
D. Sentiment analysis

The correct answer is C.

Text preprocessing and tokenization typically come first in an NLP pipeline, as they prepare and structure the raw text for subsequent analysis. Part-of-speech tagging (option A), entity recognition (option B), and sentiment analysis (option D) all depend on having preprocessed and tokenized text.
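A toy chatbot step shows why the order matters. This sketch assumes a hypothetical word-list sentiment scorer built on the `POSITIVE` and `NEGATIVE` sets. Because the scorer counts matching tokens, it only works after preprocessing and tokenization have run.

```python
import re

# Hypothetical sentiment word lists; real systems use trained models or far larger lexicons.
POSITIVE = {"great", "love", "thanks"}
NEGATIVE = {"broken", "hate", "slow"}

def preprocess(text: str) -> str:
    return re.sub(r"\s+", " ", text.lower()).strip()

def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z']+", text)

def sentiment(tokens: list[str]) -> str:
    # Score = number of positive tokens minus number of negative tokens.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def handle_message(message: str) -> str:
    # Each step consumes the previous step's output: preprocess -> tokenize -> sentiment.
    return sentiment(tokenize(preprocess(message)))

print(handle_message("My order arrived BROKEN and shipping was slow"))  # negative
print(handle_message("Thanks, I love it!"))                             # positive
# Deliberate misuse: a raw string is iterated character by character.
print(sentiment("I love this"))                                         # neutral
```

Passing the raw string straight to `sentiment` does not raise an error. Instead, it silently iterates over characters rather than words, which is why the last call returns "neutral".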