Chapter 2 Quiz: Search Technologies and Indexing

Test your understanding of search technologies and indexing concepts covered in this chapter.

Question 1

What is the primary purpose of a search index?

A. To store raw documents in their original format
B. To enable fast retrieval of documents based on search terms
C. To delete duplicate content from a database
D. To compress files for storage efficiency

Show Answer

The correct answer is B.

A search index is a data structure designed to enable fast retrieval of documents based on search terms. It maps terms to the documents containing them, allowing search engines to quickly find relevant results without scanning every document. Option A describes document storage rather than indexing, option C is about deduplication, and option D relates to compression rather than search functionality.

Question 2

Which component is essential to an inverted index?

A. A list of documents ordered by creation date
B. A mapping from terms to documents containing those terms
C. A compression algorithm for text storage
D. A user authentication system

Show Answer

The correct answer is B.

An inverted index maps terms to the documents that contain them. This is the fundamental structure that makes efficient text search possible. Instead of searching through each document, the search engine can look up a term in the inverted index and immediately find all documents containing that term. Options A, C, and D describe other system components but are not essential to an inverted index.
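The term-to-documents mapping described above can be sketched in a few lines of Python. This is a minimal illustration, assuming whitespace tokenization and lowercasing; the function name `build_inverted_index` and the sample documents are hypothetical, not taken from the chapter.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: terms are separated by whitespace; real engines
        # also strip punctuation, stem words, and drop stop words.
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

# Hypothetical sample collection
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs",
}
index = build_inverted_index(docs)
print(sorted(index["quick"]))  # [1, 3]
```

Looking up a term is a single dictionary access, so the cost does not depend on how many documents lack the term.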

Question 3

In Boolean search, what does the AND operator do?

A. Returns documents containing either of the search terms
B. Returns documents containing all of the search terms
C. Excludes documents containing the specified terms
D. Ranks documents by relevance score

Show Answer

The correct answer is B.

In Boolean search, the AND operator returns only documents that contain all of the specified search terms, so each additional term narrows the result set. The OR operator (option A) returns documents containing either term, the NOT operator (option C) excludes documents containing a term, and option D describes relevance ranking rather than Boolean logic.
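One way to picture AND and OR is as set operations over posting lists. The sketch below assumes a hypothetical `postings` dictionary that maps each term to the set of document IDs containing it; the helper names are illustrative only.

```python
# Hypothetical posting lists: term -> IDs of documents containing it
postings = {
    "search": {1, 2, 4},
    "index": {2, 3, 4},
}

def boolean_and(postings, *terms):
    """Documents containing every term (set intersection)."""
    sets = [postings.get(t, set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def boolean_or(postings, *terms):
    """Documents containing at least one term (set union)."""
    return set().union(*(postings.get(t, set()) for t in terms))

print(sorted(boolean_and(postings, "search", "index")))  # [2, 4]
print(sorted(boolean_or(postings, "search", "index")))   # [1, 2, 3, 4]
```

The AND result is never larger than the smallest posting list involved, which is why adding terms narrows a query.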

Question 4

What does TF-IDF measure?

A. The total file size of indexed documents
B. The importance of a term in a document relative to a collection
C. The time required to process a search query
D. The number of unique words in a document

Show Answer

The correct answer is B.

TF-IDF (Term Frequency-Inverse Document Frequency) measures the importance of a term in a document relative to a collection of documents. The score increases with how often the term occurs in the document and decreases as the term appears in more documents across the collection, which helps identify terms that are particularly relevant to specific documents. Options A, C, and D describe different metrics unrelated to term importance.
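A small worked computation may make the two factors concrete. The sketch below uses one common textbook variant (relative term frequency multiplied by the natural log of N/df); other weightings exist, and the function name `tf_idf` and the sample corpus are assumptions for illustration.

```python
import math
from collections import Counter

def tf_idf(term, doc_tokens, corpus):
    """TF-IDF of `term` in one tokenized document, relative to `corpus`."""
    # Term frequency: share of the document's tokens equal to the term
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    # Document frequency: number of documents that contain the term
    df = sum(1 for tokens in corpus if term in tokens)
    # Assumed IDF variant: ln(N / df); 0.0 if the term never occurs
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

# Hypothetical tokenized corpus
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
]

print(tf_idf("the", corpus[0], corpus))  # 0.0, because "the" is in every document
print(tf_idf("mat", corpus[0], corpus) > tf_idf("cat", corpus[0], corpus))  # True
```

Here "the" is the most frequent term in the first document yet scores zero, while "mat", which occurs in only one document, scores highest.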

Question 5

Which search type can interpret the meaning of a query rather than relying on exact keyword matches?

A. Keyword search
B. Boolean search
C. Full-text search
D. Semantic search (covered in the next chapter)

Show Answer

The correct answer is D.

While keyword search, Boolean search, and full-text search rely on exact or partial string matching, semantic search (which we'll cover in Chapter 3) can understand the meaning behind queries and find relevant results even when exact keywords don't match. The traditional search approaches in this chapter are limited to matching the actual text.

Question 6

What was the original purpose of the PageRank algorithm?

A. To rank web pages by their importance based on link structure
B. To count the number of pages on a website
C. To optimize page loading speed
D. To identify duplicate web pages

Show Answer

The correct answer is A.

PageRank was developed by Google founders Larry Page and Sergey Brin to rank web pages by importance, which it estimates from the link structure of the web. A page is considered more important when more pages, and especially more important pages, link to it. Options B, C, and D describe other web-related tasks but not the purpose of PageRank.
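The idea of deriving importance from links can be illustrated with a simplified power-iteration sketch. This is not Google's production algorithm; the damping factor of 0.85, the fixed iteration count, the handling of pages without outlinks, and the tiny link graph are all assumptions made for illustration. The sketch also assumes every link target appears as a key in `links`.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict of page -> list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page receives a small baseline share
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its damped rank evenly among its outlinks
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outlinks spreads rank to all pages
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: A, B, and D all link to C; C links back to A
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # C
```

Page C ranks highest because three pages link to it, and A ranks second because its single inbound link comes from the highly ranked C.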

Question 7

What is the main advantage of full-text search over simple keyword search?

A. It searches only document titles
B. It can search the entire content of documents and support features like wildcards and phrase matching
C. It requires less storage space
D. It only works with numeric data

Show Answer

The correct answer is B.

Full-text search examines the entire content of documents and supports advanced features like wildcards, phrase matching, and proximity searches. This is more powerful than simple keyword search, which may only match exact terms. Option A describes something more limited than keyword search, option C is incorrect (full-text search typically requires more storage and processing resources), and option D is false because full-text search operates on text.
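Phrase matching is commonly supported by recording where each term occurs, not only which documents contain it. The sketch below is one possible approach, assuming whitespace tokenization; `build_positional_index`, `phrase_search`, and the sample documents are hypothetical names and data.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map term -> {doc_id: [positions where the term occurs]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index

def phrase_search(index, phrase):
    """Return IDs of documents in which the phrase's terms appear consecutively."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return set()
    matches = set()
    for doc_id, starts in index[terms[0]].items():
        for start in starts:
            # The i-th term of the phrase must sit at position start + i
            if all(start + i in index[t].get(doc_id, ())
                   for i, t in enumerate(terms)):
                matches.add(doc_id)
                break
    return matches

# Hypothetical sample collection
docs = {
    1: "search engines build an inverted index",
    2: "an index of inverted commas",
    3: "the inverted index maps terms",
}
pos_index = build_positional_index(docs)
print(sorted(phrase_search(pos_index, "inverted index")))  # [1, 3]
```

Document 2 contains both words but not adjacent and in order, so a phrase query rejects it even though a plain AND query would accept it.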

Question 8

In TF-IDF, what does a high IDF (Inverse Document Frequency) value indicate?

A. The term appears in almost every document
B. The term is rare across the document collection
C. The term has many characters
D. The term appears frequently within a single document

Show Answer

The correct answer is B.

A high IDF value in TF-IDF indicates that a term is rare across the document collection, making it more distinctive and potentially more important for identifying relevant documents. Common terms that appear in many documents have low IDF values. Option A would result in a low IDF, option C relates to term length (irrelevant to IDF), and option D describes term frequency (TF) rather than inverse document frequency.
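A short worked example shows why rare terms receive high IDF values. The formula below is one common definition (several smoothed variants exist), and the collection size and document counts are assumed numbers chosen for illustration. Here N is the number of documents in the collection and df(t) is the number of documents containing term t.

```latex
\mathrm{idf}(t) = \ln\frac{N}{\mathrm{df}(t)}

% Assumed collection of N = 1000 documents:
\mathrm{df}(t) = 1000 \;\Rightarrow\; \mathrm{idf}(t) = \ln\frac{1000}{1000} = 0
\qquad
\mathrm{df}(t) = 10 \;\Rightarrow\; \mathrm{idf}(t) = \ln\frac{1000}{10} = \ln 100 \approx 4.61
```

A term found in every document contributes nothing to the score, while a term found in only 1% of documents is weighted heavily.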

Question 9

Which search operator would you use to exclude results containing a specific term?

A. AND
B. OR
C. NOT
D. MAYBE

Show Answer

The correct answer is C.

In Boolean search, the NOT operator is used to exclude documents containing a specific term from the search results. For example, "cats NOT dogs" would return documents that contain "cats" but exclude any that also contain "dogs". AND (option A) requires all terms, OR (option B) includes documents with any term, and MAYBE (option D) is not a standard Boolean operator.
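The "cats NOT dogs" example corresponds to a set difference over posting lists. This is a minimal sketch, assuming a hypothetical `postings` dictionary of term-to-document-ID sets; the helper name `boolean_not` is illustrative.

```python
# Hypothetical posting lists: term -> IDs of documents containing it
postings = {
    "cats": {1, 2, 3},
    "dogs": {2, 4},
}

def boolean_not(postings, include, exclude):
    """Documents containing `include` but not `exclude` (set difference)."""
    return postings.get(include, set()) - postings.get(exclude, set())

print(sorted(boolean_not(postings, "cats", "dogs")))  # [1, 3]
```

Document 2 is dropped because it contains both terms, and document 4 never qualifies because it does not contain "cats" at all.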

Question 10

What is the primary data structure used to enable fast keyword searches in large document collections?

A. Linear array
B. Hash table
C. Inverted index
D. Binary tree

Show Answer

The correct answer is C.

The inverted index is the primary data structure that enables fast keyword searches in large document collections. It maps each unique term to a list of documents containing that term, allowing search engines to quickly find relevant documents without scanning the entire collection. While hash tables (option B) and binary trees (option D) may be used within the implementation, for example to store the term dictionary, the inverted index is the key structure for search. A linear array (option A) would require inefficient sequential scanning.