Chapter 3 Quiz: Semantic Search and Quality Metrics
Test your understanding of semantic search and quality metrics covered in this chapter.
Question 1
What is the key difference between semantic search and keyword search?
- Semantic search uses Boolean operators while keyword search does not
- Semantic search understands meaning and context, not just exact word matches
- Semantic search is faster than keyword search
- Semantic search only works with numeric data
The correct answer is B.
Semantic search understands the meaning and context of queries, so it can find relevant results even when the exact keywords don't match. Traditional keyword search relies on exact or partial string matching. Option A is wrong because Boolean operators (AND, OR, NOT) are a feature of traditional keyword search, not what distinguishes semantic search. Option C is not necessarily true, since semantic search often requires more computation. Option D is false, because semantic search is most commonly applied to text.
Question 2
Which metric measures how similar two vectors are based on their direction?
- Euclidean distance
- Manhattan distance
- Cosine similarity
- Hamming distance
The correct answer is C.
Cosine similarity is the cosine of the angle between two vectors, so it depends on their direction rather than their magnitude. This makes it well suited to comparing document vectors and embeddings in semantic search. Euclidean distance (option A) and Manhattan distance (option B) measure geometric distance, which is affected by vector length. Hamming distance (option D) counts the positions at which two equal-length strings or binary vectors differ.
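Here is a minimal sketch in plain Python of the direction-versus-magnitude point. The helper names `cosine_similarity` and `euclidean_distance` are illustrative, and the vectors are made up. Doubling a vector leaves its cosine similarity to the original at 1.0, but it moves the vector a nonzero Euclidean distance away.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: depends only on direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between a and b: sensitive to magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

doc = [1.0, 2.0, 3.0]
scaled = [2.0, 4.0, 6.0]  # same direction, twice the length

print(cosine_similarity(doc, scaled))   # approximately 1.0 (floating-point rounding)
print(euclidean_distance(doc, scaled))  # sqrt(14), about 3.742
```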
Question 3
What does the precision metric measure in information retrieval?
- The total number of documents in the database
- The proportion of retrieved documents that are relevant
- The proportion of relevant documents that were retrieved
- The speed of the search algorithm
The correct answer is B.
Precision measures the proportion of retrieved documents that are actually relevant. It answers the question: "Of all the documents we returned, how many were relevant?" A high precision means few irrelevant results. Option C describes recall, option A describes collection size, and option D relates to performance rather than quality.
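Precision can be computed from two sets of document IDs. The sketch below uses made-up IDs. Precision is undefined when nothing is returned, so returning 0.0 in that case is an assumed convention, not a standard rule.

```python
def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 0.0  # assumed convention: an empty result set scores 0.0
    return len(retrieved & set(relevant)) / len(retrieved)

retrieved = ["d1", "d2", "d3", "d4", "d5"]
relevant = ["d1", "d3", "d7"]
print(precision(retrieved, relevant))  # 0.4: 2 of the 5 returned documents are relevant
```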
Question 4
What does the recall metric measure in information retrieval?
- The proportion of retrieved documents that are relevant
- The proportion of relevant documents that were retrieved
- The average position of relevant results
- The time taken to execute a query
The correct answer is B.
Recall measures the proportion of all relevant documents that were actually retrieved. It answers the question: "Of all the relevant documents that exist, how many did we find?" A high recall means we didn't miss many relevant results. Option A describes precision, option C relates to ranking metrics, and option D relates to performance.
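Recall uses a different denominator: every relevant document, including those the system never returned. Here is a minimal sketch with made-up IDs. Treating an empty relevant set as 0.0 is an assumed convention, since recall is undefined there.

```python
def recall(retrieved, relevant):
    """Fraction of all relevant documents that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        return 0.0  # assumed convention: recall is undefined with no relevant docs
    return len(set(retrieved) & relevant) / len(relevant)

retrieved = ["d1", "d2", "d3", "d4", "d5"]
relevant = ["d1", "d3", "d7"]  # d7 is relevant but was never returned
print(recall(retrieved, relevant))  # 2 of 3 relevant documents found, about 0.667
```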
Question 5
What is the F1 Score?
- The average of precision and recall
- The harmonic mean of precision and recall
- The product of precision and recall
- The maximum of precision and recall
The correct answer is B.
The F1 Score is the harmonic mean of precision and recall, F1 = 2 × (precision × recall) / (precision + recall). It provides a single metric that balances both concerns. Unlike a simple average, the harmonic mean is pulled toward the lower of the two values, so a good F1 score requires both good precision and good recall. Option A (simple average) would be less stringent, and options C and D are not standard information retrieval metrics.
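To see why the harmonic mean is stricter, compare it with the arithmetic mean for a lopsided system. The sketch below uses illustrative values (precision 1.0, recall 0.1). Returning 0.0 when both inputs are zero is a common convention that avoids division by zero.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # common convention to avoid dividing by zero
    return 2 * precision * recall / (precision + recall)

p, r = 1.0, 0.1
print((p + r) / 2)     # 0.55: the arithmetic mean looks respectable
print(f1_score(p, r))  # about 0.182: the harmonic mean stays close to the weak recall
```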
Question 6
In vector similarity, what does a cosine similarity of 1.0 indicate?
- The vectors are completely different
- The vectors are perpendicular
- The vectors point in exactly the same direction
- The vectors have the same magnitude
The correct answer is C.
A cosine similarity of 1.0 means the vectors point in exactly the same direction (angle of 0 degrees), indicating maximum similarity. A value of 0 would indicate perpendicular vectors (option B), and -1 would indicate opposite directions. Option D is incorrect because cosine similarity measures direction, not magnitude.
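The three landmark values can be checked with 2-D vectors by converting each cosine back into an angle. The sketch below uses made-up vectors. The clamp before `math.acos` guards against floating-point results that fall slightly outside [-1, 1].

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

pairs = {
    "same direction": ([1, 0], [3, 0]),   # different lengths, same direction
    "perpendicular": ([1, 0], [0, 2]),
    "opposite": ([1, 0], [-1, 0]),
}

for name, (a, b) in pairs.items():
    cos = cosine_similarity(a, b)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos))))  # clamp to acos's domain
    print(f"{name}: cosine={cos:+.1f}, angle={angle:.1f} degrees")
```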
Question 7
Why is semantic search particularly useful for chatbots?
- It only works with structured data
- It can understand user intent even when phrasing varies
- It requires less computational power than keyword search
- It eliminates the need for a database
The correct answer is B.
Semantic search is valuable for chatbots because it can understand user intent even when users phrase questions differently. For example, "How do I reset my password?" and "I forgot my login credentials" express similar intents despite using different words. Option A is false (semantic search works with unstructured text), option C is incorrect (semantic search typically requires more computation), and option D is false.
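Intent matching can be sketched as picking the FAQ entry whose vector is most similar to the query vector. The 3-dimensional vectors below are hypothetical numbers invented for illustration. A real chatbot would obtain much higher-dimensional embeddings from an embedding model, and `best_match` is an illustrative helper, not a library function.

```python
import math

# Hypothetical toy "embeddings"; a real system would get these from an embedding model.
FAQ_EMBEDDINGS = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "What are your opening hours?": [0.0, 0.2, 0.9],
    "How do I update my billing address?": [0.1, 0.9, 0.1],
}

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def best_match(query_vec, faq):
    """Return the FAQ question whose vector is most similar to the query vector."""
    return max(faq, key=lambda question: cosine_similarity(query_vec, faq[question]))

# Assumed embedding of "I forgot my login credentials": it shares no content words
# with the password question but, by assumption, lies close to it in vector space.
query = [0.8, 0.2, 0.1]
print(best_match(query, FAQ_EMBEDDINGS))
```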
Question 8
If a search system has high precision but low recall, what does this mean?
- Most returned results are relevant, but many relevant documents were missed
- Most relevant documents were found, but many irrelevant ones were also returned
- Both precision and recall are balanced
- The system is performing optimally
The correct answer is A.
High precision but low recall means that most returned results are relevant (few false positives), but many relevant documents were not retrieved (many false negatives). This is a conservative system that errs on the side of showing fewer results to maintain quality. Option B describes high recall but low precision, option C would indicate balanced metrics, and option D is incorrect since low recall is not optimal.
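A concrete, made-up case shows how the two numbers diverge. Twenty relevant documents exist, and a cautious system returns only four of them, all correct.

```python
relevant = {f"doc{i}" for i in range(20)}      # 20 relevant documents exist
retrieved = {"doc0", "doc1", "doc2", "doc3"}   # the system returns only 4

true_positives = len(retrieved & relevant)
precision = true_positives / len(retrieved)    # 4 / 4
recall = true_positives / len(relevant)        # 4 / 20
print(precision, recall)  # 1.0 0.2
```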
Question 9
What mathematical concept underlies vector similarity in semantic search?
- Boolean algebra
- Linear algebra and vector geometry
- Set theory only
- Graph theory
The correct answer is B.
Vector similarity is based on linear algebra and vector geometry. Documents and queries are represented as vectors in high-dimensional space, and similarity is measured using geometric concepts like cosine similarity. While set theory (option C) and graph theory (option D) have applications in information retrieval, vector similarity specifically relies on linear algebra. Boolean algebra (option A) relates to traditional Boolean search.
Question 10
When would you prioritize high recall over high precision?
- When you want to minimize false positives
- When you cannot afford to miss any relevant results
- When storage space is limited
- When users only want the top result
The correct answer is B.
High recall should be prioritized when you cannot afford to miss relevant results, even if that means accepting some irrelevant ones. In medical diagnosis or legal discovery, for example, a missed relevant item can be far more costly than reviewing an extra irrelevant one. Option A describes prioritizing precision. Option C is a storage concern unrelated to retrieval quality. Option D calls for precision at the top of the ranking rather than recall.
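One common way to favor recall is to lower the similarity threshold for returning a result. The sketch below uses hypothetical (score, is_relevant) pairs for a single query. It assumes every relevant document appears in `scored`, so recall can be computed from that list alone. As the threshold drops, recall rises and precision falls.

```python
# Hypothetical (similarity score, is_relevant) pairs for one query.
scored = [(0.92, True), (0.85, True), (0.70, False), (0.64, True),
          (0.55, False), (0.41, True), (0.30, False)]

def precision_recall_at_threshold(scored, threshold):
    """Precision and recall when every document scoring >= threshold is returned."""
    returned = [rel for score, rel in scored if score >= threshold]
    total_relevant = sum(rel for _, rel in scored)  # assumes all relevant docs are listed
    hits = sum(returned)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / total_relevant if total_relevant else 0.0
    return precision, recall

for t in (0.8, 0.6, 0.4):
    p, r = precision_recall_at_threshold(scored, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```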