Quiz: Machine Learning Foundations

Test your understanding of the linear algebra and optimization concepts used in machine learning.


1. In machine learning, a feature vector represents:

  A. The output label of a data point
  B. The numerical attributes of a single data sample
  C. The weights of a neural network
  D. The training algorithm

The correct answer is B. A feature vector contains the numerical attributes (features) that describe a single data sample. For example, an image might be represented as a vector of pixel values.
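
A minimal sketch of this in Python with NumPy; the housing-style features here are hypothetical:

```python
import numpy as np

# Hypothetical sample described by three features:
# [square_meters, bedrooms, age_years]
feature_vector = np.array([120.0, 3.0, 15.0])

print(feature_vector.shape)  # (3,) -- one vector describes one sample
```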

Concept Tested: Feature Vector


2. The weight matrix in a linear model maps:

  A. Labels to features
  B. Inputs to outputs through linear transformation
  C. Features to labels through division
  D. Errors to gradients

The correct answer is B. The weight matrix performs a linear transformation from input features to outputs. In \(\mathbf{y} = W\mathbf{x} + \mathbf{b}\), the matrix \(W\) determines how inputs combine to produce outputs.
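
A minimal NumPy sketch of \(\mathbf{y} = W\mathbf{x} + \mathbf{b}\); the dimensions and values are arbitrary examples:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))    # maps 3 input features to 2 outputs
b = np.zeros(2)
x = np.array([1.0, 2.0, 3.0])  # input feature vector

y = W @ x + b                  # linear transformation plus offset
print(y.shape)                 # (2,)
```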

Concept Tested: Weight Matrix


3. The gradient of a loss function with respect to parameters indicates:

  A. The minimum value of the loss
  B. The direction of steepest increase
  C. The optimal parameters
  D. The training data size

The correct answer is B. The gradient points in the direction of steepest increase of the loss function. To minimize loss, gradient descent moves in the opposite direction (negative gradient).
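
A small numerical check in NumPy, using a toy quadratic loss chosen purely for illustration:

```python
import numpy as np

def loss(w):
    # Quadratic bowl with its minimum at w = (1, -2)
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2

def grad(w):
    # Analytic gradient of the loss above
    return np.array([2.0 * (w[0] - 1.0), 2.0 * (w[1] + 2.0)])

w = np.array([3.0, 0.0])
g = grad(w)
eps = 1e-3
# A small step along +g increases the loss; along -g it decreases.
print(loss(w + eps * g) > loss(w))  # True
print(loss(w - eps * g) < loss(w))  # True
```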

Concept Tested: Gradient


4. A cost function in machine learning measures:

  A. The monetary expense of training
  B. The discrepancy between predictions and true values
  C. The number of features
  D. The model complexity

The correct answer is B. The cost (or loss) function measures how far the model's predictions deviate from the true values. Training aims to minimize this function by adjusting parameters.
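
A minimal sketch using mean squared error, one common cost function; the values are made up:

```python
import numpy as np

def mse(y_true, y_pred):
    # Average squared discrepancy between predictions and targets
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, -0.5, 2.0])
y_pred = np.array([2.5, 0.0, 2.1])
print(mse(y_true, y_pred))  # ~0.17; lower means a better fit
```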

Concept Tested: Cost Function


5. In gradient descent, the update rule is:

  A. \(\mathbf{w} \leftarrow \mathbf{w} + \alpha \nabla L\)
  B. \(\mathbf{w} \leftarrow \mathbf{w} - \alpha \nabla L\)
  C. \(\mathbf{w} \leftarrow \alpha \nabla L\)
  D. \(\mathbf{w} \leftarrow \mathbf{w} / \nabla L\)

The correct answer is B. Gradient descent subtracts the gradient scaled by the learning rate \(\alpha\): \(\mathbf{w} \leftarrow \mathbf{w} - \alpha \nabla L\). This moves the parameters in the direction that decreases the loss.
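
A minimal sketch of the update loop, using the toy loss \(L(\mathbf{w}) = \|\mathbf{w}\|^2\) so the minimum sits at the origin:

```python
import numpy as np

def grad(w):
    # Gradient of L(w) = ||w||^2
    return 2.0 * w

w = np.array([4.0, -2.0])
alpha = 0.1                  # learning rate
for _ in range(100):
    w = w - alpha * grad(w)  # w <- w - alpha * grad(L)

print(w)  # very close to [0, 0]
```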

Concept Tested: Gradient Descent


6. The learning rate \(\alpha\) controls:

  A. The size of the dataset
  B. The step size in parameter updates
  C. The number of iterations
  D. The model architecture

The correct answer is B. The learning rate determines how large a step to take in the direction of the negative gradient. A rate that is too large can cause divergence; one that is too small leads to slow convergence.
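
A toy illustration on \(L(w) = w^2\); the three rates below are arbitrary examples:

```python
def run(alpha, steps=20):
    # Minimize L(w) = w^2 from w = 1 with a fixed learning rate
    w = 1.0
    for _ in range(steps):
        w = w - alpha * 2.0 * w  # gradient of w^2 is 2w
    return w

print(run(0.01))  # ~0.67: too small, slow convergence
print(run(0.4))   # ~1e-14: converges quickly
print(run(1.1))   # ~38: too large, the updates diverge
```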

Concept Tested: Learning Rate


7. Linear regression finds parameters that minimize:

  A. The number of features
  B. The sum of squared residuals
  C. The number of training samples
  D. The weight magnitudes

The correct answer is B. Linear regression minimizes the sum of squared residuals: \(\sum_i (y_i - \hat{y}_i)^2\), where \(\hat{y}_i\) are predictions. This has a closed-form solution via the normal equations.
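
A minimal NumPy sketch of the normal equations on synthetic data; the true weights are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))  # 100 samples, 2 features
y = X @ np.array([2.0, -3.0]) + 0.1 * rng.normal(size=100)

# Normal equations: w = (X^T X)^(-1) X^T y minimizes the squared residuals
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)  # close to [2, -3]
```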

Concept Tested: Linear Regression


8. Regularization adds to the loss function:

  A. More training data
  B. A penalty term based on weight magnitudes
  C. Additional features
  D. More layers

The correct answer is B. Regularization adds a penalty term to the loss, such as \(\lambda\|\mathbf{w}\|^2\) for L2 regularization, to discourage overfitting. The penalty encourages smaller weights and therefore simpler models.
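
A minimal sketch of the closed-form ridge (L2) solution \(\mathbf{w} = (X^\top X + \lambda I)^{-1} X^\top \mathbf{y}\) on synthetic data, showing how a larger penalty shrinks the weights:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, -3.0]) + 0.1 * rng.normal(size=50)

def ridge(lam):
    # Ridge solution: (X^T X + lam * I)^(-1) X^T y
    return np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

print(np.linalg.norm(ridge(0.0)))    # unregularized weight norm
print(np.linalg.norm(ridge(100.0)))  # larger penalty -> smaller weights
```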

Concept Tested: Regularization


9. A hyperplane in classification:

  A. Maximizes the loss function
  B. Separates data points of different classes
  C. Is always vertical
  D. Contains all training points

The correct answer is B. A hyperplane is a linear decision boundary that separates data points belonging to different classes. In \(n\) dimensions, a hyperplane has \(n-1\) dimensions.
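
A minimal sketch of classification by which side of a hyperplane \(\mathbf{w} \cdot \mathbf{x} + b = 0\) a point falls on; in 2D the hyperplane is just a line, and the parameters here are hypothetical:

```python
import numpy as np

w = np.array([1.0, -1.0])  # normal vector of the hyperplane
b = 0.5

def classify(x):
    # The sign of w.x + b tells which side of the hyperplane x lies on
    return 1 if w @ x + b > 0 else -1

print(classify(np.array([2.0, 0.0])))   # 1
print(classify(np.array([-2.0, 2.0])))  # -1
```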

Concept Tested: Hyperplane


10. The bias term in a linear model:

  A. Must always be zero
  B. Allows the model to fit data not passing through the origin
  C. Increases overfitting
  D. Is the same as the weight

The correct answer is B. The bias term \(\mathbf{b}\) in \(\mathbf{y} = W\mathbf{x} + \mathbf{b}\) allows the model to represent functions that don't pass through the origin, providing a translation or offset.
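
A small NumPy demonstration on synthetic data: without a bias the least-squares fit is forced through the origin, while adding a bias column recovers the offset:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = x + 5.0  # data offset from the origin

# Without a bias, the best slope through the origin misses the +5 offset
w_no_bias = (x @ y) / (x @ x)
print(w_no_bias)  # ~3.14, a compromise slope rather than 1.0

# With a bias column of ones, least squares recovers slope and offset
A = np.column_stack([x, np.ones_like(x)])
w, b = np.linalg.lstsq(A, y, rcond=None)[0]
print(w, b)  # close to 1.0 and 5.0
```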

Concept Tested: Bias Term