
Quiz: Neural Networks and Deep Learning

Test your understanding of neural network architecture and deep learning concepts.


1. A fully connected (dense) layer computes:

  A. Element-wise multiplication only
  B. \(\mathbf{y} = \sigma(W\mathbf{x} + \mathbf{b})\) with linear transform and activation
  C. Convolution of input with a kernel
  D. Max pooling of input values

The correct answer is B. A dense layer performs a linear transformation followed by a nonlinear activation: \(\mathbf{y} = \sigma(W\mathbf{x} + \mathbf{b})\), where \(W\) is the weight matrix, \(\mathbf{b}\) is the bias, and \(\sigma\) is the activation function.

Concept Tested: Dense Layer
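
As a minimal sketch of this computation in NumPy (the layer sizes and the tanh activation are arbitrary illustrative choices, not anything prescribed by the quiz):

```python
import numpy as np

def dense(x, W, b, activation=np.tanh):
    """Dense layer: linear transform W @ x + b followed by an activation."""
    return activation(W @ x + b)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))  # weight matrix: 3 inputs -> 4 outputs
b = np.zeros(4)              # bias vector
x = rng.normal(size=3)       # input vector
print(dense(x, W, b))        # 4-dimensional output
```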


2. The purpose of an activation function is to:

  A. Initialize weights randomly
  B. Introduce nonlinearity into the network
  C. Reduce the number of parameters
  D. Normalize input data

The correct answer is B. Activation functions introduce nonlinearity, enabling neural networks to learn complex patterns. Without them, stacking layers would just produce another linear transformation.

Concept Tested: Activation Function
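
A quick NumPy check of the claim that stacking linear layers without activations collapses to a single linear transformation (matrix sizes here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 3))
W2 = rng.normal(size=(3, 3))
x = rng.normal(size=3)

# Two stacked linear layers with no activation in between...
two_layers = W2 @ (W1 @ x)
# ...equal one linear layer whose weight matrix is the product W2 @ W1.
one_layer = (W2 @ W1) @ x
print(np.allclose(two_layers, one_layer))  # True
```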


3. The ReLU activation function is defined as:

  A. \(\sigma(x) = \frac{1}{1+e^{-x}}\)
  B. \(\sigma(x) = \max(0, x)\)
  C. \(\sigma(x) = \tanh(x)\)
  D. \(\sigma(x) = x^2\)

The correct answer is B. ReLU (Rectified Linear Unit) is defined as \(\max(0, x)\)—it outputs \(x\) for positive inputs and 0 for negative inputs. It's computationally efficient and helps mitigate vanishing gradients.

Concept Tested: ReLU Activation
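
For illustration, ReLU is a one-liner in NumPy:

```python
import numpy as np

def relu(x):
    """ReLU: elementwise max(0, x)."""
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # [0.  0.  0.  1.5 3. ]
```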


4. Backpropagation computes:

  A. The forward pass predictions
  B. Gradients of the loss with respect to all parameters
  C. The optimal learning rate
  D. The activation function values

The correct answer is B. Backpropagation efficiently computes gradients of the loss with respect to all network parameters using the chain rule, enabling gradient-based optimization.

Concept Tested: Backpropagation
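
A minimal hand-worked sketch for a one-parameter linear model with a squared-error loss (the numbers are arbitrary; a real framework automates exactly these chain-rule steps):

```python
# Forward pass: y_hat = w * x + b, loss = (y_hat - y)**2
x, y = 2.0, 5.0
w, b = 1.0, 0.0
y_hat = w * x + b
loss = (y_hat - y) ** 2

# Backward pass: gradients of the loss w.r.t. every parameter, via the chain rule
dloss_dyhat = 2 * (y_hat - y)  # dL/dy_hat
dloss_dw = dloss_dyhat * x     # dL/dw = dL/dy_hat * dy_hat/dw
dloss_db = dloss_dyhat * 1.0   # dL/db = dL/dy_hat * dy_hat/db
print(dloss_dw, dloss_db)      # -12.0 -6.0
```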


5. The softmax function:

  A. Sets all outputs to zero
  B. Converts scores to probabilities that sum to 1
  C. Applies ReLU to each element
  D. Computes the maximum value

The correct answer is B. Softmax converts a vector of real-valued scores to a probability distribution: \(\text{softmax}(\mathbf{z})_i = \frac{e^{z_i}}{\sum_j e^{z_j}}\). All outputs are positive and sum to 1.

Concept Tested: Softmax Function
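
A minimal NumPy sketch; subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(z):
    """Convert a vector of scores to a probability distribution."""
    e = np.exp(z - np.max(z))  # shift for numerical stability
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, p.sum())  # positive entries summing to 1 (up to float rounding)
```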


6. A tensor in deep learning is:

  A. Always a 2D matrix
  B. A multidimensional array of numbers
  C. A type of activation function
  D. A loss function

The correct answer is B. A tensor is a multidimensional array—a generalization of scalars (0D), vectors (1D), and matrices (2D) to higher dimensions. Tensors represent data and parameters in neural networks.

Concept Tested: Tensor
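
For illustration in NumPy, where ndarray plays the role of a tensor (the shapes are arbitrary):

```python
import numpy as np

scalar = np.array(3.0)               # 0D tensor
vector = np.array([1.0, 2.0, 3.0])   # 1D tensor
matrix = np.ones((2, 3))             # 2D tensor
batch = np.zeros((32, 28, 28))       # 3D tensor, e.g. 32 grayscale 28x28 images
for t in (scalar, vector, matrix, batch):
    print(t.ndim, t.shape)
```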


7. Cross-entropy loss is commonly used for:

  A. Regression problems
  B. Classification problems with probability outputs
  C. Dimensionality reduction
  D. Feature extraction

The correct answer is B. Cross-entropy loss measures the difference between predicted probability distributions and true labels. It's the standard loss for classification, especially with softmax outputs.

Concept Tested: Cross-Entropy Loss
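
A minimal NumPy sketch, assuming one-hot labels (the small eps guards against log(0)):

```python
import numpy as np

def cross_entropy(p_pred, y_true, eps=1e-12):
    """Cross-entropy between a predicted distribution and a one-hot label."""
    return -np.sum(y_true * np.log(p_pred + eps))

p_pred = np.array([0.7, 0.2, 0.1])    # e.g. a softmax output
y_true = np.array([1.0, 0.0, 0.0])    # one-hot: true class is index 0
print(cross_entropy(p_pred, y_true))  # -log(0.7) ≈ 0.357
```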


8. Batch normalization:

  A. Increases batch size automatically
  B. Normalizes layer inputs to have zero mean and unit variance
  C. Removes all biases from the network
  D. Converts images to grayscale

The correct answer is B. Batch normalization normalizes activations across a mini-batch to zero mean and unit variance, then applies learned scale and shift. This stabilizes training and can act as regularization.

Concept Tested: Batch Normalization
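
A minimal NumPy sketch of the training-time computation (inference-time running statistics are omitted; gamma and beta stand in for the learned scale and shift):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then apply learned scale/shift."""
    mean = x.mean(axis=0)  # per-feature mean across the mini-batch
    var = x.var(axis=0)    # per-feature variance across the mini-batch
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(8, 4))
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))  # ~0 mean, ~1 std
```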


9. The chain rule in backpropagation enables:

  A. Computing gradients through composed functions
  B. Increasing network depth automatically
  C. Selecting the best activation function
  D. Determining batch size

The correct answer is A. The chain rule allows computing gradients through composed functions: \(\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y} \cdot \frac{\partial y}{\partial x}\). This is essential for propagating gradients through all network layers.

Concept Tested: Chain Rule
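
A quick numerical sanity check of the chain rule on an arbitrary composition \(L = \sin(x^2)\):

```python
import numpy as np

# L = sin(y) with y = x**2, so by the chain rule dL/dx = cos(x**2) * 2x
x = 1.3
analytic = np.cos(x**2) * 2 * x

h = 1e-6  # central finite difference as an independent check
numeric = (np.sin((x + h)**2) - np.sin((x - h)**2)) / (2 * h)
print(analytic, numeric)  # the two values agree closely
```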


10. Dropout during training:

  A. Removes entire layers permanently
  B. Randomly sets some neuron outputs to zero
  C. Doubles the learning rate
  D. Freezes all weights

The correct answer is B. Dropout randomly sets a fraction of neuron outputs to zero during training, preventing co-adaptation of neurons. This serves as regularization and improves generalization.

Concept Tested: Dropout
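
A minimal NumPy sketch of inverted dropout, the common variant that rescales surviving activations at train time so no scaling is needed at test time (p_drop = 0.5 is an arbitrary choice here):

```python
import numpy as np

def dropout(x, p_drop, rng):
    """Zero out a fraction p_drop of units; scale survivors by 1/(1 - p_drop)
    so the expected activation is unchanged."""
    mask = rng.random(x.shape) >= p_drop
    return x * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
print(dropout(np.ones(10), p_drop=0.5, rng=rng))  # ~half zeros, survivors 2.0
```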