Implementing Backpropagation for a Simple Neural Network
This challenge asks you to implement the backpropagation algorithm from scratch in Python. Backpropagation is the cornerstone of training most modern neural networks, allowing them to learn by iteratively adjusting their weights and biases based on the error of their predictions. Successfully implementing this will deepen your understanding of how neural networks learn.
Problem Description
You will create a Python class that represents a simple feedforward neural network and implements the backpropagation algorithm for training. The network will have a single hidden layer.
Requirements:
- Network Structure: The network should be initialized with a specified number of input neurons, hidden neurons, and output neurons.
- Initialization: Weights and biases should be initialized randomly. A common practice is to use small random values (e.g., from a standard normal distribution scaled by a small factor).
- Forward Pass: Implement a method to perform a forward pass through the network, taking input data and producing an output prediction. This involves:
- Calculating weighted sums for each neuron in the hidden and output layers.
- Applying an activation function (e.g., Sigmoid) to the hidden layer outputs.
- Applying an activation function (e.g., Sigmoid) to the output layer outputs.
- Loss Function: Use the Mean Squared Error (MSE) as the loss function: $MSE = \frac{1}{2} \sum (y_{true} - y_{pred})^2$. The factor of $\frac{1}{2}$ cancels the 2 produced by differentiating the square, which simplifies the gradient. (Averaging over samples instead of summing, as the example code does, only rescales the gradients and can be absorbed into the learning rate.)
- Backpropagation: Implement the core backpropagation logic to calculate gradients for weights and biases (see the derivation sketch after this list). This involves:
- Calculating the error at the output layer.
- Propagating this error back to the hidden layer.
- Computing the gradients for the output layer weights and biases.
- Computing the gradients for the hidden layer weights and biases.
- Weight Update: Implement a method to update the network's weights and biases using the calculated gradients and a given learning rate. The update rule for a weight $W$ is $W_{new} = W_{old} - \eta \frac{\partial Loss}{\partial W}$, where $\eta$ is the learning rate.
- Training: Create a training method that iterates through the training data for a specified number of epochs, performing a forward pass, calculating the loss, and then performing a backward pass to update the weights.
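As a concrete sketch of the calculus behind these steps (using the MSE loss above and writing $\sigma$ for the sigmoid activation defined just below): for a single output neuron with weighted sum $z$, activation $a = \sigma(z)$, and per-sample loss $L = \frac{1}{2}(y - a)^2$, the chain rule gives $\frac{\partial L}{\partial a} = -(y - a)$ and $\frac{\partial a}{\partial z} = \sigma(z)(1 - \sigma(z)) = a(1 - a)$, so $\frac{\partial L}{\partial z} = -(y - a)\, a (1 - a)$. For a weight $w$ connecting a hidden activation $h$ to this neuron, $\frac{\partial L}{\partial w} = \frac{\partial L}{\partial z} \cdot h$, and for the bias, $\frac{\partial L}{\partial b} = \frac{\partial L}{\partial z}$. Hidden-layer gradients follow the same pattern after propagating $\frac{\partial L}{\partial z}$ back through the output weights. Note that the example code below tracks the error as $y - a$, the negative of $\frac{\partial L}{\partial a}$, so it accumulates negative gradients and applies them with +=, which amounts to the same gradient-descent update.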
Activation Function: Use the Sigmoid function $\sigma(x) = \frac{1}{1 + e^{-x}}$. Derivative of Sigmoid: $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$.
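A minimal sketch of the Sigmoid and its derivative in NumPy is shown below, together with a quick central-difference check that the derivative-from-output shortcut is consistent; the function names are illustrative, not required by the challenge.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative_from_output(s):
    # s is assumed to already be sigmoid(x), so no extra exp() is needed
    return s * (1.0 - s)

# Sanity check: compare against a central-difference approximation at x = 0.3
x, eps = 0.3, 1e-6
analytic = sigmoid_derivative_from_output(sigmoid(x))
numerical = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(analytic, numerical)  # the two values should agree closely
```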
Expected Behavior:
- The `__init__` method should set up the network structure and initialize parameters.
- The `forward` method should return the network's prediction for a given input.
- The `backward` method should calculate and store the gradients for all weights and biases.
- The `update_weights` method should modify weights and biases based on the gradients.
- The `train` method should orchestrate the training process. (A possible class skeleton illustrating this interface follows this list.)
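To make the expected interface concrete, one possible class skeleton is sketched below; the method names follow the list above, while the exact signatures (for example, whether `backward` also takes the learning rate) are left to your design.

```python
class SimpleNeuralNetwork:
    """A possible interface for this challenge (a sketch, not the only valid design)."""

    def __init__(self, input_size, hidden_size, output_size):
        # Store layer sizes; initialize weights and biases with small random values.
        ...

    def forward(self, X):
        # Return the prediction for X, caching intermediate activations for backward().
        ...

    def backward(self, X, y_true):
        # Compute and store gradients for all weights and biases.
        ...

    def update_weights(self, learning_rate):
        # Apply the stored gradients using the given learning rate.
        ...

    def train(self, X, y, epochs, learning_rate):
        # Loop: forward pass, loss, backward pass, weight update.
        ...
```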
Edge Cases:
- Input/Output Shape Mismatch: Ensure that the dimensions of input data, target outputs, and network layers are compatible (a validation sketch follows this list).
- Learning Rate: A very high learning rate can cause divergence, while a very low one can lead to slow convergence.
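One lightweight way to guard against shape mismatches is to validate dimensions at the start of the forward pass. The sketch below is hypothetical and assumes the attribute names used in the example further down (`self.input_size`); it is not part of the required interface.

```python
import numpy as np

def forward(self, X):
    X = np.atleast_2d(X)  # accept a single sample given as a 1-D array
    if X.shape[1] != self.input_size:
        raise ValueError(f"Expected {self.input_size} input features, got {X.shape[1]}")
    # ... continue with the usual forward computation ...
```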
Examples
Example 1: Simple XOR problem training
This example demonstrates training the network to learn the XOR function.
import numpy as np

class SimpleNeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size

        # Initialize weights and biases
        # Weights for input to hidden layer
        self.weights_input_hidden = np.random.randn(self.input_size, self.hidden_size) * 0.1
        self.bias_hidden = np.zeros((1, self.hidden_size))
        # Weights for hidden to output layer
        self.weights_hidden_output = np.random.randn(self.hidden_size, self.output_size) * 0.1
        self.bias_output = np.zeros((1, self.output_size))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        return x * (1 - x)  # x here is already the sigmoid output

    def forward(self, X):
        # Input to Hidden Layer
        self.hidden_input = np.dot(X, self.weights_input_hidden) + self.bias_hidden
        self.hidden_output = self.sigmoid(self.hidden_input)
        # Hidden to Output Layer
        self.output_input = np.dot(self.hidden_output, self.weights_hidden_output) + self.bias_output
        self.predicted_output = self.sigmoid(self.output_input)
        return self.predicted_output

    def mse_loss(self, y_true, y_pred):
        return 0.5 * np.mean(np.square(y_true - y_pred))

    def backward(self, X, y_true, learning_rate):
        # Calculate error at the output layer.
        # Note: (y_true - y_pred) is the *negative* of dLoss/dy_pred, so the
        # parameter updates below use '+=' and still perform gradient descent.
        output_error = y_true - self.predicted_output
        output_delta = output_error * self.sigmoid_derivative(self.predicted_output)

        # Propagate the error back to the hidden layer
        hidden_error = output_delta.dot(self.weights_hidden_output.T)
        hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden_output)

        # Calculate gradients for weights and biases
        # Output layer
        d_weights_hidden_output = self.hidden_output.T.dot(output_delta)
        d_bias_output = np.sum(output_delta, axis=0, keepdims=True)
        # Hidden layer
        d_weights_input_hidden = X.T.dot(hidden_delta)
        d_bias_hidden = np.sum(hidden_delta, axis=0, keepdims=True)

        # Update weights and biases
        self.weights_hidden_output += learning_rate * d_weights_hidden_output
        self.bias_output += learning_rate * d_bias_output
        self.weights_input_hidden += learning_rate * d_weights_input_hidden
        self.bias_hidden += learning_rate * d_bias_hidden

    def train(self, X, y, epochs, learning_rate):
        for epoch in range(epochs):
            # Forward pass
            predictions = self.forward(X)
            loss = self.mse_loss(y, predictions)
            # Backward pass and weight update
            self.backward(X, y, learning_rate)
            if (epoch + 1) % 1000 == 0:
                print(f'Epoch {epoch+1}/{epochs}, Loss: {loss:.4f}')


# --- Training Data for XOR ---
X_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_train = np.array([[0], [1], [1], [0]])

# --- Network Initialization ---
input_size = 2
hidden_size = 4  # Example hidden layer size
output_size = 1
learning_rate = 0.1
epochs = 10000

nn = SimpleNeuralNetwork(input_size, hidden_size, output_size)

# --- Train the network ---
print("Starting training...")
nn.train(X_train, y_train, epochs, learning_rate)
print("Training finished.")

# --- Test the trained network ---
print("\nTesting the trained network:")
for input_data in X_train:
    # Reshape the single sample to (1, input_size) so the prediction is a 2-D array
    prediction = nn.forward(input_data.reshape(1, -1))
    print(f"Input: {input_data}, Predicted Output: {prediction[0, 0]:.4f}")
Input: The code above defines the SimpleNeuralNetwork class and sets up training data for XOR. It initializes a network with 2 input neurons, 4 hidden neurons, and 1 output neuron, then trains the network for 10000 epochs with a learning rate of 0.1, and finally tests the trained network on the training data.
Output: The output will show the training progress with loss values decreasing over epochs. After training, it will print the predicted outputs for each XOR input. The predicted outputs should be close to the true XOR outputs (0 for [0,0], 1 for [0,1], 1 for [1,0], 0 for [1,1]).
Example of expected output (values may vary slightly due to random initialization and training process):
Starting training...
Epoch 1000/10000, Loss: 0.0217
Epoch 2000/10000, Loss: 0.0144
Epoch 3000/10000, Loss: 0.0110
Epoch 4000/10000, Loss: 0.0090
Epoch 5000/10000, Loss: 0.0077
Epoch 6000/10000, Loss: 0.0067
Epoch 7000/10000, Loss: 0.0060
Epoch 8000/10000, Loss: 0.0054
Epoch 9000/10000, Loss: 0.0050
Epoch 10000/10000, Loss: 0.0046
Training finished.
Testing the trained network:
Input: [0 0], Predicted Output: 0.0432
Input: [0 1], Predicted Output: 0.9618
Input: [1 0], Predicted Output: 0.9579
Input: [1 1], Predicted Output: 0.0457
Explanation: The network learns to approximate the XOR function. The forward pass computes an output. The backward pass calculates how much each weight and bias contributed to the error. These contributions (gradients) are then used to adjust the weights and biases to reduce the error in future predictions. The training loop repeats this process until the network's performance (measured by loss) is satisfactory.
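Before trusting the full training loop, it can help to check the backward-pass math numerically. The sketch below is a hypothetical add-on that reuses the SimpleNeuralNetwork class and XOR data from the example; because the example tracks the error as y_true - y_pred, backward effectively computes the negative gradient of the sum-form loss $\frac{1}{2}\sum(y_{true} - y_{pred})^2$, hence the sign flip in the comparison.

```python
# Numerical gradient check for one output-layer weight (assumes the class above).
eps = 1e-5
net = SimpleNeuralNetwork(2, 4, 1)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Recreate the quantity backward() computes for weights_hidden_output[0, 0].
pred = net.forward(X)
output_delta = (y - pred) * net.sigmoid_derivative(pred)
analytic = net.hidden_output.T.dot(output_delta)[0, 0]  # negative gradient of the sum-form loss

def sum_loss():
    return 0.5 * np.sum((y - net.forward(X)) ** 2)

# Central-difference estimate of the same partial derivative.
net.weights_hidden_output[0, 0] += eps
loss_plus = sum_loss()
net.weights_hidden_output[0, 0] -= 2 * eps
loss_minus = sum_loss()
net.weights_hidden_output[0, 0] += eps  # restore the original weight
numerical = (loss_plus - loss_minus) / (2 * eps)

print(np.isclose(-analytic, numerical))  # expected to print True
```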
Constraints
- Input data and target outputs will be NumPy arrays.
- The number of input features must match the `input_size` of the network.
- The number of output neurons must match the `output_size` of the network.
- The network will have one hidden layer.
- The Sigmoid activation function and MSE loss function must be used.
- The implementation should be written from scratch in Python, using NumPy for numerical operations (no deep-learning frameworks).
- For the provided example, the training should complete within a reasonable time (e.g., a few seconds to a minute) on a standard CPU.
Notes
- Remember that the derivative of the Sigmoid function can be computed directly from its output: if $s = \sigma(x)$, then $\sigma'(x) = s(1 - s)$. This optimization avoids recomputing the sigmoid when you have already stored its output from the forward pass.
- The scaling factor for weight initialization (e.g., `* 0.1`) helps prevent exploding or vanishing gradients, especially in deeper networks.
- Consider how you will store intermediate values during the forward pass, as they are needed for the backward pass.
- The `backward` method in the example code updates the weights directly. A more modular approach would be to have `backward` return the gradients and a separate `update_weights` method apply them (a sketch of this split follows these notes). For this challenge, either structure is acceptable if it keeps your implementation simple and clear.
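For reference, a sketch of that more modular split is shown below; it assumes the same attribute names as the example code and is only one possible way to structure it.

```python
import numpy as np

# Sketch: backward() returns/stores gradients, update_weights() applies them.
def backward(self, X, y_true):
    output_delta = (y_true - self.predicted_output) * self.sigmoid_derivative(self.predicted_output)
    hidden_delta = output_delta.dot(self.weights_hidden_output.T) * self.sigmoid_derivative(self.hidden_output)
    # Store descent directions (negative gradients) instead of applying them immediately.
    self.grads = {
        "weights_hidden_output": self.hidden_output.T.dot(output_delta),
        "bias_output": np.sum(output_delta, axis=0, keepdims=True),
        "weights_input_hidden": X.T.dot(hidden_delta),
        "bias_hidden": np.sum(hidden_delta, axis=0, keepdims=True),
    }
    return self.grads

def update_weights(self, learning_rate):
    # '+=' because the stored values are descent directions (negative gradients).
    self.weights_hidden_output += learning_rate * self.grads["weights_hidden_output"]
    self.bias_output += learning_rate * self.grads["bias_output"]
    self.weights_input_hidden += learning_rate * self.grads["weights_input_hidden"]
    self.bias_hidden += learning_rate * self.grads["bias_hidden"]
```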