Building a Simple Neural Network from Scratch

This challenge requires you to implement a fundamental building block of modern AI: a neural network. You will construct a basic feedforward neural network from scratch in Python, learning about the core concepts of forward propagation, activation functions, and backpropagation for training. This exercise is crucial for understanding how machine learning models learn and make predictions.

Problem Description

Your task is to build a feedforward neural network with a single hidden layer that can perform binary classification. The network should consist of an input layer, one hidden layer, and an output layer. You will need to implement the following components:

  • Forward Propagation: Calculate the output of the network given an input and the current weights/biases (the full chain of computations, from forward pass to weight update, is sketched in numpy right after this list).
  • Activation Function: Implement a sigmoid activation function.
  • Loss Function: Implement the binary cross-entropy loss function.
  • Backpropagation: Calculate the gradients of the loss with respect to weights and biases.
  • Weight/Bias Update: Update the weights and biases using gradient descent.
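
To make these pieces concrete, below is a minimal numpy sketch of one full training step. The variable names (W1, b1, W2, b2), the toy data, and the chosen shapes are illustrative assumptions, not part of the required interface.

import numpy as np

# Toy data and parameters, only so the sketch runs on its own.
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # shape (n_samples, n_features)
y_true = np.array([[0.], [1.], [1.], [0.]])              # shape (n_samples, 1)
hidden_size, learning_rate = 4, 0.1
W1 = 0.01 * rng.standard_normal((X.shape[1], hidden_size))   # (n_features, hidden_size)
b1 = np.zeros((1, hidden_size))
W2 = 0.01 * rng.standard_normal((hidden_size, 1))            # (hidden_size, 1)
b2 = np.zeros((1, 1))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Forward propagation
z1 = X @ W1 + b1          # weighted sums into the hidden layer, shape (n_samples, hidden_size)
a1 = sigmoid(z1)          # hidden activations
z2 = a1 @ W2 + b2         # weighted sums into the output layer, shape (n_samples, 1)
y_pred = sigmoid(z2)      # predicted probabilities

# Binary cross-entropy loss
loss = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Backpropagation (batch gradients, averaged over the samples)
n = X.shape[0]
dz2 = (y_pred - y_true) / n                  # gradient of the loss w.r.t. z2
dW2 = a1.T @ dz2
db2 = dz2.sum(axis=0, keepdims=True)
dz1 = (dz2 @ W2.T) * a1 * (1 - a1)           # chain rule through the hidden sigmoid
dW1 = X.T @ dz1
db1 = dz1.sum(axis=0, keepdims=True)

# Gradient descent update
W1 -= learning_rate * dW1
b1 -= learning_rate * db1
W2 -= learning_rate * dW2
b2 -= learning_rate * db2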

The neural network will be trained on a simple dataset for binary classification. You will need to train the network for a specified number of epochs and then evaluate its performance.

Key Requirements:

  1. Class Structure: Create a NeuralNetwork class to encapsulate the network's structure and functionality (a stub of one possible interface is sketched after this list).
  2. Initialization: The __init__ method should take the number of input features, hidden layer size, and output size (which will be 1 for binary classification) as arguments. It should initialize weights and biases with small random values.
  3. Sigmoid Activation: Implement a sigmoid(x) function that returns 1 / (1 + exp(-x)).
  4. Forward Pass: Implement a forward(X) method that takes the input data X (a NumPy array) and returns the predicted output (a NumPy array). This involves calculating weighted sums and applying the sigmoid activation.
  5. Loss Calculation: Implement a compute_loss(y_true, y_pred) method that calculates the binary cross-entropy loss.
  6. Backward Pass (Backpropagation): Implement a backward(X, y_true, y_pred) method that calculates and returns the gradients for weights and biases.
  7. Training: Implement a train(X, y, epochs, learning_rate) method that iterates for a specified number of epochs, performing forward pass, loss computation, backward pass, and weight/bias updates.
  8. Prediction: Implement a predict(X) method that uses the trained network to make predictions on new data. For binary classification, predictions should be thresholded at 0.5.
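
As a starting point, one possible layout of the class is sketched below. The attribute names, the output_size default, and the choice to make sigmoid a method rather than a standalone function are illustrative, not part of the specification; the method bodies are left for you to fill in.

import numpy as np

class NeuralNetwork:
    def __init__(self, n_features, hidden_size, output_size=1):
        # Small random weights break symmetry; biases can start at zero.
        self.W1 = 0.01 * np.random.randn(n_features, hidden_size)
        self.b1 = np.zeros((1, hidden_size))
        self.W2 = 0.01 * np.random.randn(hidden_size, output_size)
        self.b2 = np.zeros((1, output_size))

    def sigmoid(self, x):
        ...  # 1 / (1 + exp(-x))

    def forward(self, X):
        ...  # weighted sums plus sigmoid for the hidden and output layers; returns probabilities

    def compute_loss(self, y_true, y_pred):
        ...  # binary cross-entropy

    def backward(self, X, y_true, y_pred):
        ...  # gradients of the loss with respect to W1, b1, W2, b2

    def train(self, X, y, epochs, learning_rate):
        ...  # repeat: forward, loss, backward, gradient descent update

    def predict(self, X):
        ...  # forward pass, then threshold the probabilities at 0.5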

Expected Behavior:

The train method should iteratively adjust the network's weights and biases to minimize the loss function. After training, the predict method should output binary predictions (0 or 1) that are reasonably accurate for the given training data.

Edge Cases:

  • Input Data Shape: Ensure your code handles input data with a single sample and with multiple samples correctly.
  • Zero Loss: While exactly-perfect predictions are unlikely with real data, the binary cross-entropy terms involve log(0) when a prediction is exactly 0 or 1, so clip the predictions to keep the loss finite. A small guard covering both edge cases is sketched after this list.
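
A couple of small guards cover both edge cases. The epsilon value and the atleast_2d promotion shown here are illustrative choices, not requirements:

import numpy as np

def compute_loss(y_true, y_pred, eps=1e-12):
    # Clip predictions away from exactly 0 and 1 so np.log never receives 0,
    # even if the network predicts the training labels perfectly.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# A single sample supplied as a 1-D array can be promoted to shape (1, n_features)
# so the same matrix operations work for one sample or many:
x_single = np.array([0.0, 1.0])
X = np.atleast_2d(x_single)          # shape (1, 2)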

Examples

Example 1: XOR Gate Simulation

The XOR gate is a classic problem that requires a non-linear decision boundary, making it suitable for a neural network.

Input Data (X):
[[0, 0],
 [0, 1],
 [1, 0],
 [1, 1]]

Target Labels (y):
[0, 1, 1, 0]

Training Parameters:
epochs = 10000
learning_rate = 0.1
hidden_layer_size = 4

Output (after training and prediction on X):
[0, 1, 1, 0]

Explanation: The neural network, after sufficient training, should learn to approximate the XOR function. When presented with the XOR inputs, it should output predictions close to the target labels. The predict function will then threshold these outputs to 0 or 1.
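
Assuming the NeuralNetwork interface described in the requirements (the constructor argument names below are only one reasonable spelling, not mandated), a training run for this example might look like:

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # column vector, shape (4, 1)

net = NeuralNetwork(n_features=2, hidden_size=4, output_size=1)
net.train(X, y, epochs=10000, learning_rate=0.1)
print(net.predict(X).ravel())   # should print 0, 1, 1, 0 once training has converged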

Example 2: Small Dataset Classification

A slightly larger, linearly separable dataset.

Input Data (X):
[[1.0, 2.0],
 [2.0, 3.0],
 [3.0, 1.0],
 [4.0, 3.0],
 [5.0, 2.0]]

Target Labels (y):
[0, 0, 1, 1, 1]

Training Parameters:
epochs = 5000
learning_rate = 0.05
hidden_layer_size = 3

Output (after training and prediction on X):
[0, 0, 1, 1, 1]

Explanation: The network should learn to distinguish between the two classes based on the input features. The predictions should match the provided target labels for this small, relatively simple dataset.

Constraints

  • Libraries: You are allowed to use numpy for numerical operations. Do not use any deep learning frameworks like TensorFlow or PyTorch.
  • Input Data: Input X will be a NumPy array of shape (n_samples, n_features). Target y will be a NumPy array of shape (n_samples, 1).
  • Output Data: Predictions should be a NumPy array of shape (n_samples, 1).
  • Performance: For the given examples, the training should complete within a reasonable time (e.g., a few seconds to a minute) on a standard machine.

Notes

  • Initialization: Small random initializations for weights are crucial to break symmetry. A common practice is to initialize weights from a standard normal distribution scaled by a small factor (e.g., 0.01 * np.random.randn(...)). Biases can be initialized to zeros.
  • Learning Rate: The learning_rate controls the step size during gradient descent. Experiment with different values if the network is not converging.
  • Epochs: The number of epochs determines how many times the training algorithm iterates over the entire dataset. More epochs generally lead to better convergence, but too many can cause the network to overfit the training data.
  • Numerical Stability: Be mindful of potential numerical issues with np.exp() and np.log(). The plain sigmoid formula returns correct values even for extreme inputs, but np.exp(-x) can emit overflow warnings for very negative x, and the loss calls np.log(0) if a prediction reaches exactly 0 or 1 (see Edge Cases above); a warning-free sigmoid variant is sketched after this list.
  • Gradient Descent: You'll be implementing basic batch gradient descent, where gradients are computed over the entire batch of training data.
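
If the overflow warnings from np.exp() are a concern, one common variant (a sketch, not a requirement) evaluates the sigmoid separately for positive and negative inputs so that np.exp() only ever sees non-positive arguments:

import numpy as np

def sigmoid(x):
    # Mathematically identical to 1 / (1 + exp(-x)), but neither branch
    # exponentiates a large positive number, so no overflow warnings occur.
    pos = np.clip(x, 0, None)    # x where x >= 0, else 0
    neg = np.clip(x, None, 0)    # x where x <  0, else 0
    return np.where(x >= 0,
                    1.0 / (1.0 + np.exp(-pos)),
                    np.exp(neg) / (1.0 + np.exp(neg)))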