
Hyperparameter Tuning for a Simple Regression Model

Machine learning models often have hyperparameters that are not learned from the data but rather set before training. Finding the optimal combination of these hyperparameters can significantly improve model performance. This challenge will guide you through implementing a basic hyperparameter tuning process for a regression model.

Problem Description

Your task is to implement a function that performs hyperparameter tuning for a given machine learning model and dataset. You will be provided with a model class (simulated), a dataset, and a search space for hyperparameters. Your function should systematically explore this search space, train the model with different hyperparameter combinations, evaluate its performance, and return the best set of hyperparameters found.

Key Requirements:

  • Hyperparameter Search: Implement a strategy to explore the provided hyperparameter search space. For this challenge, we will simulate a simple grid search.
  • Model Training and Evaluation: For each hyperparameter combination, train the model on the training data and evaluate its performance on the validation data.
  • Performance Metric: Use Mean Squared Error (MSE) as the primary metric to evaluate the model's performance. Lower MSE indicates better performance; a short worked example follows this list.
  • Return Best Hyperparameters: After exploring all combinations, return the set of hyperparameters that resulted in the lowest MSE.

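For a validation set with n samples, MSE is the average squared difference between predictions and targets: MSE = (1/n) * Σ (yᵢ − ŷᵢ)². The snippet below is only a sanity check of that formula against sklearn's mean_squared_error; the numbers are arbitrary.

from sklearn.metrics import mean_squared_error

y_val = [15, 19]        # true targets
predictions = [16, 20]  # hypothetical model output
# ((15 - 16)**2 + (19 - 20)**2) / 2 = (1 + 1) / 2 = 1.0
print(mean_squared_error(y_val, predictions))  # 1.0
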
Expected Behavior:

The function should iterate through all possible combinations of hyperparameters defined in the search space. For each combination:

  1. Instantiate the model with the current hyperparameters.
  2. Train the model using the provided X_train and y_train.
  3. Predict on X_val and calculate the MSE using y_val.
  4. Keep track of the hyperparameter combination that yields the minimum MSE (a sketch of this loop follows the list).

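A minimal sketch of this grid-search loop is shown below. The function name tune_hyperparameters and its exact signature are illustrative assumptions; the problem does not fix them.

from itertools import product
from sklearn.metrics import mean_squared_error

# Illustrative sketch -- the name and signature are assumptions, not mandated by the problem.
def tune_hyperparameters(model_class, X_train, y_train, X_val, y_val, param_grid):
    # Edge case: an empty search space yields no combinations to try.
    if not param_grid:
        return None

    names = list(param_grid.keys())
    best_params, best_mse = None, float('inf')

    # itertools.product enumerates every combination of hyperparameter values.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        model = model_class(**params)        # 1. instantiate with current hyperparameters
        model.fit(X_train, y_train)          # 2. train
        predictions = model.predict(X_val)   # 3. predict on the validation set
        mse = mean_squared_error(y_val, predictions)
        if mse < best_mse:                   # 4. track the minimum MSE
            best_mse, best_params = mse, params

    return best_params
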
Edge Cases:

  • Empty hyperparameter search space: The function should handle this gracefully, perhaps by returning None or an empty dictionary.
  • Dataset with no samples: While unlikely in a practical scenario, consider how this might affect calculations. For this challenge, assume valid, non-empty datasets.

Examples

Example 1:

import numpy as np
from sklearn.metrics import mean_squared_error

# Simulated Model Class
class SimpleRegressor:
    def __init__(self, learning_rate=0.01, n_estimators=100):
        self.learning_rate = learning_rate
        self.n_estimators = n_estimators
        self.weights = None # Placeholder for trained weights

    def fit(self, X, y):
        # Simulate training: In a real scenario, this would update self.weights
        # For this example, we'll just store a dummy value.
        self.weights = (self.learning_rate, self.n_estimators)
        print(f"Training with lr={self.learning_rate}, n_estimators={self.n_estimators}")

    def predict(self, X):
        # Simulate prediction: In a real scenario, this would use self.weights
        # For this example, we'll return a simple linear combination based on input features
        # and dummy weights derived from hyperparameters.
        if self.weights is None:
            raise ValueError("Model not trained yet.")
        lr, ne = self.weights
        # A simplistic prediction that depends on hyperparameters for demonstration
        return X[:, 0] * (lr * 10) + X[:, 1] * (ne / 50)

# Sample Data (NumPy arrays, since the simulated model indexes with X[:, 0])
X_train = np.array([[1, 2], [3, 4], [5, 6]])
y_train = np.array([3, 7, 11])
X_val = np.array([[7, 8], [9, 10]])
y_val = np.array([15, 19])

# Hyperparameter Search Space
param_grid = {
    'learning_rate': [0.01, 0.1],
    'n_estimators': [50, 100]
}

# Trace of each combination (predict computes X[:, 0] * (lr * 10) + X[:, 1] * (ne / 50)):
#
# For learning_rate=0.01, n_estimators=50:
#   [7, 8]  -> 7 * 0.1 + 8 * 1.0  = 0.7 + 8.0  = 8.7
#   [9, 10] -> 9 * 0.1 + 10 * 1.0 = 0.9 + 10.0 = 10.9
#   Predictions: [8.7, 10.9]. y_val: [15, 19]. MSE = 52.65.
#
# For learning_rate=0.01, n_estimators=100:
#   [7, 8]  -> 7 * 0.1 + 8 * 2.0  = 0.7 + 16.0 = 16.7
#   [9, 10] -> 9 * 0.1 + 10 * 2.0 = 0.9 + 20.0 = 20.9
#   Predictions: [16.7, 20.9]. y_val: [15, 19]. MSE = 3.25.
#
# For learning_rate=0.1, n_estimators=50:
#   [7, 8]  -> 7 * 1.0 + 8 * 1.0  = 7.0 + 8.0  = 15.0
#   [9, 10] -> 9 * 1.0 + 10 * 1.0 = 9.0 + 10.0 = 19.0
#   Predictions: [15.0, 19.0]. y_val: [15, 19]. MSE = 0.0 (a perfect fit).
#
# For learning_rate=0.1, n_estimators=100:
#   [7, 8]  -> 7 * 1.0 + 8 * 2.0  = 7.0 + 16.0 = 23.0
#   [9, 10] -> 9 * 1.0 + 10 * 2.0 = 9.0 + 20.0 = 29.0
#   Predictions: [23.0, 29.0]. y_val: [15, 19]. MSE = 82.0.

# Expected Output:
# {'learning_rate': 0.1, 'n_estimators': 50}
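
Assuming the illustrative tune_hyperparameters sketch from earlier, the whole example reduces to:

# Hypothetical call -- the function name is an assumption, not part of the problem.
best_params = tune_hyperparameters(SimpleRegressor, X_train, y_train, X_val, y_val, param_grid)
print(best_params)  # {'learning_rate': 0.1, 'n_estimators': 50}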

Example 2:

import numpy as np
from sklearn.metrics import mean_squared_error

# Simulated Model Class
class AnotherRegressor:
    def __init__(self, alpha=0.5, beta=1.0):
        self.alpha = alpha
        self.beta = beta
        self.coefficient = None

    def fit(self, X, y):
        # Simulate training
        self.coefficient = self.alpha * 10 + self.beta * 5
        print(f"Training with alpha={self.alpha}, beta={self.beta}")

    def predict(self, X):
        if self.coefficient is None:
            raise ValueError("Model not trained yet.")
        # A simplistic prediction
        return X[:, 0] * self.coefficient

# Sample Data (NumPy arrays, since the simulated model indexes with X[:, 0])
X_train = np.array([[10, 20], [30, 40]])
y_train = np.array([100, 200])
X_val = np.array([[50, 60], [70, 80]])
y_val = np.array([300, 400])

# Hyperparameter Search Space
param_grid = {
    'alpha': [0.2, 0.5, 0.8],
    'beta': [0.5, 1.0]
}

# Trace of each combination (predict computes X[:, 0] * coefficient,
# where coefficient = alpha * 10 + beta * 5):
#
# alpha=0.2, beta=0.5: coefficient = 2 + 2.5 = 4.5
#   Predictions: [225, 315]. y_val: [300, 400]. MSE = 6425.
# alpha=0.2, beta=1.0: coefficient = 2 + 5 = 7
#   Predictions: [350, 490]. y_val: [300, 400]. MSE = 5300.
# alpha=0.5, beta=0.5: coefficient = 5 + 2.5 = 7.5
#   Predictions: [375, 525]. y_val: [300, 400]. MSE = 10625.
# alpha=0.5, beta=1.0: coefficient = 5 + 5 = 10
#   Predictions: [500, 700]. y_val: [300, 400]. MSE = 65000.
# alpha=0.8, beta=0.5: coefficient = 8 + 2.5 = 10.5
#   Predictions: [525, 735]. y_val: [300, 400]. MSE = 81425.
# alpha=0.8, beta=1.0: coefficient = 8 + 5 = 13
#   Predictions: [650, 910]. y_val: [300, 400]. MSE = 191300.
#
# Only the first column of X matters here, and the least-squares optimum for
# y = X[:, 0] * c on the validation data is c ~= 5.81, so the closest achievable
# fit comes from coefficient = 7, i.e. alpha=0.2, beta=1.0.

# Expected Output:
# {'alpha': 0.2, 'beta': 1.0}
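
Again assuming the illustrative tune_hyperparameters sketch:

# Hypothetical call -- the function name is an assumption, not part of the problem.
best_params = tune_hyperparameters(AnotherRegressor, X_train, y_train, X_val, y_val, param_grid)
print(best_params)  # {'alpha': 0.2, 'beta': 1.0}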

Constraints

  • The model_class will be a Python class that accepts hyperparameters as keyword arguments in its __init__ method.
  • The model_class will have fit(X, y) and predict(X) methods.
  • X_train and X_val will be 2-D (lists of lists or NumPy arrays); y_train and y_val will be 1-D (lists or NumPy arrays).
  • The param_grid will be a dictionary where keys are hyperparameter names (strings) and values are lists of possible values for that hyperparameter.
  • The number of hyperparameters will be at least 1 and at most 5.
  • The number of values for each hyperparameter will be between 1 and 10.
  • The total number of hyperparameter combinations to explore will not exceed 100.

Notes

  • You will need to import mean_squared_error from sklearn.metrics.
  • You can use itertools.product to generate all combinations of hyperparameters from the param_grid; a short demonstration follows these notes.
  • Remember to handle potential ValueError if the model is not trained before prediction (though this shouldn't happen with a correct implementation).
  • The simulated models in the examples are highly simplified. Your tuning function should work with any valid model class conforming to the specified interface.
  • The goal is to find the best performing hyperparameters, so ensure your tracking of minimum MSE and corresponding parameters is accurate.
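
For reference, this is how itertools.product expands a param_grid into concrete combinations (using Example 1's grid):

from itertools import product

param_grid = {'learning_rate': [0.01, 0.1], 'n_estimators': [50, 100]}
names = list(param_grid.keys())
for values in product(*param_grid.values()):
    print(dict(zip(names, values)))
# {'learning_rate': 0.01, 'n_estimators': 50}
# {'learning_rate': 0.01, 'n_estimators': 100}
# {'learning_rate': 0.1, 'n_estimators': 50}
# {'learning_rate': 0.1, 'n_estimators': 100}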