Simple Linear Regression Model Training

This challenge focuses on building a basic linear regression model in Python using the scikit-learn library. Linear regression is a fundamental machine learning algorithm used to model the relationship between a dependent variable and one or more independent variables. This exercise will help you understand the core steps involved in training a model: data preparation, model instantiation, fitting the model to data, and making predictions.

Problem Description

You are tasked with creating a Python script that trains a linear regression model on a synthetic dataset that the script generates itself. The dataset consists of two columns: 'feature' (independent variable) and 'target' (dependent variable). Your script should:

  1. Generate Synthetic Data: Create a synthetic dataset with 100 data points. The 'feature' values should be randomly generated between 0 and 10. The 'target' values should be a linear function of the 'feature' plus some random noise. The linear function should be target = 2 * feature + 1 + noise, where noise is a random number sampled from a normal distribution with mean 0 and standard deviation 1.
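  2. Prepare the Data: Convert the generated data into NumPy arrays for use with scikit-learn. scikit-learn expects the feature array to be two-dimensional, with shape (n_samples, n_features), so the single 'feature' column needs shape (100, 1); the 'target' array can stay one-dimensional.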
  3. Instantiate the Model: Create a LinearRegression object from the sklearn.linear_model module.
  4. Train the Model: Fit the linear regression model to the prepared data using the fit() method.
  5. Make Predictions: Generate predictions for a new set of 'feature' values (e.g., [0, 5, 10]).
  6. Print Results: Print the learned slope (the model's coef_ attribute) and intercept (its intercept_ attribute), followed by the predictions made on the new feature values.
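
The six steps above can be sketched roughly as follows. This is one possible outline, not a reference solution. Two details are assumptions that the problem statement does not fix: the 'feature' values are drawn from a uniform distribution, and the random generator is seeded (with an arbitrary seed of 42) so that repeated runs print the same numbers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumption: a seeded generator, used only to make the run repeatable.
rng = np.random.default_rng(42)

# 1. Generate synthetic data: 100 points, feature in [0, 10).
#    Assumption: "randomly generated" is read as uniformly distributed.
feature = rng.uniform(0, 10, size=100)
noise = rng.normal(loc=0.0, scale=1.0, size=100)
target = 2 * feature + 1 + noise

# 2. Prepare the data: scikit-learn wants X with shape (n_samples, n_features).
X = feature.reshape(-1, 1)
y = target

# 3. Instantiate the model.
model = LinearRegression()

# 4. Train the model.
model.fit(X, y)

# 5. Make predictions for new feature values (also a 2D array).
new_features = np.array([0, 5, 10]).reshape(-1, 1)
predictions = model.predict(new_features)

# 6. Print results: slope is coef_[0], intercept is intercept_.
slope = model.coef_[0]
intercept = model.intercept_
print(f"Coefficients: [{slope:.2f}, {intercept:.2f}]")
print(f"Predictions: {np.round(predictions, 2)}")
```

The printed slope and intercept should land near 2 and 1, but the exact digits depend on the seed and on the assumed distribution.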

Examples

Example 1:

Input: No direct input, the script generates the data.
Output:
Coefficients: [2.0, 1.0]  (approximately, due to noise)
Predictions: [1.0, 11.0, 21.0] (approximately, due to noise)
Explanation: The script generates data, trains a linear regression model, and prints the coefficients and predictions. The coefficients should be close to 2.0 (slope) and 1.0 (intercept) based on the data generation formula. Predictions are based on the learned model.

Example 2:

Input: No direct input, the script generates the data.
Output:
Coefficients: [2.1, 0.9] (approximately, due to noise)
Predictions: [0.9, 11.4, 21.9] (approximately, due to noise)
Explanation: Similar to Example 1, but the coefficients and predictions may vary slightly due to the random noise introduced during data generation.
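
Within a single run, the printed predictions are fully determined by the printed coefficients: each prediction equals slope * feature + intercept. The short check below illustrates this on a tiny noise-free dataset (an assumption made purely for illustration; the challenge itself uses noisy data), where the fit should recover a slope of about 2 and an intercept of about 1.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustration-only data: four noise-free points on the line y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2 * X.ravel() + 1

model = LinearRegression().fit(X, y)

new_features = np.array([0.0, 5.0, 10.0])
predictions = model.predict(new_features.reshape(-1, 1))

# Recompute the predictions by hand from the learned coefficients.
manual = model.coef_[0] * new_features + model.intercept_

print("Slope and intercept:", model.coef_[0], model.intercept_)
print("Model predictions:  ", predictions)
print("Manual predictions: ", manual)
print("Match:", np.allclose(predictions, manual))
```

The same relationship is a quick sanity check for your own output: if the predictions do not match the coefficients you printed, something is off in the prediction or printing step.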

Constraints

  • The dataset must contain 100 data points.
  • The 'feature' values must be randomly generated between 0 and 10.
  • The 'target' values must be generated using the formula target = 2 * feature + 1 + noise, where noise is sampled from a normal distribution with mean 0 and standard deviation 1.
  • The script must use the sklearn.linear_model.LinearRegression class.
  • The script must print the learned coefficients and the predictions for the new feature values.
  • The script should be executable and produce the expected output without errors.

Notes

  • You'll need to import the necessary libraries: numpy and scikit-learn (imported as sklearn).
  • Consider using numpy.random.normal to generate the random noise.
  • The exact coefficients and predictions will vary slightly due to the random noise. Focus on the general trend and the correct implementation of the linear regression process.
  • Think about how to structure your code for clarity and readability. Separate data generation, model training, and prediction into distinct steps.
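
One way to act on the last two notes is sketched below: data generation, training, and prediction each get their own function, and the pipeline is run with a few different seeds to show how much the learned coefficients move. The function names (generate_data, train_model, predict_for), the seeds, and the uniform distribution for 'feature' are all assumptions chosen for illustration; they are not part of the challenge.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def generate_data(rng, n_samples=100):
    """Return (X, y) where y = 2 * x + 1 + noise, noise ~ N(0, 1)."""
    feature = rng.uniform(0, 10, size=n_samples)  # assumption: uniform draw
    noise = rng.normal(0.0, 1.0, size=n_samples)
    return feature.reshape(-1, 1), 2 * feature + 1 + noise


def train_model(X, y):
    """Fit and return a LinearRegression model."""
    return LinearRegression().fit(X, y)


def predict_for(model, values):
    """Predict targets for a flat sequence of feature values."""
    X_new = np.asarray(values, dtype=float).reshape(-1, 1)
    return model.predict(X_new)


results = []
for seed in (0, 1, 2):  # arbitrary seeds, one independent dataset each
    X, y = generate_data(np.random.default_rng(seed))
    model = train_model(X, y)
    results.append((model.coef_[0], model.intercept_))
    print(
        f"seed={seed} slope={model.coef_[0]:.3f} "
        f"intercept={model.intercept_:.3f} "
        f"predictions={np.round(predict_for(model, [0, 5, 10]), 2)}"
    )
```

Each seed should give a slope near 2 and an intercept near 1, with small run-to-run differences that come from the noise term.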