
Build Linear Regression in Python - Supervised Learning

Fri Oct 25, 2019 6:36 am

This example is a good one to start learning how to apply machine learning in Python. If you are new to Python and machine learning, this example will guide you through the simple steps needed to run your first supervised learning model. As a dataset, we use the publicly available Diabetes dataset from the sklearn library. The Diabetes dataset has records for 442 patients and 10 features: age, gender, BMI, blood pressure, and six blood serum measurements. For simplicity, we pick a single feature, BMI (column index 2). The target is a continuous value measuring diabetes disease progression. The linear regression model tries to find a linear relationship between the feature X and the target Y. The linear equation is Y = aX + c, where 'a' is the coefficient and 'c' is the intercept. We validate the trained model by splitting the dataset into a training split and a testing split: we train the model on the training split and then measure its performance on the testing split.
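
Before picking a column, it helps to confirm which feature sits at which index. Below is a minimal sketch (assuming a scikit-learn version that exposes feature_names on the dataset bunch) that prints the column names and the data shape; the full training script follows after it.

Code:
# Quick check of the dataset layout before selecting a feature
from sklearn import datasets

diabetes = datasets.load_diabetes()
print(diabetes.feature_names)   # 'bmi' is at index 2
print(diabetes.data.shape)      # (442, 10)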

Code:
#Demo1
#M. S. Rakha, Ph.D.
#Post-Doctoral - Queen's University
#Supervised Learning - LinearRegression
%matplotlib inline
print(__doc__)  # in a Jupyter notebook this prints the auto-created module docstring



import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
import pandas as pd

# Load the diabetes dataset
diabetes = datasets.load_diabetes()
diabetesDF = pd.DataFrame(diabetes.data)  # optional: wrap the features in a DataFrame for inspection


# Use only one feature (BMI, column index 2), kept as a 2D array for sklearn
diabetes_X = diabetes.data[:, np.newaxis, 2]

# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)

# Make predictions using the testing set
diabetes_y_pred = regr.predict(diabetes_X_test)

# The intercept and coefficient of the fitted line (Y = aX + c)
print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)
# The mean squared error
print("Mean squared error: %.2f"
      % mean_squared_error(diabetes_y_test, diabetes_y_pred))
# The coefficient of determination (R^2): 1 is perfect prediction
print('Variance score: %.2f' % r2_score(diabetes_y_test, diabetes_y_pred))

# Plot outputs
plt.scatter(diabetes_X_test, diabetes_y_test,  color='black')
plt.plot(diabetes_X_test, diabetes_y_pred, color='blue', linewidth=3)

plt.xticks(())
plt.yticks(())

plt.show()


This code uses the scikit-learn library to perform linear regression on the diabetes dataset. The dataset is loaded, and a single feature (the BMI column, index 2) is selected. The data is then split into training and testing sets. A linear regression object is created and fit to the training data, and the fitted model is used to make predictions on the test data. The code then calculates and prints the model's intercept and coefficient, the mean squared error of the predictions, and the explained variance score. Finally, it produces a scatter plot of the test data with the predicted regression line drawn on top. The plot shows the relationship between the selected feature and the target variable and how well the linear regression model fits the data.
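
The manual slicing above (keeping the last 20 rows for testing) keeps the example short. A common alternative is scikit-learn's train_test_split, sketched below with an assumed 80/20 random split and a fixed random_state so the result is reproducible:

Code:
# Alternative split using train_test_split instead of manual slicing
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

diabetes = datasets.load_diabetes()
X = diabetes.data[:, np.newaxis, 2]   # BMI column only
y = diabetes.target

# Hold out 20% of the rows at random for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))
print("Variance score: %.2f" % r2_score(y_test, y_pred))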

Linear regression is a good choice here because it is a simple and widely used method for modeling the relationship between a dependent variable and one or more independent variables. In this case, the dependent variable is diabetes.target and the independent variable is the single column diabetes.data[:, np.newaxis, 2]. Linear regression estimates the relationship between these two variables and makes predictions about diabetes.target from new values of that feature. The mean squared error and the variance score (R²) are used to evaluate the model's performance, and the model's output is plotted to visualize the relationship between the independent and dependent variables. Overall, linear regression is a suitable method for this problem because it is easy to interpret and understand, and it can provide a good fit when the relationship between the variables is roughly linear.
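
To make the Y = aX + c form concrete, the fitted slope and intercept can be plugged in by hand and compared against regr.predict. The sketch below assumes regr and diabetes_X_test already exist from the script above:

Code:
# Reproduce one prediction by hand using Y = aX + c
a = regr.coef_[0]       # slope 'a'
c = regr.intercept_     # intercept 'c'

x_new = diabetes_X_test[0, 0]            # one BMI value from the test split
manual_pred = a * x_new + c              # apply Y = aX + c directly
model_pred = regr.predict([[x_new]])[0]  # same prediction via the model

print(manual_pred, model_pred)           # the two values agree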

Below is the output of running the script in a Jupyter notebook:
Code:
Automatically created module for IPython interactive environment
Intercept:
152.91886182616167
Coefficients:
[938.23786125]
Mean squared error: 2548.07
Variance score: 0.47
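
The script hides the axis ticks to match the original scikit-learn example. If you prefer a labeled figure, a small variation like the one below can replace the plotting section (the axis labels and title are assumptions, not part of the original script):

Code:
# Labeled version of the plot (optional variation)
plt.scatter(diabetes_X_test, diabetes_y_test, color='black', label='actual')
plt.plot(diabetes_X_test, diabetes_y_pred, color='blue', linewidth=3, label='predicted')
plt.xlabel('BMI (normalized)')
plt.ylabel('Disease progression')
plt.title('Linear regression on the Diabetes dataset')
plt.legend()
plt.show()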




Attachments
LinearResults.png (scatter plot of the test data with the fitted regression line)


Topic Tags

Python, Artificial Intelligence, Machine Learning