Build Linear Regression in Python - Supervised Learning

DrRakha

This example is a good starting point for applying machine learning in Python. If you are new to Python and machine learning, it walks you through the simple steps needed to run your first supervised learning model. As a dataset, we use the publicly available Diabetes dataset from the sklearn library. The Diabetes dataset has records for 442 patients and 10 features: age, sex, BMI, average blood pressure, and six blood serum measurements. For simplicity, we pick a single feature: the third one (column index 2), which is BMI. The target is a continuous value measuring diabetes disease progression one year after baseline. The model is validated by splitting the dataset into a training set and a testing set; here the last 20 records are held out for testing. The linear regression model tries to find a linear relationship between the feature X and the target Y. The linear equation is Y=aX+c, where 'a' is the coefficient (slope) and 'c' is the intercept. We train the model on the training split and then measure its performance on the testing split.
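
To make the equation Y=aX+c concrete, here is a minimal sketch in plain Python that recovers 'a' and 'c' by ordinary least squares from a handful of made-up points lying exactly on Y=2X+1. The data and the helper name fit_line are illustrative assumptions and not part of the diabetes example; scikit-learn's LinearRegression solves the same least-squares problem, though not necessarily with this exact formula.

```python
# Illustrative sketch: closed-form least squares for a single feature.
# The toy data below is made up; it lies exactly on Y = 2X + 1.

def fit_line(xs, ys):
    """Return (a, c) minimising the squared error of Y = a*X + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    c = mean_y - a * mean_x
    return a, c

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]

a, c = fit_line(xs, ys)
print("a (coefficient):", a)
print("c (intercept):", c)
```

On this toy data the sketch should recover a = 2 and c = 1, the slope and intercept used to generate the points.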

Code:

#Demo1
#M. S. Rakha, Ph.D.
#Post-Doctoral - Queen's University 
#Supervised Learning - LinearRegression
%matplotlib inline
print(__doc__)



import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
import pandas as pd

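# Load the diabetes dataset
diabetes = datasets.load_diabetes()
# Optional: wrap the features in a DataFrame for inspection (not used below)
diabetesDF = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)


# Use only one feature: column index 2, which is BMI
diabetes_X = diabetes.data[:, np.newaxis, 2]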

# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)

# Make predictions using the testing set
diabetes_y_pred = regr.predict(diabetes_X_test)

# The intercept (c) and the coefficient (a) of the fitted line
print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)
# The mean squared error
print("Mean squared error: %.2f"
      % mean_squared_error(diabetes_y_test, diabetes_y_pred))
# Coefficient of determination (R^2): 1 is perfect prediction
print('Variance score: %.2f' % r2_score(diabetes_y_test, diabetes_y_pred))

# Plot outputs
plt.scatter(diabetes_X_test, diabetes_y_test,  color='black')
plt.plot(diabetes_X_test, diabetes_y_pred, color='blue', linewidth=3)

plt.xticks(())
plt.yticks(())

plt.show()

This code uses the scikit-learn library to perform linear regression on the Diabetes dataset. The dataset is loaded, and a single feature (column index 2, BMI) is selected. The data is then split into training and testing sets, with the last 20 records held out for testing. A linear regression object is created and fit to the training data, and the model is then used to make predictions on the test data. The code prints the model's intercept and coefficient, the mean squared error of the predictions, and the R² score (coefficient of determination), which the script labels 'Variance score'. Finally, it produces a scatter plot of the test data with the predicted line drawn on top. The plot shows the relationship between the selected feature and the target variable and how well the linear regression model fits the data.
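
The script holds out the last 20 records as the test set. A shuffled split is a common alternative, and the sketch below uses scikit-learn's train_test_split for that. The variable names and the choices test_size=20 and random_state=0 are assumptions made for illustration, and the resulting scores will differ from the output shown in this post.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Load the data and keep only column index 2 (BMI) as a 2-D array
diabetes = datasets.load_diabetes()
X = diabetes.data[:, np.newaxis, 2]
y = diabetes.target

# Shuffle, then hold out 20 records for testing (assumed values)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=20, random_state=0)

# Fit on the training split and predict on the held-out split
regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)
y_pred = regr.predict(X_test)

print("Train/test sizes:", X_train.shape[0], X_test.shape[0])
print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))
print("R^2 score: %.2f" % r2_score(y_test, y_pred))
```

Fixing random_state makes the split repeatable; changing it gives a different test set and therefore different scores.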

Linear regression is a good choice for this example because it is a simple and widely used method for modeling the relationship between a dependent variable and one or more independent variables. In this case, the dependent variable is diabetes.target and the independent variable is diabetes.data[:, np.newaxis, 2]. Linear regression estimates the relationship between these two variables and can then predict diabetes.target for new values of the feature. The mean squared error and the R² score (printed as 'Variance score') are used to evaluate the model's performance: a lower mean squared error is better, and an R² of 1 means perfect prediction. The model's output is then plotted to visualize the relationship between the independent and dependent variables. Overall, linear regression suits this problem because it is easy to interpret, and it fits well when the relationship between the variables is close to linear.
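
To show what the two evaluation numbers measure, the sketch below computes the mean squared error and the R² score by hand in plain Python. The function names and the four sample values are made up for illustration; on the same inputs, mean_squared_error and r2_score from sklearn.metrics should give the same results.

```python
# Illustrative sketch: the two evaluation metrics written out by hand.

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean target
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up sample values
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

mse_val = mse(y_true, y_pred)
r2_val = r2(y_true, y_pred)
print("Mean squared error: %.3f" % mse_val)  # 0.375
print("R^2 score: %.3f" % r2_val)            # 0.949
```

An R² of 0 corresponds to a model that does no better than always predicting the mean of the targets, which is why the score is read relative to 1.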

Below is the output of running the script in a Jupyter notebook:

Output:

Automatically created module for IPython interactive environment
Intercept: 
 152.91886182616167
Coefficients: 
 [938.23786125]
Mean squared error: 2548.07
Variance score: 0.47
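
The two fitted values in the output above (152.92 and 938.24) fully define the line, so a prediction can be reproduced by hand. The sketch below plugs them into Y=aX+c; the function name predict and the input 0.05 are illustrative assumptions. Note that the features in sklearn's Diabetes dataset are mean-centred and scaled, so X here is not a BMI in its usual units.

```python
# Values copied from the output above
c = 152.91886182616167   # intercept
a = 938.23786125         # coefficient for the selected feature

def predict(x):
    """Apply the fitted line Y = a*X + c to a single feature value."""
    return a * x + c

print("Prediction at x = 0.0: %.2f" % predict(0.0))
print("Prediction at x = 0.05: %.2f" % predict(0.05))
```

At x = 0 the prediction equals the intercept, which is the model's output for a feature value of zero on the dataset's scaled axis.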
