This is a supervised learning example using the random forest. The distinctive part of this example, in contrast to the previous one (Random Forest Example), is the split of the data. Here, we apply a more extensive validation of the model using K-Fold cross-validation. In this approach, the model is trained and tested K times, with different training and testing data each time. With K = 5, we end up with 5 results from 5 models. To report the results, the average or the median of the performance measures is usually selected to represent the outcome of the experiment. The main benefit of K-Fold cross-validation is to reduce the chances of overfitting: because each of the K models is trained on a different combination of the data, the reported result has a lower chance of being overfitted to one particular training set.
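The idea above can be sketched in a few lines with scikit-learn's `cross_val_score`, which trains and tests the model K times and returns one score per fold. This is a minimal sketch, assuming the Iris dataset as a stand-in for the demo's data:

```python
# Minimal 5-fold cross-validation sketch (Iris is an assumed placeholder dataset)
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42)

# cross_val_score fits and evaluates the model once per fold (5 times here)
scores = cross_val_score(model, X, y, cv=5)
print("Per-fold accuracy:", scores)
print("Mean accuracy:", scores.mean())
```

The array of five scores is what the average or median would be computed over when reporting the experiment.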
#Demo4
#M. S. Rakha, Ph.D.
# Post-Doctoral - Queen's University
# Supervised Learning - RandomForest Classification
# RandomForest Classification_Kfold
%matplotlib inline

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import pandas as pd
from sklearn.cluster import KMeans
from sklearn import datasets
from sklearn.preprocessing import scale
import sklearn.metrics as sm
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
# The KFold splitter, the model, and the data (X, y) are assumed to be
# defined earlier in the demo; a minimal setup would be:
from sklearn.model_selection import KFold

iris = datasets.load_iris()
X = iris.data
y = iris.target
kf = KFold(n_splits=5)
randomForestModel = RandomForestClassifier(n_estimators=100)

#Applying the training and testing of the KFold
for train_index, test_index in kf.split(X):
    print("Train:", train_index, "Validation:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    randomForestModel.fit(X_train, y_train)
    y_pred = randomForestModel.predict(X_test)
    print(classification_report(y_test, y_pred))
The output of this code is the indexes of the training set, the indexes of the testing set (also called the validation set), and accuracy measurements for each of the 5 folds.
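To turn those 5 per-fold reports into the single summary value discussed earlier, the fold accuracies can be collected in a list and averaged. This is a sketch under the same assumptions as above (Iris as a placeholder dataset, a shuffled 5-fold split):

```python
# Collecting per-fold accuracies and reporting their mean and median
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)
model = RandomForestClassifier(n_estimators=100, random_state=42)

accuracies = []
for train_index, test_index in kf.split(X):
    model.fit(X[train_index], y[train_index])
    y_pred = model.predict(X[test_index])
    accuracies.append(accuracy_score(y[test_index], y_pred))

# One summary number across the 5 folds, as described in the text
print("Mean accuracy:  ", np.mean(accuracies))
print("Median accuracy:", np.median(accuracies))
```

Shuffling before splitting matters here: a plain `KFold` on a dataset sorted by class (as Iris is) would put entire classes into single folds.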