R script for RandomForest with Cross-validation and Sampling
Sat Oct 18, 2014 5:28 am
This script builds a randomForest classifier with 10-fold cross-validation in R. It also deals with the unbalanced data problem by applying up-sampling and down-sampling steps to the training data, and it calculates the precision/recall, the variable importance and the area under the ROC curve for each fold. To run this script you need to make a few modifications so that it reads and processes your own data.
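As a rough sketch of the fold bookkeeping described above, the snippet below assigns every row to one of 10 folds, splits the data into a training and a test set per fold, and computes precision and recall for each fold. It uses base R only. The toy data frame, the `folds` vector (standing in for the fold ids the script indexes), and the helper `precision_recall` are assumptions for illustration, and the stand-in classifier is a simple threshold rule rather than randomForest.

```r
set.seed(42)  # reproducible toy data and fold assignment

# Toy data (assumption): one predictor and a binary class label
n <- 200
data <- data.frame(x = rnorm(n))
data$class <- factor(ifelse(data$x + rnorm(n, sd = 0.5) > 1, "pos", "neg"),
                     levels = c("neg", "pos"))

# Assign every row to one of k folds; `id` holds the fold number of each row
k <- 10
data$id <- sample(rep(1:k, length.out = n))
folds <- 1:k

# Hypothetical helper: precision and recall for the positive class
precision_recall <- function(truth, pred, positive = "pos") {
  tp <- sum(pred == positive & truth == positive)
  fp <- sum(pred == positive & truth != positive)
  fn <- sum(pred != positive & truth == positive)
  c(precision = tp / (tp + fp), recall = tp / (tp + fn))
}

results <- matrix(NA_real_, nrow = k, ncol = 2,
                  dimnames = list(NULL, c("precision", "recall")))

for (i in folds) {
  trainingset <- subset(data, id %in% folds[-i])  # all folds except fold i
  testset     <- subset(data, id %in% c(i))       # fold i only

  # Stand-in classifier (assumption): threshold halfway between the class means
  cut  <- mean(tapply(trainingset$x, trainingset$class, mean))
  pred <- factor(ifelse(testset$x > cut, "pos", "neg"), levels = c("neg", "pos"))

  results[i, ] <- precision_recall(testset$class, pred)
}

# Average over folds; a fold with no positive predictions or cases yields NaN
print(colMeans(results, na.rm = TRUE))
```

In the real script the threshold rule would be replaced by the randomForest fit and prediction, and any balancing step would be applied to `trainingset` only, never to `testset`.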
trainingset <- subset(data, id %in% list[-i])

# Performing upsampling of minorities using ROSE package
#trainingset <- ROSE(class~., data=trainingset, N=length(trainingset$class))$data

# Note that the column ranges here (1:22 for the predictors, 23 for the class) are based on your data, so you may need to change them.
trainingset <- downSample(trainingset[,1:22], as.factor(trainingset[,23]), list = FALSE, yname = "class")
trainingset <- upSample(trainingset[,1:22], as.factor(trainingset[,23]), list = FALSE, yname = "class")
#print(trainingset[,23])

testset <- subset(data, id %in% c(i))
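
To make the two balancing steps concrete, here is a minimal base-R sketch of what down-sampling and up-sampling amount to. It does not call caret; the helper names `down_sample` and `up_sample` and the toy data are assumptions for illustration only, and the script itself keeps using caret's `downSample` and `upSample`.

```r
set.seed(1)

# Toy unbalanced training set (assumption): 90 "neg" rows and 10 "pos" rows
trainingset <- data.frame(
  x     = rnorm(100),
  class = factor(rep(c("neg", "pos"), times = c(90, 10)))
)

# Hypothetical helper: keep a random subset of each class,
# sized to the smallest class
down_sample <- function(df, yname = "class") {
  n_min <- min(table(df[[yname]]))
  idx <- unlist(lapply(
    split(seq_len(nrow(df)), df[[yname]]),
    function(rows) rows[sample.int(length(rows), n_min)]
  ))
  df[idx, , drop = FALSE]
}

# Hypothetical helper: keep all rows and resample each class with
# replacement until it matches the largest class
up_sample <- function(df, yname = "class") {
  n_max <- max(table(df[[yname]]))
  idx <- unlist(lapply(
    split(seq_len(nrow(df)), df[[yname]]),
    function(rows) {
      extra <- sample.int(length(rows), n_max - length(rows), replace = TRUE)
      c(rows, rows[extra])
    }
  ))
  df[idx, , drop = FALSE]
}

print(table(down_sample(trainingset)$class))  # 10 rows per class
print(table(up_sample(trainingset)$class))    # 90 rows per class
```

A down-sampled set is already balanced, so up-sampling it afterwards leaves the class counts unchanged. In practice you would normally keep only one of the two balancing calls and comment out the other.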