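This script builds a random forest classifier in R and evaluates it with 10-fold cross-validation. The forest is fitted with cforest from the party package. The script handles unbalanced classes by resampling the training data of each fold with caret's downSample and upSample, so the test fold keeps its original class distribution. For each fold it also calculates precision and recall per class, the variable importance, and the area under the ROC curve. To run it on your own data you need to make a few modifications: the working directory, the file name, the column indices of the predictors and the class label, and the class levels.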
Code:

setwd("D:/newFolder/")


data <- read.csv("Data.csv", header = TRUE)

# caret: downSample(), upSample(), confusionMatrix(), varImp()
# ROSE:  ROSE() and roc.curve()
library(caret)
library(ROSE)

k <- 10  # number of folds

library(party)  # cforest()

# The class label must be a factor; the levels are assumed to be "1" and "2"
data$class <- as.factor(data$class)

# Randomly assign every row to one of the k folds
id <- sample(1:k, nrow(data), replace = TRUE)

# Running sums of the per-fold metrics
PrecisionClassOne <- 0
RecallClassOne <- 0
PrecisionClassTwo <- 0
RecallClassTwo <- 0

for (i in 1:k) {

  # Train on every fold except fold i, test on fold i
  trainingset <- subset(data, id != i)
  testset <- subset(data, id == i)

  # Alternative: synthetic balancing using the ROSE package
  # trainingset <- ROSE(class ~ ., data = trainingset, N = nrow(trainingset))$data

  # Balance the training data only. The column indices assume 22 predictors
  # followed by the class label in column 23, so you may need to change them!
  trainingset <- downSample(trainingset[, 1:22], as.factor(trainingset[, 23]),
                            list = FALSE, yname = "class")
  # Alternative: up-sample the minority class instead of down-sampling.
  # Calling upSample after downSample has no effect, because the classes
  # are already balanced at that point.
  # trainingset <- upSample(trainingset[, 1:22], as.factor(trainingset[, 23]),
  #                         list = FALSE, yname = "class")

  cf1 <- cforest(class ~ ., data = trainingset,
                 control = cforest_unbiased(mtry = 2, ntree = 100))

  print("perform predictions on test data...")
  predictions <- predict(cf1, newdata = testset)

  # One confusion matrix per class, each time treating that class as positive
  ClassOne <- confusionMatrix(predictions, testset$class, positive = "1")$byClass
  ClassTwo <- confusionMatrix(predictions, testset$class, positive = "2")$byClass

  # "Pos Pred Value" is the precision and "Sensitivity" is the recall
  PrecisionClassOne <- PrecisionClassOne + ClassOne[["Pos Pred Value"]]
  RecallClassOne <- RecallClassOne + ClassOne[["Sensitivity"]]
  PrecisionClassTwo <- PrecisionClassTwo + ClassTwo[["Pos Pred Value"]]
  RecallClassTwo <- RecallClassTwo + ClassTwo[["Sensitivity"]]

  # ROC curve (ROSE package) for this fold
  rocValue <- roc.curve(testset$class, predictions,
                        main = paste("ROC curve, fold", i))

  # Variable importance for this fold, one column per fold
  importToSave <- varImp(cf1)
  colnames(importToSave) <- paste0("Fold", i)

  if (i > 1) {
    saveTemp <- cbind(saveTemp, importToSave)
    saveROCtemp <- rbind(saveROCtemp, rocValue$auc)
  } else {
    saveTemp <- importToSave
    saveROCtemp <- rocValue$auc
  }
}
# Average the metrics over the k folds
PrecisionClassOne <- PrecisionClassOne / k
RecallClassOne <- RecallClassOne / k
PrecisionClassTwo <- PrecisionClassTwo / k
RecallClassTwo <- RecallClassTwo / k
print("Class One Precision / Recall")
print(PrecisionClassOne)
print(RecallClassOne)
print("Class Two (Re-open) Precision / Recall")
print(PrecisionClassTwo)
print(RecallClassTwo)


### Saving the variable importance values (one column per fold).
# col.names = NA writes an empty header cell above the row names

write.table(saveTemp,
            file = "CondsaveTemptable.csv",
            append = FALSE,
            quote = TRUE,
            sep = ",",
            col.names = NA,
            row.names = TRUE)

# Mean importance of every variable across the folds
meansOfCols <- rowMeans(saveTemp)
print(max(saveTemp))
print(min(meansOfCols))
write.table(meansOfCols,
            file = "CondsaveTemptableMeans.csv",
            append = FALSE,
            quote = TRUE,
            sep = ",",
            col.names = NA,
            row.names = TRUE)


### Saving the ROC AUC values (one row per fold).

write.table(saveROCtemp,
            file = "CondsaveROCTemptable.csv",
            append = FALSE,
            quote = TRUE,
            sep = ",",
            col.names = NA,
            row.names = TRUE)


### Box plot of the importance of each variable across the folds.

library(reshape2)  # melt()
library(ggplot2)

# Long format: one row per (fold, variable) pair
importanceLong <- melt(t(saveTemp),
                       varnames = c("fold", "variable"),
                       value.name = "importance")
b <- ggplot(importanceLong, aes(x = variable, y = importance))
print(b + geom_boxplot())
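
The two ideas the script relies on, balancing only the training fold and computing precision/recall per class, can be sketched in base R without any packages. This is only an illustration under stated assumptions: the toy data frame, the helper names down_sample and precision_recall, and the equal-sized fold assignment are hypothetical and not part of the script above, which uses caret for both steps.

```r
# Minimal sketch (base R only); all names below are illustrative assumptions.
set.seed(42)

# Hypothetical toy data: 200 rows, class "2" is the minority (about 20%)
n <- 200
toy <- data.frame(
  x = rnorm(n),
  class = factor(sample(c("1", "2"), n, replace = TRUE, prob = c(0.8, 0.2)))
)

# Equal-sized folds: repeat 1..k to length n, then shuffle
k <- 10
fold_id <- sample(rep(1:k, length.out = n))

# Down-sample every class to the size of the smallest class
down_sample <- function(df, label = "class") {
  smallest <- min(table(df[[label]]))
  rows <- unlist(lapply(
    split(seq_len(nrow(df)), df[[label]]),
    function(idx) idx[sample.int(length(idx), smallest)]
  ))
  df[rows, , drop = FALSE]
}

# Precision and recall for one class from predicted and actual labels.
# Either value is NaN when its denominator is zero.
precision_recall <- function(predicted, actual, positive) {
  tp <- sum(predicted == positive & actual == positive)
  fp <- sum(predicted == positive & actual != positive)
  fn <- sum(predicted != positive & actual == positive)
  c(precision = tp / (tp + fp), recall = tp / (tp + fn))
}

# Fold 1 as the test fold: only the training part is balanced
train <- down_sample(toy[fold_id != 1, ])
test <- toy[fold_id == 1, ]
print(table(train$class))

# Hand-made labels to show the metric helper
pr <- precision_recall(predicted = c("1", "1", "2", "2"),
                       actual    = c("1", "2", "2", "2"),
                       positive  = "2")
print(pr)
```

The test fold is left untouched on purpose: resampling it would change the class distribution the metrics are measured on.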






_________________
Sami
PhD student - SAIL - School Of Computing
Queen's University
Canada

