Fully reproducible parallel models using caret
One easy way to run a fully reproducible model in parallel with the caret
package is to pass the seeds argument to trainControl. The code below resolves the question above; see the trainControl help page for further details.
library(doParallel); library(caret)
# Create a list of seeds: one element per resample, plus one for the final model
set.seed(123)
# Length is (n_repeats * n_resamples) + 1
seeds <- vector(mode = "list", length = 11)
# Each resample needs one seed per candidate tuning value
# (3 values of mtry are tried for rf here, equal to ncol(iris) - 2)
for (i in 1:10) seeds[[i]] <- sample.int(n = 1000, 3)
# A single seed for the final model
seeds[[11]] <- sample.int(1000, 1)
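# Control list; index expects the training rows of each fold, hence returnTrain = TRUE
myControl <- trainControl(method = 'cv', seeds = seeds, index = createFolds(iris$Species, k = 10, returnTrain = TRUE))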
# Fit the same model twice on a parallel backend
cl <- makeCluster(detectCores())
registerDoParallel(cl)
model1 <- train(Species~., iris, method='rf', trControl=myControl)
model2 <- train(Species~., iris, method='rf', trControl=myControl)
stopCluster(cl)
# Compare the predicted class probabilities of the two fits
all.equal(predict(model1, type='prob'), predict(model2, type='prob'))
[1] TRUE
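The seed list built above can be generalized to other resampling schemes. The following is a minimal sketch in base R; `make_seeds` and its `master_seed` argument are hypothetical names introduced here for illustration and are not part of caret. It assumes you know the total number of resamples (folds times repeats) and the number of candidate tuning combinations.

```r
# Hypothetical helper (not part of caret): build a list suitable for the
# `seeds` argument of trainControl().
#   n_resamples: total number of resamples (folds x repeats)
#   n_tune:      number of candidate tuning-parameter combinations
#   master_seed: seed that makes the generated list itself reproducible
make_seeds <- function(n_resamples, n_tune, master_seed = 123) {
  set.seed(master_seed)  # note: this resets the global RNG state
  seeds <- vector(mode = "list", length = n_resamples + 1)
  # One integer per candidate model for every resample
  for (i in seq_len(n_resamples)) {
    seeds[[i]] <- sample.int(n = 1000, size = n_tune)
  }
  # A single integer for the final model fit
  seeds[[n_resamples + 1]] <- sample.int(n = 1000, size = 1)
  seeds
}

# Same shape as the example above: 10-fold CV, 3 values of mtry
seeds <- make_seeds(n_resamples = 10, n_tune = 3)
length(seeds)
lengths(seeds)
```

The resulting list could then be passed as `trainControl(seeds = seeds, ...)`. If the tuning grid changes size, `n_tune` has to change with it.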
R caret with reproducible outcome/results
You should read the Notes on Reproducibility section on the package web page.
The seed number itself doesn't matter; I generate one with sample.int(100000, 1). Depending on how you are fitting the model, you should at least set the seed just prior to calling train (but please read the reproducibility notes mentioned above).
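As a rough illustration of setting the seed just prior to calling train, here is a minimal sketch for the sequential case (no parallel backend registered). The choice of the knn method, 5-fold cross-validation, and the seed value 2024 are assumptions made only for this example.

```r
library(caret)

# Any integer works; it could be drawn once with sample.int(100000, 1)
# and then recorded so that the run can be repeated.
seed <- 2024

ctrl <- trainControl(method = "cv", number = 5)

# Set the seed immediately before each call so both fits use the same folds
set.seed(seed)
fit_a <- train(Species ~ ., data = iris, method = "knn", trControl = ctrl)

set.seed(seed)
fit_b <- train(Species ~ ., data = iris, method = "knn", trControl = ctrl)

# Resampled performance should match between the two fits
all.equal(fit_a$results, fit_b$results)
```

When a parallel backend is registered, setting the seed before train is generally not enough on its own, which is why the seeds argument of trainControl is used in that case.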