Different results with randomForest() and caret's randomForest (method = rf)
The formula interface in train converts factors to dummy variables. To compare results from caret with randomForest, you should use the non-formula interface.
In your case, you should also provide a seed inside trainControl to get the same result as in randomForest.
The model training section of the caret website has notes on reproducibility that explain how to use seeds.
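library("randomForest")
# Reference fit with randomForest itself
set.seed(1)
rf.model <- randomForest(uptake ~ .,
                         data = CO2,
                         ntree = 50,
                         nodesize = 5,
                         mtry = 2,
                         importance = TRUE,
                         metric = "RMSE")
library("caret")
# Non-formula interface: predictors (all columns except uptake) and outcome are passed separately
# seed = 1 partially matches the seeds argument of trainControl
caret.oob.model <- train(CO2[, -5], CO2$uptake,
                         method = "rf",
                         ntree = 50,
                         tuneGrid = data.frame(mtry = 2),
                         nodesize = 5,
                         importance = TRUE,
                         metric = "RMSE",
                         trControl = trainControl(method = "oob", seed = 1),
                         allowParallel = FALSE)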
If you are doing resampling, you should provide seeds for each resampling iteration plus one more for the final model. The examples in ?trainControl show how to create them.
In the following example, the bootstrap uses its default of 25 resamples, so 26 seeds are needed. The last seed is for the final model, and I set it to 1.
seeds <- as.vector(c(1:26), mode = "list")
# For the final model
seeds[[26]] <- 1
caret.boot.model <- train(CO2[, -5], CO2$uptake,
                          method = "rf",
                          ntree = 50,
                          tuneGrid = data.frame(mtry = 2),
                          nodesize = 5,
                          importance = TRUE,
                          metric = "RMSE",
                          trControl = trainControl(method = "boot", seeds = seeds),
                          allowParallel = FALSE)
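The layout of such a seeds list can be sketched in base R. This assumes the structure described in ?trainControl: B + 1 elements, where the first B are integer vectors holding one seed per candidate tuning combination and the last is a single integer for the final model. The helper name make_seeds and its arguments are hypothetical and not part of caret.

```r
# Hypothetical helper (not part of caret): build a seeds list for trainControl().
# n_resamples: number of resampling iterations (25 is the default for method = "boot")
# n_models:    number of tuning-parameter combinations (1 here, since mtry is fixed at 2)
# final_seed:  seed used when fitting the final model
make_seeds <- function(n_resamples, n_models, final_seed) {
  # One integer vector of length n_models per resampling iteration
  seeds <- lapply(seq_len(n_resamples), function(i) {
    as.integer(seq.int(from = (i - 1) * n_models + 1, length.out = n_models))
  })
  # Extra last element: a single integer for the final model
  seeds[[n_resamples + 1]] <- as.integer(final_seed)
  seeds
}

seeds <- make_seeds(n_resamples = 25, n_models = 1, final_seed = 1)
length(seeds)
```

With one tuning combination this gives the same seed values as the hand-built list above. With a larger tuning grid, each of the first 25 elements would need one seed per row of the grid.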
With the non-formula interface and the seeds in trainControl defined correctly, you will get the same results in all three models:
rf.model
caret.oob.model$finalModel
caret.boot.model$finalModel
Problem with type = 'prob' argument in caret::train package
This works more smoothly with terra::predict, but with raster::predict you can use the index argument to specify which output variable(s) you want.
predict_p_rf <- predict(image.x, model_rf, type = 'prob', index=1:3)
See ?raster::predict
The data represent the predicted probability of belonging to a particular category (0 is lowest probability, 1 is highest).
Error when using predict() on a randomForest object trained with caret's train() using formula
First, almost never use the $finalModel object for prediction. Use predict.train instead. This is one good example of why.
There is some inconsistency between how some functions (including randomForest and train) handle dummy variables. Most functions in R that use the formula method will convert factor predictors to dummy variables because their models require numerical representations of the data. The exceptions to this are tree- and rule-based models (that can split on categorical predictors), naive Bayes, and a few others.
So randomForest will not create dummy variables when you use randomForest(y ~ ., data = dat), but train (and most others) will with a call like train(y ~ ., data = dat).
The error occurs because fuelType is a factor. The dummy variables created by train don't have the same names, so predict.randomForest can't find them.
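The naming mismatch can be seen with base R's model.matrix(), which performs the same kind of factor-to-dummy expansion that formula interfaces typically rely on. The data frame below is made up for illustration; only the column name fuelType comes from the question.

```r
# Made-up data for illustration; only the column name fuelType comes from the question.
dat <- data.frame(
  price    = c(10, 12, 9, 15),
  fuelType = factor(c("diesel", "gas", "gas", "diesel"))
)

# Formula-style expansion: the factor becomes a 0/1 dummy column
# (default treatment contrasts, with "diesel" as the reference level)
x_dummy <- model.matrix(price ~ ., data = dat)
colnames(x_dummy)
# "(Intercept)" "fuelTypegas"

# Non-formula style: the predictor keeps its original name and stays a factor
x_raw <- dat[, "fuelType", drop = FALSE]
names(x_raw)
# "fuelType"
```

A model fitted on columns named like those in x_dummy looks for fuelTypegas at prediction time, so new data that only contains fuelType does not line up with it.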
Using the non-formula method with train will pass the factor predictors to randomForest and everything will work.
TL;DR
Use the non-formula method with train if you want the same levels, or use predict.train.