parallel execution of random forest in R
Setting .multicombine to TRUE can make a significant difference:
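library(foreach)
library(randomForest)
# Assumes a parallel backend is already registered and that x and y hold the predictors and response.
rf <- foreach(ntree=rep(25000, 6), .combine=randomForest::combine,
              .multicombine=TRUE, .packages='randomForest') %dopar% {
  randomForest(x, y, ntree=ntree)
}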
This causes combine to be called once rather than five times. On my desktop machine, this runs in 8 seconds rather than 19 seconds.
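For readers who want something they can paste and run, here is a minimal self-contained sketch of the same pattern. The doParallel backend, the two workers, the iris data and the small tree counts are all assumptions chosen for illustration; they are not part of the original answer.

```r
library(foreach)
library(doParallel)
library(randomForest)

# Register a parallel backend; two workers is an arbitrary choice for illustration.
cl <- makeCluster(2)
registerDoParallel(cl)

# iris stands in for the x and y used in the answer above.
x <- iris[, 1:4]
y <- iris$Species

# Grow 4 sub-forests of 250 trees each; with .multicombine=TRUE the four
# results are passed to combine() together instead of two at a time.
rf <- foreach(ntree = rep(250, 4), .combine = randomForest::combine,
              .multicombine = TRUE, .packages = "randomForest") %dopar% {
  randomForest(x, y, ntree = ntree)
}

stopCluster(cl)

rf$ntree  # total number of trees in the merged forest
```

One thing to keep in mind: according to the randomForest documentation, a forest produced by combine has its confusion, err.rate, mse and rsq components set to NULL, so out-of-bag error summaries are not available on the merged object.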
Parallelizing random forests
There are a few answers on SO, such as "parallel execution of random forest in R" and "Suggestions for speeding up Random Forests", that I would take a look at.
Those posts are helpful, but a bit older. The ranger package is an especially fast implementation of random forest, so if you are new to this it might be the easiest way to speed up your model training. Their paper discusses the tradeoffs of some of the available packages. Which package gives you the best performance will vary with your data size and number of features.
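As a rough sketch of what switching to ranger looks like, the snippet below fits a forest on iris. The dataset, the number of trees and the thread count are assumptions for illustration only.

```r
library(ranger)

# num.threads controls how many threads ranger uses internally;
# 2 is an arbitrary value chosen for this sketch.
fit <- ranger(Species ~ ., data = iris, num.trees = 500, num.threads = 2)

fit$prediction.error  # out-of-bag prediction error
```

No foreach loop or backend registration is needed here, because the threading happens inside the package.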
How to run randomForest in R on multiple cores in parallel?
I use the doMC package and its registerDoMC function. Works really well.
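A minimal sketch of that setup, assuming two cores and the iris data as stand-ins. Note that doMC relies on forking, so this will not run in parallel on Windows.

```r
library(doMC)
library(foreach)
library(randomForest)

# Register the multicore backend; 2 cores is an arbitrary choice.
registerDoMC(cores = 2)

# Each iteration grows 250 trees in a forked worker; the results are merged into one forest.
rf <- foreach(ntree = rep(250, 2), .combine = randomForest::combine,
              .multicombine = TRUE) %dopar% {
  randomForest(iris[, 1:4], iris$Species, ntree = ntree)
}

getDoParWorkers()  # number of workers foreach will use
```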
Small speed gain with parallel execution of random forest in Macbook (using R, caret)
The ranger package you are using has built-in multithreading support. That is why you observe CPU usage of around 300 to 330% in the first case: it is already using at least 3 cores for training.
With doParallel you are using multiprocessing instead of multithreading, but the total amount of computing resources used in training is nearly the same, so you are not seeing much gain.
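One way to avoid the two layers competing for the same cores is to pick a single layer of parallelism. The sketch below keeps ranger's own threads and makes caret's resampling loop sequential. The iris data, the 3-fold cross-validation and the thread and tree counts are assumptions for illustration, and it relies on caret forwarding extra arguments such as num.threads to ranger.

```r
library(caret)

# Keep caret's resampling loop sequential so that only ranger's
# internal threads use the cores.
ctrl <- trainControl(method = "cv", number = 3, allowParallel = FALSE)

set.seed(1)
fit <- train(Species ~ ., data = iris, method = "ranger",
             trControl = ctrl,
             num.trees = 200,   # forwarded to ranger
             num.threads = 2)   # forwarded to ranger; arbitrary value

fit$bestTune  # tuning parameters selected by cross-validation
```

The opposite choice, registering a doParallel backend and passing num.threads = 1, should also work; which one is faster likely depends on the data size and the number of resamples.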