How to Specify Split in a Decision Tree in R Programming

How to set the splitting rule in decision_tree spec?

Following Emil Hvitfeldt's suggestion, the set_engine() function allows us to pass arguments directly to the engine function.

This is the tree with information gain splitting rule:

formas_tree_spec <-
  decision_tree(min_n = 2) %>%
  set_mode("classification") %>%
  set_engine("rpart", parms = list(split = "information"))
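A minimal sketch of fitting such a spec, assuming the parsnip package is installed (the spec name and the use of the built-in iris data are illustrative):

```r
library(parsnip)

# Spec with the information-gain splitting rule passed through to rpart
tree_spec <- decision_tree(min_n = 2) %>%
  set_mode("classification") %>%
  set_engine("rpart", parms = list(split = "information"))

# Fit on the built-in iris data; the underlying rpart object is in $fit
tree_fit <- fit(tree_spec, Species ~ ., data = iris)
tree_fit
```

Printing the fitted object shows the rpart tree that was built with the engine arguments supplied via set_engine().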

How to do decision trees in R?

I have used rpart before, which is handy. I have used it for predictive modeling by splitting the data into training and test sets. Here is the code; hope this gives you some idea...

library(rpart)
library(rattle)
library(rpart.plot)
### Build the training/validate/test...

data(iris)
nobs <- nrow(iris)
train <- sample(nrow(iris), 0.7*nobs)
test <- setdiff(seq_len(nrow(iris)), train)
colnames(iris)

### The following variable selections have been noted.
input <- c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")
numeric <- c("Sepal.Length","Sepal.Width","Petal.Length","Petal.Width")
categoric <- NULL
target <-"Species"
risk <- NULL
ident <- NULL
ignore <- NULL
weights <- NULL

#set.seed(500)
# Build the Decision Tree model.
rpart <- rpart(Species ~ .,
               data = iris[train, ],
               method = "class",
               parms = list(split = "information"),
               control = rpart.control(minsplit = 12,
                                       usesurrogate = 0,
                                       maxsurrogate = 0))

# Generate a textual view of the Decision Tree model.
print(rpart)
printcp(rpart)

# Decision Tree Plot...
prp(rpart)
dev.new()
fancyRpartPlot(rpart, main="Decision Tree Graph")
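The answer above sets aside a test set but never scores it. A sketch of generating held-out predictions, reproducing a similar 70/30 split on iris (the seed and variable names are illustrative):

```r
library(rpart)

# Reproduce a simple 70/30 train/test split of iris
set.seed(500)
nobs  <- nrow(iris)
train <- sample(nobs, 0.7 * nobs)
test  <- setdiff(seq_len(nobs), train)

fit <- rpart(Species ~ ., data = iris[train, ], method = "class",
             parms = list(split = "information"))

# Class predictions on the held-out rows, a confusion matrix, and accuracy
pred <- predict(fit, newdata = iris[test, ], type = "class")
table(predicted = pred, actual = iris$Species[test])
mean(pred == iris$Species[test])
```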

How to specify the number of branches in a decision tree in R

Maybe I'm missing your question, but tree size in rpart is controlled by the complexity parameter (cp). You can try different values to get a different sized tree.

ad.apprentissage= rpart(rate~vqs+ibt+tbt+bf+n, data=filteredDataFinal, cp=0.1)
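To see how cp controls tree size, compare two settings on the built-in iris data (a sketch; your own data and formula will differ):

```r
library(rpart)

# A large cp stops splitting early; a small cp grows a bigger tree
small_tree <- rpart(Species ~ ., data = iris, method = "class", cp = 0.5)
big_tree   <- rpart(Species ~ ., data = iris, method = "class", cp = 0.001)

# printcp() shows the cp table; the nsplit column gives tree size
printcp(big_tree)

# Node counts confirm the difference in size
nrow(small_tree$frame)
nrow(big_tree$frame)
```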

How to filter independent variables in decision-tree in R with rpart or party package

  1. I don't think any of the more popular tree packages in R has a built-in option for specifying fixed initial splits. The partykit package (successor to the party package), however, provides infrastructure that can be leveraged to build such trees with a little effort; see: How to specify split in a decision tree in R programming?
  2. You should use factor variables for unordered categorical covariates (like gender), ordered factors for ordinal covariates, and numeric or integer for numeric covariates. Note that this may matter not only in the visual display but also in the recursive partitioning itself. For an exhaustive search algorithm like rpart/CART it is not relevant, but for unbiased inference-based algorithms like ctree or mob this may be an important difference.
  3. Cost-complexity pruning does not allow you to keep specific covariates. It is a measure for the overall tree, not for individual variables.
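As point 3 notes, cost-complexity pruning operates on the whole tree via cp, not on named covariates. A sketch using rpart::prune() on the built-in iris data:

```r
library(rpart)

# Grow a deliberately large tree
full_tree <- rpart(Species ~ ., data = iris, method = "class",
                   control = rpart.control(cp = 0.001, minsplit = 2))

# Pick the cp with the lowest cross-validated error, then prune back.
# Pruning removes whole subtrees; it cannot retain or drop a chosen variable.
best_cp <- full_tree$cptable[which.min(full_tree$cptable[, "xerror"]), "CP"]
pruned  <- prune(full_tree, cp = best_cp)
printcp(pruned)
```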

