Extracting Rpart rules to segment a dataset
It has been almost a year since the question was posted, but this may be of help to others. The node assignment of each observation in the rpart tree is stored in tree$where:
library("rpart")
airq <- airquality[complete.cases(airquality),]
tree <- rpart(Ozone ~ ., data = airq)
tree$where
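To actually segment the data with this, note that tree$where holds row indices into tree$frame rather than the node numbers shown when printing the tree. A minimal sketch of the mapping follows; the names airq$node, segments, leaf_ids and rules are just illustrative, and path.rpart() is used here as one way to get the split conditions leading to each leaf:

```r
library("rpart")

airq <- airquality[complete.cases(airquality), ]
tree <- rpart(Ozone ~ ., data = airq)

# tree$where indexes rows of tree$frame; the row names of
# tree$frame are the node numbers used in print(tree)
airq$node <- as.integer(rownames(tree$frame))[tree$where]

# one data frame per terminal node
segments <- split(airq, airq$node)
sapply(segments, nrow)

# split conditions on the path from the root to each leaf
leaf_ids <- sort(unique(airq$node))
rules <- path.rpart(tree, nodes = leaf_ids, print.it = FALSE)
rules
```

Each element of rules is a character vector starting with "root", followed by the conditions along the path to that leaf.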
Extracting list of the split of a lmtree object
Models of class lmtree inherit from party, just like the output from ctree(). Hence, the same approaches that are discussed here on SO for ctree() output can also be applied to lmtree() output. Namely, you can use the (still unexported) .list.rules.party() function:
partykit:::.list.rules.party(tr)
## 2 3
## "z <= 0.495593577856198" "z > 0.495593577856198"
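The object tr is not defined in the snippet above. As a self-contained sketch, assuming simulated data with a regressor x and a partitioning variable z (these data are made up for illustration and are not the data from the question), a tree of this kind could be fitted and inspected like this:

```r
library("partykit")

# simulated data: the slope of y on x changes at z = 0.5 (assumption)
set.seed(1)
d <- data.frame(x = runif(200), z = runif(200))
d$y <- ifelse(d$z <= 0.5, 1 + 2 * d$x, 3 - 2 * d$x) + rnorm(200, sd = 0.1)

# linear model y ~ x, partitioned with respect to z
tr <- lmtree(y ~ x | z, data = d)

# one rule per terminal node, named by node id
rules <- partykit:::.list.rules.party(tr)
rules

# number of observations falling into each terminal node
table(predict(tr, type = "node"))
```

Because .list.rules.party() is not exported, it may change between partykit versions, so the ::: call should be treated as fragile.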
For further adaptations, see also:
- ctree() - How to get the list of splitting conditions for each terminal node?
- Get decision tree rule/path pattern for every row of predicted dataset for rpart/ctree package in R
Properties and their values out of J48 tree (RWeka)
One way to do this is to convert the J48 object from RWeka to a party object from partykit. You just need to call as.party(res), which does all the parsing for you and returns a structure that is easier to work with, with standardized extractor functions etc.
In particular, you can then use all the advice given in other discussions about ctree objects etc. See:
- How to extract the splitting rules for the terminal nodes of ctree()
- Get decision tree rule/path pattern for every row of predicted dataset for rpart/ctree package in R
- Identify all distinct variables within party ctree nodel
And I think the following should do at least part of what you want:
library("partykit")
pres <- as.party(res)
partykit:::.list.rules.party(pres)
## 2
## "Petal.Width <= 0.6"
## 5
## "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length <= 4.9"
## 7
## "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length > 4.9 & Petal.Width <= 1.5"
## 8
## "Petal.Width > 0.6 & Petal.Width <= 1.7 & Petal.Length > 4.9 & Petal.Width > 1.5"
## 9
## "Petal.Width > 0.6 & Petal.Width > 1.7"
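The object res is not shown above. Judging by the variable names in the output, it was presumably a J48 tree fitted to the iris data, which is an assumption on my part. Under that assumption, here is a sketch of how the rules can be attached to every observation (RWeka needs a working Java installation; node and rule_per_row are illustrative names):

```r
library("RWeka")
library("partykit")

# assumed model: J48 tree for Species on the iris data
res <- J48(Species ~ ., data = iris)
pres <- as.party(res)

# rules for the terminal nodes, named by node id
rules <- partykit:::.list.rules.party(pres)

# terminal node id for each observation
node <- predict(pres, newdata = iris, type = "node")

# look up the rule of each observation via its node id
rule_per_row <- unname(rules[as.character(node)])

# cross-tabulate node membership against the observed class
table(node, iris$Species)
```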
Update: The OP contacted me off-list for a related question, asking for a specific printed representation of the tree. I'm including my solution here in case it is useful for someone else.
He wanted to have ( ) symbols signalling the hierarchy levels plus the names of the splitting variables. One way to do so would be to (1) extract variable names of the underlying data:
nam <- names(pres$data)
(2) Turn the recursive node structure of the tree into a flat list (which is somewhat more convenient for constructing the desired string):
tr <- as.list(pres$node)
(3a) Initialize the string:
str <- "("
(3b) Recursively add brackets and/or variable names to the string:
update_str <- function(x) {
  if(is.null(x$kids)) {
    # terminal node: close the bracket
    str <<- paste(str, ")")
  } else {
    # inner node: add the splitting variable and open a bracket
    str <<- paste(str, nam[x$split$varid], "(")
    for(i in x$kids) update_str(tr[[i]])
  }
}
(3c) Call the recursion, starting from the root node:
update_str(tr[[1]])
str
## [1] "( Petal.Width ( ) Petal.Width ( Petal.Length ( ) Petal.Width ( ) ) )"
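The recursion above modifies the global variable str via <<-. If you prefer to avoid global state, the same string can be built by a function that returns its result. This is only a sketch: node_string is a hypothetical helper name, and a ctree() fit on iris stands in for the converted J48 tree so that the example runs without Java:

```r
library("partykit")

# stand-in tree (assumption); any party object works the same way
pres <- ctree(Species ~ ., data = iris)
nam <- names(pres$data)
tr <- as.list(pres$node)

# returns the string fragment for node x and everything below it
node_string <- function(x, tr, nam) {
  # terminal node: just a closing bracket
  if (is.null(x$kids)) return(")")
  # inner node: variable name, opening bracket, then all kids
  kids <- vapply(x$kids, function(i) node_string(tr[[i]], tr, nam),
                 character(1))
  paste(c(nam[x$split$varid], "(", kids), collapse = " ")
}

result <- paste("(", node_string(tr[[1]], tr, nam))
result
```

The pieces are joined with single spaces, so the result should match what the <<- version produces for the same tree.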
Using rpart Package in R, error selecting all variables for decision tree model
The cause seemed to be the backslash (\) in one of the variable names.