How to access weighting of individual decision trees in xgboost?
Each tree is given the same weight eta, and the overall prediction is the sum of the predictions of each tree, as you say.
You'd perhaps expect the earlier trees to be given more weight than the later trees, but that's not necessary, due to the way the response is updated after every tree. Here's a toy example:
Suppose we have 5 observations, with responses 10, 20, 30, 40, 50. The first tree is built and gives predictions of 12, 18, 27, 39, 54.
Now, if eta = 1, the response variables passed to the next tree will be -2, 2, 3, 1, -4 (i.e. the difference between the prediction and the true response). The next tree will then try to learn the 'noise' that wasn't captured by the first tree. If nrounds = 2, then the sum of the predictions from the two trees gives the final prediction of the model.
If instead eta = 0.1, all trees will have their predictions scaled down by eta, so the first tree will instead 'predict' 1.2, 1.8, 2.7, 3.9, 5.4. The response variable passed to the next tree will then have values 8.8, 18.2, 27.3, 36.1, 44.6 (the difference between the scaled prediction and the true response). The second round then uses these response values to build another tree - and again the predictions are scaled by eta. So tree 2 predicts, say, 7, 18, 25, 40, 40, which, once scaled, become 0.7, 1.8, 2.5, 4.0, 4.0. As before, the third tree will be passed the difference between these values and the previous tree's response variable (so 8.1, 16.4, 24.8, 32.1, 40.6). Again, the sum of the predictions from all trees gives the final prediction.
Clearly, when eta = 0.1 and base_score is 0, you'll need at least 10 rounds to get a prediction that's anywhere near sensible. In general, you need an absolute minimum of 1/eta rounds, and typically many more.
The rationale for using a small eta is that the model benefits from taking small steps towards the prediction rather than making tree 1 do the majority of the work. It's a bit like crystallisation - cool slowly and you get bigger, better crystals. The downside is that you need to increase nrounds, which increases the runtime of the algorithm.
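The two rounds above can be sketched in plain Python. The "tree" outputs here are just the hard-coded numbers from the example, not fitted trees - the point is only to show how eta scales each tree's contribution and how the residuals become the next tree's targets:

```python
eta = 0.1
y = [10, 20, 30, 40, 50]

# What tree 1 would predict unscaled (numbers from the example above)
tree1_raw = [12, 18, 27, 39, 54]
pred1 = [eta * p for p in tree1_raw]                 # scaled: 1.2, 1.8, ...
resid1 = [t - p for t, p in zip(y, pred1)]           # targets for tree 2

# Tree 2's unscaled predictions on those residuals
tree2_raw = [7, 18, 25, 40, 40]
pred2 = [eta * p for p in tree2_raw]
resid2 = [r - p for r, p in zip(resid1, pred2)]      # targets for tree 3

# The final prediction after 2 rounds is the sum of the scaled trees
final = [a + b for a, b in zip(pred1, pred2)]

print([round(r, 1) for r in resid1])  # [8.8, 18.2, 27.3, 36.1, 44.6]
print([round(r, 1) for r in resid2])  # [8.1, 16.4, 24.8, 32.1, 40.6]
```

Note how far `final` still is from `y` after only 2 rounds - which is exactly why a small eta needs many rounds.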
Manipulation and interpretation of xgboost models in python
(I see it's nine months too late, but here's a rudimentary answer as other people may be interested in this...)
split_indices refers to the index (0-based) into the list of features supplied during training. It basically says "At this node (position in the array), use feature N for splitting".
For split nodes, split_conditions refers to the threshold for splitting: if feature < split_condition, go left; if >=, go right. NAs get special treatment (default_left tells you which way they go at each split).
In your example the first split would be based on feature #3 at threshold 0.073486.
For leaf nodes, split_condition contains the leaf value, i.e. the prediction for observations falling into that leaf (with possible caveats depending on the type of problem, transformations, etc.). left_children and right_children have a value of -1 for leaf nodes.
Hope this helps someone get started - there are quite a few other details. Some of the info in the json is not needed for prediction but allows you to calculate e.g. the feature importance metrics and to see how the tree was constructed.
Finally, for me, plotting the tree (xgboost.to_graphviz(booster=m0)) helps a lot in interpreting the info in the json.
Working through an example with depth=1 (a single splitting node and 2 leaf nodes) would be even easier to interpret.
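To make the array layout concrete, here is a minimal sketch of walking one tree stored this way. The dict below is a hypothetical hand-made example (not a real dump from xgboost); only the field names and the conventions match those described above - node 0 splits on feature #3 at 0.073486, and -1 in left_children marks a leaf:

```python
tree = {
    "split_indices":    [3, 0, 0],                # node 0 splits on feature #3
    "split_conditions": [0.073486, -1.5, 2.0],    # threshold for splits, value for leaves
    "left_children":    [1, -1, -1],              # -1 marks a leaf node
    "right_children":   [2, -1, -1],
    "default_left":     [True, False, False],     # where NAs go at each split
}

def predict_one(tree, row):
    """Follow the split rules from the root down to a leaf for one observation."""
    node = 0
    while tree["left_children"][node] != -1:       # not a leaf yet
        value = row[tree["split_indices"][node]]
        if value is None:                          # missing value: use the default direction
            go_left = tree["default_left"][node]
        else:                                      # feature < split_condition -> left
            go_left = value < tree["split_conditions"][node]
        node = (tree["left_children"][node] if go_left
                else tree["right_children"][node])
    return tree["split_conditions"][node]          # for leaves this holds the leaf value

print(predict_one(tree, [0, 0, 0, 0.01]))   # below threshold, goes left  -> -1.5
print(predict_one(tree, [0, 0, 0, 0.5]))    # above threshold, goes right -> 2.0
print(predict_one(tree, [0, 0, 0, None]))   # NA, default_left is True    -> -1.5
```

Summing this leaf value over all trees (after whatever transformation the objective requires) is what gives the booster's prediction.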
Extract sample of features used to build each tree in H2O
If you want to see which features were used at a given split in a given tree, you can navigate the H2OTree object.
For R, see the documentation here and here.
For Python, see the documentation here.
You can also take a look at this blog post (if the link ever dies, just do a Google search for the H2OTree class).