How is the feature score (importance) in the XGBoost package calculated?
The feature score is a metric that simply counts how many times each feature is split on. It is analogous to the Frequency metric in the R version: https://cran.r-project.org/web/packages/xgboost/xgboost.pdf
It is about as basic a feature importance metric as you can get, i.e. how many times was this variable split on?
The code for this method shows that it simply counts the occurrences of a given feature across all the trees:
https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/core.py#L953
    def get_fscore(self, fmap=''):
        """Get feature importance of each feature.

        Parameters
        ----------
        fmap: str (optional)
            The name of feature map file
        """
        trees = self.get_dump(fmap)  # dump all the trees to text
        fmap = {}
        for tree in trees:  # loop through the trees
            for line in tree.split('\n'):  # text processing
                arr = line.split('[')
                if len(arr) == 1:  # text processing
                    continue
                fid = arr[1].split(']')[0]  # text processing
                fid = fid.split('<')[0]  # split on the greater/less (find variable name)
                if fid not in fmap:  # if the feature id hasn't been seen yet
                    fmap[fid] = 1  # add it
                else:
                    fmap[fid] += 1  # else increment it
        return fmap  # return the fmap, which has the counts of each time a variable was split on
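To see the counting logic in isolation, here is a minimal sketch that runs without xgboost installed. The tree strings in `SAMPLE_TREES` are hand-written assumptions that imitate the general shape of a text dump (split nodes look like `[f0<0.5]`, leaves have no brackets); `count_splits` and `SPLIT_RE` are hypothetical names for illustration, not part of the library.

```python
import re
from collections import Counter

# Hypothetical, hand-written dump text imitating the shape of a tree dump.
# Split nodes contain "[feature<threshold]"; leaf lines contain no brackets.
SAMPLE_TREES = [
    "0:[f0<0.5] yes=1,no=2,missing=1\n"
    "\t1:leaf=0.1\n"
    "\t2:[f1<2.5] yes=3,no=4,missing=3\n"
    "\t\t3:leaf=-0.2\n"
    "\t\t4:leaf=0.3",
    "0:[f1<1.5] yes=1,no=2,missing=1\n"
    "\t1:leaf=0.05\n"
    "\t2:leaf=-0.05",
]

# Capture the feature name between "[" and "<".
SPLIT_RE = re.compile(r"\[([^<\]]+)<")


def count_splits(trees):
    """Count how many split nodes use each feature across all trees."""
    counts = Counter()
    for tree in trees:
        for line in tree.split("\n"):
            match = SPLIT_RE.search(line)
            if match:  # leaf lines have no "[...<...]" and are skipped
                counts[match.group(1)] += 1
    return dict(counts)


fscore = count_splits(SAMPLE_TREES)
print(fscore)
```

With this sample, `f0` is split on once and `f1` twice, so a feature that appears in many shallow splits can outrank one that appears in a single, highly informative split. That is the main limitation of a pure frequency count.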
How to get feature importance in xgboost by 'information gain'?
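You can get it from the underlying booster of the fitted model:
    model.booster().get_score(importance_type='gain')
In more recent xgboost releases the accessor on the scikit-learn wrapper is named get_booster() rather than booster(). See the Python API reference:
http://xgboost.readthedocs.io/en/latest/python/python_api.html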
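The API reference describes the 'gain' importance type as the average gain across all splits in which a feature is used. The sketch below illustrates that averaging on hand-written text, so it runs without xgboost. The sample strings assume that split lines carry a `gain=` field, as in a dump produced with statistics enabled; that format, along with the names `average_gain` and `NODE_RE`, is an assumption for illustration rather than the library's own implementation.

```python
import re
from collections import defaultdict

# Hypothetical dump text with per-node statistics (assumed format).
SAMPLE_TREES_WITH_STATS = [
    "0:[f0<0.5] yes=1,no=2,missing=1,gain=8.0,cover=10\n"
    "\t1:leaf=0.1,cover=4\n"
    "\t2:[f1<2.5] yes=3,no=4,missing=3,gain=2.0,cover=6\n"
    "\t\t3:leaf=-0.2,cover=3\n"
    "\t\t4:leaf=0.3,cover=3",
    "0:[f1<1.5] yes=1,no=2,missing=1,gain=4.0,cover=10\n"
    "\t1:leaf=0.05,cover=5\n"
    "\t2:leaf=-0.05,cover=5",
]

# Capture the feature name and the gain value of a split node.
NODE_RE = re.compile(r"\[([^<\]]+)<[^\]]*\].*?gain=([-+0-9.eE]+)")


def average_gain(trees):
    """Average the gain of every split node, grouped by feature."""
    total = defaultdict(float)
    count = defaultdict(int)
    for tree in trees:
        for line in tree.split("\n"):
            match = NODE_RE.search(line)
            if match:  # leaves have no split condition and are skipped
                feature = match.group(1)
                total[feature] += float(match.group(2))
                count[feature] += 1
    return {feature: total[feature] / count[feature] for feature in total}


gain_importance = average_gain(SAMPLE_TREES_WITH_STATS)
print(gain_importance)
```

Here `f1` is split on twice but has a lower average gain than `f0`, so the two importance types can rank the same features differently.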