xgboost implementation in h2o offset
Good question. This is not documented in the parameter description: we use a common definition of offset_column
across all algorithms, and there is no note that it does not work in XGBoost. The parameter is not functional, and you should get an error if you try to supply it.
R example:
library(h2o)
h2o.init()
fit <- h2o.xgboost(x = 1:3, y = "Species", offset_column = "Petal.Width",
                   training_frame = as.h2o(iris))
This gives the error:
Error: water.exceptions.H2OModelBuilderIllegalArgumentException: Illegal argument(s) for XGBoost model: XGBoost_model_R_1520909592004_2. Details: ERRR on field: _offset_column: Offset is not supported for XGBoost.
base_margin or init_score for catboost regressor
After looking at the documentation, I found a viable solution. The fit method of both CatBoostRegressor and CatBoostClassifier provides a baseline and a sample_weight parameter, which can be used directly to set an offset (for prior exposure) or a sample weight (for severity modeling).
By the way, the better approach is to create Pools and specify the offset and weights there:
from catboost import Pool

# baseline carries the offset (log exposure) for each split
freq_train_pool = Pool(data=freq_train_ds, label=claim_nmb_train.values, cat_features=xvars_cat, baseline=claim_model_offset_train.values)
freq_valid_pool = Pool(data=freq_valid_ds, label=claim_nmb_valid.values, cat_features=xvars_cat, baseline=claim_model_offset_valid.values)
freq_test_pool = Pool(data=freq_test_ds, label=claim_nmb_test.values, cat_features=xvars_cat, baseline=claim_model_offset_test.values)
Here the data parameter contains a pd.DataFrame with the predictors only, label is the actual number of claims, cat_features is a list of strings naming the categorical columns, and baseline is the np.array of log exposure. It works.
Using Pools also lets you pass evaluation sets to the fit method.
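To see why log exposure is the natural choice for the baseline in a claim frequency model, here is a small sketch of the arithmetic in plain Python. It assumes a Poisson objective with a log link, where the baseline is added to the model's raw score before exponentiation. The function name expected_claims and the numbers are illustrative only and are not part of the CatBoost API.

```python
import math

def expected_claims(raw_score, exposure):
    """Expected claim count under a log link with log(exposure) as the offset.

    raw_score is a hypothetical model output on the log scale (the log of the
    claim rate per unit of exposure); it stands in for what the trees learn.
    """
    baseline = math.log(exposure)          # the offset passed as baseline
    return math.exp(baseline + raw_score)  # equals exposure * exp(raw_score)

raw_score = -2.0                             # illustrative log claim rate
half_year = expected_claims(raw_score, 0.5)  # policy exposed for half a year
full_year = expected_claims(raw_score, 1.0)  # policy exposed for a full year

# With the log-exposure offset, expected claims scale linearly with exposure
# while the learned rate exp(raw_score) stays the same for both policies.
print(half_year, full_year)
```

Because the offset enters on the log scale, the model only has to learn the claim rate, and the exposure multiplies it afterwards. A sample weight, by contrast, changes how much each row counts in the loss rather than shifting the prediction.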