XGBoost, Offset Exposure

XGBoost implementation in h2o: offset column

Good question -- this is not documented in the parameter description (we use a common definition of offset_column across all algos, and there is no note that it does not work in XGBoost). It is not functional, and you should get an error if you try to supply it.

R example:

library(h2o)
h2o.init()

fit <- h2o.xgboost(x = 1:3, y = "Species", offset_column = "Petal.Width",
training_frame = as.h2o(iris))

This gives the error:

Error: water.exceptions.H2OModelBuilderIllegalArgumentException: Illegal argument(s) for XGBoost model: XGBoost_model_R_1520909592004_2.  Details: ERRR on field: _offset_column: Offset is not supported for XGBoost.
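
For reference, a minimal sketch of the same check from the h2o Python API, assuming offset_column is exposed on train() as it is for the other algorithms; the sklearn iris loading and renamed columns are illustrative, not from the original post. It should fail with the same "Offset is not supported for XGBoost" error.

import h2o
from h2o.estimators import H2OXGBoostEstimator
from sklearn.datasets import load_iris

h2o.init()

# build an H2OFrame mirroring the iris frame used in the R example
iris_pd = load_iris(as_frame=True).frame
iris_pd.columns = ["sepal_len", "sepal_wid", "petal_len", "petal_wid", "species"]
iris_hf = h2o.H2OFrame(iris_pd)
iris_hf["species"] = iris_hf["species"].asfactor()

model = H2OXGBoostEstimator()
# expected to raise H2OModelBuilderIllegalArgumentException:
# "Offset is not supported for XGBoost."
model.train(x=["sepal_len", "sepal_wid", "petal_len"], y="species",
            offset_column="petal_wid", training_frame=iris_hf)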

base_margin or init_score for CatBoost regressor

After looking at the documentation, I found a viable solution. The fit method of both CatBoostRegressor and CatBoostClassifier provides a baseline and a sample_weight parameter that can be used directly to set an offset (for prior exposure) or a sample weight (for severity modeling).
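
For example, a minimal sketch of passing the offset directly to fit; the synthetic data and the Poisson loss are illustrative assumptions, not from the original post:

import numpy as np
import pandas as pd
from catboost import CatBoostRegressor

rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "region": rng.choice(["A", "B", "C"], size=n),  # categorical predictor
    "age": rng.integers(18, 80, size=n),
})
exposure = rng.uniform(0.1, 1.0, size=n)            # years of exposure
y = rng.poisson(0.1 * exposure)                      # simulated claim counts

model = CatBoostRegressor(loss_function="Poisson", cat_features=["region"],
                          iterations=200, verbose=False)
model.fit(X, y,
          baseline=np.log(exposure))  # offset: log of prior exposure
# for a severity model you would instead pass sample_weight=claim_counts
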
Btw, the cleaner approach is to create Pools and specify the offset and weights there:

from catboost import Pool

# baseline carries the offset (log exposure); cat_features lists the categorical columns
freq_train_pool = Pool(data=freq_train_ds, label=claim_nmb_train.values, cat_features=xvars_cat, baseline=claim_model_offset_train.values)
freq_valid_pool = Pool(data=freq_valid_ds, label=claim_nmb_valid.values, cat_features=xvars_cat, baseline=claim_model_offset_valid.values)
freq_test_pool = Pool(data=freq_test_ds, label=claim_nmb_test.values, cat_features=xvars_cat, baseline=claim_model_offset_test.values)

Here the data parameters hold a pd.DataFrame with the predictors only, the label parameter holds the actual number of claims, cat_features is a list of the categorical column names, and baseline is the np.array of log exposure. It works.

Using Pools also lets you provide evaluation sets in the fit method.
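
For example, continuing the Pool snippet above (the model settings here are illustrative, not from the original post):

from catboost import CatBoostRegressor

# the offset and categorical specification stored in each Pool travel with it,
# so fit and eval_set only need the Pool objects
model = CatBoostRegressor(loss_function="Poisson", iterations=500)
model.fit(freq_train_pool, eval_set=freq_valid_pool)

freq_test_pred = model.predict(freq_test_pool)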


