ValueError: feature_names mismatch in XGBoost's predict() function

feature_names mismatch in XGBoost despite having the same columns

That's an honest mistake.

When feeding your data for training, you are using NumPy arrays:

train_X, val_X, train_y, val_y = train_test_split(X.values, y.values, test_size=0.2)

X.values is a NumPy array, which has no column names defined.
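To make the naming behavior concrete, here is a toy model in plain Python (a simplified illustration, not XGBoost's actual code) of how a model might name its features. When the input carries column names, as a DataFrame does, those names are used; when it does not, as with X.values, generic positional names f0, f1, ... are assigned instead, which matches the f-prefixed names that show up in these error messages. The function resolve_feature_names is hypothetical.

```python
from typing import List, Optional, Sequence


def resolve_feature_names(columns: Optional[Sequence[str]], n_cols: int) -> List[str]:
    """Hypothetical toy model of the names recorded for a data matrix.

    Named input (like a pandas DataFrame) keeps its column names.
    Unnamed input (like a bare NumPy array from X.values) gets
    positional names f0, f1, ..., f{n_cols - 1}.
    """
    if columns is not None:
        if len(columns) != n_cols:
            raise ValueError("number of column names does not match number of columns")
        return list(columns)
    return [f"f{i}" for i in range(n_cols)]


# DataFrame-like input keeps its real column names
print(resolve_feature_names(["age", "income", "score"], 3))  # ['age', 'income', 'score']
# Array-like input (e.g. X.values) only gets positional names
print(resolve_feature_names(None, 3))  # ['f0', 'f1', 'f2']
```

So a model trained on X.values and then handed a DataFrame ends up comparing ['f0', 'f1', ...] against the real column names.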

When passing the dataset for prediction, however, you are using a DataFrame.

Use a NumPy array there as well; you can convert the DataFrame with .values:

predictions = my_model.predict(test_data_process.values)  

(note the added .values at the end)
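The mismatch and the .values fix can be sketched with a hypothetical ToyModel class that records feature names at fit time and compares them at predict time. This is a simplified illustration that assumes names are compared as ordered lists; it is not XGBoost's implementation, and rows are plain Python lists standing in for arrays.

```python
from typing import List, Optional, Sequence


class ToyModel:
    """Hypothetical stand-in for a model that records feature names when
    fitting and checks them when predicting (not XGBoost's real code)."""

    def __init__(self) -> None:
        self.feature_names: Optional[List[str]] = None

    @staticmethod
    def _names(columns: Optional[Sequence[str]], n_cols: int) -> List[str]:
        # Unnamed input (like a NumPy array) gets positional names f0, f1, ...
        return list(columns) if columns is not None else [f"f{i}" for i in range(n_cols)]

    def fit(self, rows: List[List[float]], columns: Optional[Sequence[str]] = None) -> "ToyModel":
        self.feature_names = self._names(columns, len(rows[0]))
        return self

    def predict(self, rows: List[List[float]], columns: Optional[Sequence[str]] = None) -> List[float]:
        names = self._names(columns, len(rows[0]))
        if names != self.feature_names:
            raise ValueError(f"feature_names mismatch: {self.feature_names} {names}")
        # Dummy prediction: the sum of the features in each row
        return [sum(r) for r in rows]


model = ToyModel().fit([[1.0, 2.0], [3.0, 4.0]])  # trained on "X.values": no names

try:
    model.predict([[5.0, 6.0]], columns=["a", "b"])  # DataFrame-like input
except ValueError as err:
    print(err)  # feature_names mismatch: ['f0', 'f1'] ['a', 'b']

print(model.predict([[5.0, 6.0]]))  # "test_data_process.values": [11.0]
```

The same reasoning suggests the opposite fix should also work: train on the DataFrame X instead of X.values, so that both sides carry the real column names.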

Why am I getting a "ValueError: feature_names mismatch" when specifying the feature-name list in XGBoost for visualization?

As we can see, the issue is that d_test's columns are being renamed to f7, f31, ..., while d_train's columns are not.
The cause appears to be this line:

shap_values = shap.TreeExplainer(model).shap_values(X_train)

You pass X_train, which is just a NumPy array without column names, so the features get generic names (f31, f7, and so on). Instead, try passing a DataFrame with the desired columns:

shap_values = shap.TreeExplainer(model).shap_values(pd.DataFrame(X_train, columns=X.columns))
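When the error message prints two long name lists, like the f7, f31 names versus the real column names here, a small diagnostic can show exactly how they differ. diff_feature_names below is a hypothetical helper, not part of XGBoost or SHAP. It reports names missing from the prediction input, unexpected extra names, and the case where the names agree but their order does not, assuming (as in a simple ordered-list comparison) that a different order also counts as a mismatch.

```python
from typing import Dict, List, Sequence


def diff_feature_names(expected: Sequence[str], actual: Sequence[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: explain why two feature-name lists are not equal."""
    exp_set, act_set = set(expected), set(actual)
    report: Dict[str, List[str]] = {
        # Present at training time but absent from the prediction input
        "missing": [n for n in expected if n not in act_set],
        # Present in the prediction input but unknown to the trained model
        "extra": [n for n in actual if n not in exp_set],
        "order_differs": [],
    }
    # Same names in different positions still fail an ordered comparison
    if not report["missing"] and not report["extra"] and list(expected) != list(actual):
        report["order_differs"] = list(actual)
    return report


print(diff_feature_names(["f0", "f1"], ["age", "income"]))
# {'missing': ['f0', 'f1'], 'extra': ['age', 'income'], 'order_differs': []}
print(diff_feature_names(["age", "income"], ["income", "age"]))
# {'missing': [], 'extra': [], 'order_differs': ['income', 'age']}
```

If every list in the report is empty, the names themselves agree and the problem lies elsewhere.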


