feature_names mismatch in xgboost despite having same columns
That's an honest mistake.
When fitting the model, you pass NumPy arrays:
train_X, val_X, train_y, val_y = train_test_split(X.values, y.values, test_size=0.2)
(X.values is a np.array), and these do not carry column names.
At prediction time, however, you pass a DataFrame, which does carry column names, so XGBoost sees different feature names.
Pass a NumPy array there as well; you can convert the DataFrame with:
predictions = my_model.predict(test_data_process.values)
(note the added .values)
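One caveat with this fix: once the names are gone, the model matches features by position only, so the prediction columns need to be in the same order as during training. Here is a minimal sketch of a conversion step that enforces this. The helper name to_model_input and the example columns (age, income) are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd


def to_model_input(frame: pd.DataFrame, feature_order: list) -> np.ndarray:
    """Return the frame as a NumPy array with columns in training order.

    Hypothetical helper: the model itself is not involved here, this only
    prepares the array that would be handed to predict().
    """
    missing = [col for col in feature_order if col not in frame.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Select by name first so the positional order matches training.
    return frame[feature_order].to_numpy()


# Training features (illustrative data); remember the order before calling .values
X = pd.DataFrame({"age": [31, 45, 27], "income": [40.0, 52.5, 38.0]})
feature_order = list(X.columns)

# Prediction data with the same columns in a different order
test_data_process = pd.DataFrame({"income": [61.0, 47.5], "age": [52, 38]})
test_array = to_model_input(test_data_process, feature_order)
```

The resulting test_array could then be passed to my_model.predict(...) in place of test_data_process.values.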
Why am I getting a "ValueError: feature_names mismatch" when specifying the feature-name list in XGBoost for visualization?
As we can see, the issue is that d_test's columns are being renamed (to f7, f31, ...), while d_train's columns are not.
The cause seems to be here:
shap_values = shap.TreeExplainer(model).shap_values(X_train)
You pass X_train, which is just a NumPy array without column names (so they become f31, f7, and so on). Instead, try passing a DataFrame with the desired columns:
shap_values = shap.TreeExplainer(model).shap_values(pd.DataFrame(X_train, columns=X.columns))
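To show what that wrapping step does on its own, here is a small sketch using only pandas and NumPy. The column names and values are invented for the example, and the slice stands in for whatever split produced X_train in the question.

```python
import numpy as np
import pandas as pd

# Illustrative feature frame; the real X comes from the asker's data.
X = pd.DataFrame({"age": [31, 45, 27, 52], "income": [40.0, 52.5, 38.0, 61.0]})

# Splitting X.values yields plain ndarrays, so the column names are lost.
X_train = X.values[:3]

# Re-attach the original names before handing the data to anything
# that checks feature names.
X_train_df = pd.DataFrame(X_train, columns=X.columns)
```

X_train_df holds the same numbers as X_train but carries the names from X.columns again, which is what the shap_values call above relies on.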