Reversing 'one-hot' encoding in Pandas
I would use apply to decode the columns:
In [1]: import pandas as pd
In [2]: animals = pd.DataFrame({"monkey": [0, 1, 0, 0, 0], "rabbit": [1, 0, 0, 0, 0], "fox": [0, 0, 1, 0, 0]})
In [3]: def get_animal(row):
   ...:     for c in animals.columns:
   ...:         if row[c] == 1:
   ...:             return c
In [4]: animals.apply(get_animal, axis=1)
Out[4]:
0 rabbit
1 monkey
2 fox
3 None
4 None
dtype: object
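A variant of the same apply idea, as a minimal sketch: the hypothetical helper decode_row reads the column labels from the row itself instead of from the global animals, so it can be reused on any 0/1 DataFrame, and it returns None explicitly when no column holds 1.

```python
import pandas as pd


def decode_row(row: pd.Series):
    """Return the label of the first column whose value is 1, or None."""
    # row.items() yields (column label, value) pairs for this row
    for label, value in row.items():
        if value == 1:
            return label
    return None  # no column is set for this row


animals = pd.DataFrame(
    {"monkey": [0, 1, 0, 0, 0], "rabbit": [1, 0, 0, 0, 0], "fox": [0, 0, 1, 0, 0]}
)
decoded = animals.apply(decode_row, axis=1)
print(decoded)
```

Because the helper never refers to a specific DataFrame, the same function can be applied to other one-hot frames without editing it.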
Pandas, reverse one hot encoding
IIUC, you can use DataFrame.idxmax along axis=1. If necessary, you can strip the dummy prefix with str.replace:
X_test[filter_col].idxmax(axis=1).str.replace('mycol_', '')
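One caveat: idxmax returns the first column holding the row maximum, so a row of all zeros is decoded as the first dummy column rather than as missing. The sketch below uses a hypothetical X_test with mycol_ dummies plus an unrelated column, builds filter_col from the prefix (an assumption about how filter_col was defined), and masks rows that contain no 1 with Series.where.

```python
import pandas as pd

# hypothetical data: two dummy columns plus an unrelated feature
X_test = pd.DataFrame(
    {"mycol_red": [1, 0, 0], "mycol_blue": [0, 1, 0], "other": [5, 6, 7]}
)
# assumed definition of filter_col: every column carrying the dummy prefix
filter_col = [c for c in X_test.columns if c.startswith("mycol_")]

# first column with the row maximum, minus the prefix
decoded = X_test[filter_col].idxmax(axis=1).str.replace("mycol_", "", regex=False)

# an all-zero row would otherwise decode to "red"; turn it into NaN instead
has_one = X_test[filter_col].eq(1).any(axis=1)
decoded = decoded.where(has_one)
print(decoded)
```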
Python PANDAS: How to Reverse One-Hot Encoding Back to Categorical
Compare the indicator columns (everything after id) with 1 and take idxmax along axis 1:
df['ind_all'] = (df.iloc[:, 1:] == 1).idxmax(1)
id ind_1 ind_2 ind_3 ind_all
0 1 0 1 0 ind_2
1 1 1 0 0 ind_1
2 2 0 1 0 ind_2
3 2 0 0 1 ind_3
4 3 0 0 1 ind_3
5 3 1 0 0 ind_1
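The iloc[:, 1:] slice assumes the indicator columns come right after id. A minimal sketch that selects them by name instead, using DataFrame.filter with a regex (the pattern ^ind_\d+$ is an assumption based on the column names shown), so the new ind_all column is not picked up if the line is run again:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "id": [1, 1, 2, 2, 3, 3],
        "ind_1": [0, 1, 0, 0, 0, 1],
        "ind_2": [1, 0, 1, 0, 0, 0],
        "ind_3": [0, 0, 0, 1, 1, 0],
    }
)
# matches ind_1, ind_2, ... but not ind_all
indicators = df.filter(regex=r"^ind_\d+$")
df["ind_all"] = (indicators == 1).idxmax(axis=1)
print(df)
```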
How to convert (Not-One) Hot Encodings to a Column with Multiple Values on the Same Row
You can use DataFrame.dot, which is much faster than iterating over all the rows in the DataFrame:
df.dot(df.columns + ', ').str.rstrip(', ')
0 three, four
1 one, three, four
2 three
3 one, three
4
dtype: object
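Why this works: for each row, dot multiplies every 0/1 value by the matching "name, " string and adds the products. In Python 0 * "one, " is the empty string and 1 * "one, " is "one, ", so only the set columns survive the concatenation, and rstrip removes the trailing separator. A minimal sketch with made-up data chosen to reproduce the output above, next to a slower row-wise equivalent for comparison:

```python
import pandas as pd

# hypothetical multi-hot data; column "two" is never set
df = pd.DataFrame(
    {
        "one": [0, 1, 0, 1, 0],
        "two": [0, 0, 0, 0, 0],
        "three": [1, 1, 1, 1, 0],
        "four": [1, 1, 0, 0, 0],
    }
)

# vectorised: int * str repeats the string, + concatenates the pieces
joined = df.dot(df.columns + ", ").str.rstrip(", ")

# row-wise equivalent: join the labels of the columns equal to 1
rowwise = df.apply(lambda row: ", ".join(row[row == 1].index), axis=1)
print(joined)
```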
Reverse a get_dummies encoding in pandas
Use set_index + stack. stack drops NaN by default, so after masking everything that is not 1 with df[df==1], only the active categories remain:
df.set_index('ID',inplace=True)
df[df==1].stack().reset_index().drop(0, axis=1)
Out[363]:
ID level_1
0 1002 2
1 1002 4
2 1004 1
3 1004 2
4 1005 5
5 1006 6
6 1007 1
7 1007 3
8 1009 3
9 1009 7
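The same long format can also be produced with DataFrame.melt, swapped in here as an alternative that does not rely on stack dropping NaN (pandas 2.1+ adds a future_stack=True mode of stack that keeps NaN). A minimal sketch with a small hypothetical frame whose rows are chosen to match the first rows of the output above; the names category and flag are arbitrary:

```python
import pandas as pd

# hypothetical wide frame: one row per ID, one 0/1 column per category
df = pd.DataFrame(
    {
        "ID": [1002, 1004, 1005],
        "1": [0, 1, 0],
        "2": [1, 1, 0],
        "3": [0, 0, 0],
        "4": [1, 0, 0],
        "5": [0, 0, 1],
    }
)

long = (
    df.melt(id_vars="ID", var_name="category", value_name="flag")
    .query("flag == 1")  # keep only the active categories
    .drop(columns="flag")
    .sort_values(["ID", "category"])
    .reset_index(drop=True)
)
print(long)
```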
How to go back from ONE-HOT-ENCODED labels to single column using sklearn?
Use the inverse_transform method of LabelEncoder and OneHotEncoder:
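import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
s = pd.Series(['a', 'b', 'c'])
# LabelEncoder maps each label to an integer code
le = LabelEncoder()
s1 = le.fit_transform(s)
# OneHotEncoder expects 2-D input, hence reshape(-1, 1);
# dense output is sparse_output=False in scikit-learn >= 1.2 (sparse=False before)
ohe = OneHotEncoder(sparse_output=False)
s2 = ohe.fit_transform(s.to_numpy().reshape(-1, 1))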
What you have:
# s1 from LabelEncoder
array([0, 1, 2])
# s2 from OneHotEncoder
array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
What you should do:
inv_s1 = le.inverse_transform(s1)
inv_s2 = ohe.inverse_transform(s2).ravel()
Output:
# inv_s1 == inv_s2 == s
array(['a', 'b', 'c'], dtype=object)
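If only the one-hot array is at hand and the fitted encoder is not, the rows can also be decoded with NumPy's argmax, provided you know the category order the columns were built in (for a fitted OneHotEncoder that order is ohe.categories_[0]). A minimal sketch, assuming the order a, b, c; note that argmax returns 0 for an all-zero row, so such a row would silently decode to the first category:

```python
import numpy as np

# assumed column order of the one-hot matrix
categories = np.array(["a", "b", "c"])
s2 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])

# position of the 1 in each row, then look up its label
decoded = categories[s2.argmax(axis=1)]
print(decoded)  # ['a' 'b' 'c']
```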
How to decode 'one hot encoded' column names from bytes to string in pandas dataframe
You can pass the prefix parameter to get_dummies as follows; it will then add the prefix to all the dummy columns:
import pandas as pd
df = pd.DataFrame({'PROD_ID': ['OM', 'RM', 'VL']})
nwdf = pd.get_dummies(df, prefix=['PROD_ID'])
print(nwdf.columns)
Output: Index(['PROD_ID_OM', 'PROD_ID_RM', 'PROD_ID_VL'], dtype='object')
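If the category values themselves arrive as bytes (for example b'OM'), decoding them to str before calling get_dummies means the generated column names are built from plain strings. A minimal sketch with hypothetical bytes data, assuming UTF-8 encoding, using Series.str.decode:

```python
import pandas as pd

# hypothetical input where the categories are bytes objects
df = pd.DataFrame({"PROD_ID": [b"OM", b"RM", b"VL"]})

# bytes -> str (UTF-8 assumed)
df["PROD_ID"] = df["PROD_ID"].str.decode("utf-8")

nwdf = pd.get_dummies(df, prefix=["PROD_ID"])
print(nwdf.columns.tolist())
```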