Reversing 'One-Hot' Encoding in Pandas

I would use apply to decode the columns:

In [1]: import pandas as pd

In [2]: animals = pd.DataFrame({"monkey":[0,1,0,0,0],"rabbit":[1,0,0,0,0],"fox":[0,0,1,0,0]})

In [3]: def get_animal(row):
   ...:     for c in animals.columns:
   ...:         if row[c]==1:
   ...:             return c

In [4]: animals.apply(get_animal, axis=1)
Out[4]: 
0    rabbit
1    monkey
2       fox
3      None
4      None
dtype: object

Pandas, reverse one hot encoding

IIUC, you can use DataFrame.idxmax along axis=1, which returns the label of the column holding each row's maximum. If necessary, you can strip the dummy prefix with str.replace:

X_test[filter_col].idxmax(axis=1).str.replace('mycol_', '')
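
Since X_test and filter_col are not defined in the answer, here is a minimal self-contained sketch of the same idea. The frame contents and the way filter_col is built are assumptions made up for illustration; only idxmax and str.replace come from the answer.

```python
import pandas as pd

# Hypothetical frame: three dummy columns with the 'mycol_' prefix
# plus one unrelated column that must be left out of the decoding.
X_test = pd.DataFrame({
    "mycol_a": [1, 0, 0],
    "mycol_b": [0, 1, 0],
    "mycol_c": [0, 0, 1],
    "other": [10, 20, 30],
})

# Assumed definition of filter_col: every column carrying the prefix.
filter_col = [c for c in X_test.columns if c.startswith("mycol_")]

# idxmax(axis=1) picks the column label of the 1 in each row;
# regex=False treats the prefix as a literal string.
decoded = X_test[filter_col].idxmax(axis=1).str.replace("mycol_", "", regex=False)
print(decoded.tolist())
```

Passing regex=False is a precaution in case the prefix contains characters that are special in regular expressions.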

Python PANDAS: How to Reverse One-Hot Encoding Back to Categorical

You need idxmax along the columns, applied to every column after id:

df['ind_all'] = (df.iloc[:, 1:] == 1).idxmax(1)

    id  ind_1   ind_2   ind_3   ind_all
0   1   0       1       0       ind_2
1   1   1       0       0       ind_1
2   2   0       1       0       ind_2
3   2   0       0       1       ind_3
4   3   0       0       1       ind_3
5   3   1       0       0       ind_1
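
One caveat worth checking in your own data: for a row with no 1 at all, idxmax still returns the first column label. The sketch below, on a small made-up frame, masks such rows so they come out as missing instead. The column names mirror the answer, but the data and the extra where step are assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical data: the last row has no indicator set.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "ind_1": [0, 1, 0],
    "ind_2": [1, 0, 0],
    "ind_3": [0, 0, 0],
})

# Boolean frame marking the hot cells, skipping the id column.
hot = df.iloc[:, 1:] == 1

# idxmax alone would label the all-zero row 'ind_1';
# where() keeps the label only for rows that contain at least one 1.
df["ind_all"] = hot.idxmax(axis=1).where(hot.any(axis=1))
print(df)
```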

How to convert (Not-One) Hot Encodings to a Column with Multiple Values on the Same Row

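You can use DataFrame.dot, which is much faster than iterating over all the rows of the DataFrame. It works because multiplying a string by 0 or 1 yields either an empty string or the string itself, and the dot product concatenates the results per row: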

df.dot(df.columns + ', ').str.rstrip(', ')

0         three, four
1    one, three, four
2               three
3          one, three
4                    
dtype: object
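
The input frame is not shown in the answer, so the following sketch uses a hypothetical 0/1 frame chosen to reproduce the output above. The column names and values are assumptions; the dot and rstrip calls are the ones from the answer.

```python
import pandas as pd

# Hypothetical multi-hot input; several columns may be 1 in the same row.
df = pd.DataFrame({
    "one":   [0, 1, 0, 1, 0],
    "two":   [0, 0, 0, 0, 0],
    "three": [1, 1, 1, 1, 0],
    "four":  [1, 1, 0, 0, 0],
})

# Each cell (0 or 1) multiplies its column label plus a separator,
# and the row-wise sum concatenates the surviving labels.
labels = df.dot(df.columns + ", ").str.rstrip(", ")
print(labels)
```

Note that rstrip(", ") strips any trailing commas and spaces, so this relies on the column labels themselves not ending in those characters.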

Reverse a get_dummies encoding in pandas

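Use set_index followed by stack. In the pandas version used here, stack drops NaN entries by default, so only the cells equal to 1 survive; if your version keeps them, append .dropna() after stack().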

df.set_index('ID',inplace=True)

df[df==1].stack().reset_index().drop(0, axis=1)
Out[363]: 
     ID level_1
0  1002       2
1  1002       4
2  1004       1
3  1004       2
4  1005       5
5  1006       6
6  1007       1
7  1007       3
8  1009       3
9  1009       7
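
As a self-contained sketch of the same approach, the snippet below builds a small hypothetical frame (the IDs and integer column labels are made up to resemble the output above) and adds an explicit dropna(), on the assumption that not every pandas version drops NaN inside stack.

```python
import pandas as pd

# Hypothetical dummy-encoded frame with integer category labels as columns.
df = pd.DataFrame({
    "ID": [1002, 1004, 1005],
    1: [0, 1, 0],
    2: [1, 1, 0],
    5: [0, 0, 1],
})

hot = df.set_index("ID")

# Masking with hot == 1 turns every other cell into NaN; stack() then
# produces one row per (ID, column) pair and dropna() removes the NaN
# pairs regardless of the stack default in the installed version.
pairs = hot[hot == 1].stack().dropna().reset_index().drop(columns=0)
print(pairs)
```

The result has one row per ID and active category, with the category in the level_1 column, so an ID with several 1s appears several times.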

How to go back from ONE-HOT-ENCODED labels to single column using sklearn?

Use inverse_transform of LabelEncoder and OneHotEncoder:

import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

s = pd.Series(['a', 'b', 'c'])
le = LabelEncoder()
# scikit-learn >= 1.2 names this parameter sparse_output; older releases use sparse=False
ohe = OneHotEncoder(sparse_output=False)
s1 = le.fit_transform(s)
s2 = ohe.fit_transform(s.to_numpy().reshape(-1, 1))

What you have:

# s1 from LabelEncoder
array([0, 1, 2])

# s2 from OneHotEncoder
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

What you should do:

inv_s1 = le.inverse_transform(s1)
inv_s2 = ohe.inverse_transform(s2).ravel()

Output:

# inv_s1 == inv_s2 == s
array(['a', 'b', 'c'], dtype=object)

How to decode 'one hot encoded' column names from bytes to string in pandas dataframe

You can pass the prefix parameter to get_dummies as follows; it adds that prefix to every generated column:

df = pd.DataFrame({'PROD_ID': ['OM', 'RM', 'VL']})
nwdf = pd.get_dummies(df,prefix=['PROD_ID'])
print(nwdf.columns)

Output: Index(['PROD_ID_OM', 'PROD_ID_RM', 'PROD_ID_VL'], dtype='object')
