Reversing 'One-Hot' Encoding in Pandas

I would use apply to decode the columns:

In [2]: animals = pd.DataFrame({"monkey":[0,1,0,0,0],"rabbit":[1,0,0,0,0],"fox":[0,0,1,0,0]})

In [3]: def get_animal(row):
   ...:     for c in animals.columns:
   ...:         if row[c]==1:
   ...:             return c

In [4]: animals.apply(get_animal, axis=1)
Out[4]:
0    rabbit
1    monkey
2       fox
3      None
4      None
dtype: object
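If the frame is large, a vectorized variant may be preferable to apply. The sketch below is one possible way to do it, not part of the original answer: idxmax picks the first column holding the row maximum, and a mask blanks out the rows that have no hot column at all (idxmax alone would report the first column for those rows).

```python
import pandas as pd

animals = pd.DataFrame({
    "monkey": [0, 1, 0, 0, 0],
    "rabbit": [1, 0, 0, 0, 0],
    "fox":    [0, 0, 1, 0, 0],
})

# True for rows that contain at least one hot (== 1) column
has_label = animals.eq(1).any(axis=1)

# idxmax returns the first column with the row maximum; for an all-zero row
# that is simply the first column, so those rows are masked to missing.
decoded = animals.idxmax(axis=1).where(has_label)
print(decoded)
```

Rows 3 and 4 come out as missing values rather than None, which is usually easier to handle with isna or fillna.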

Pandas, reverse one hot encoding

If I understand correctly, you can use DataFrame.idxmax along axis=1. If necessary, you can strip the dummy prefix with str.replace:

X_test[filter_col].idxmax(axis=1).str.replace('mycol_', '')
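The one-liner above relies on X_test and filter_col being defined elsewhere. Here is a minimal self-contained sketch; the data and the way filter_col is built are assumptions made for illustration, only the names come from the answer.

```python
import pandas as pd

# Hypothetical data: three dummy columns plus one unrelated feature
X_test = pd.DataFrame({
    "mycol_a": [1, 0, 0],
    "mycol_b": [0, 1, 0],
    "mycol_c": [0, 0, 1],
    "other":   [10, 20, 30],
})

# Assumed definition: select the dummy columns by their shared prefix
filter_col = [c for c in X_test.columns if c.startswith("mycol_")]

# regex=False makes str.replace treat the prefix as a literal string
labels = X_test[filter_col].idxmax(axis=1).str.replace("mycol_", "", regex=False)
print(labels.tolist())
```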

Python PANDAS: How to Reverse One-Hot Encoding Back to Categorical

You need idxmax along axis 1, applied to a boolean mask of the indicator columns:

df['ind_all'] = (df.iloc[:, 1:] == 1).idxmax(1)

   id  ind_1  ind_2  ind_3 ind_all
0   1      0      1      0   ind_2
1   1      1      0      0   ind_1
2   2      0      1      0   ind_2
3   2      0      0      1   ind_3
4   3      0      0      1   ind_3
5   3      1      0      0   ind_1
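For reference, a runnable sketch of the same idea. The input frame is reconstructed from the output shown above, and the one-hot check is an extra safeguard that was not in the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "id":    [1, 1, 2, 2, 3, 3],
    "ind_1": [0, 1, 0, 0, 0, 1],
    "ind_2": [1, 0, 1, 0, 0, 0],
    "ind_3": [0, 0, 0, 1, 1, 0],
})

# Every column except the first (id) is an indicator column
dummies = df.iloc[:, 1:]

# Safeguard: idxmax is only meaningful if each row has exactly one hot column
assert (dummies.sum(axis=1) == 1).all()

# Mark the hot cells, then take the label of the first True in each row
df["ind_all"] = (dummies == 1).idxmax(axis=1)
print(df)
```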

How to convert (Not-One) Hot Encodings to a Column with Multiple Values on the Same Row

You can use DataFrame.dot, which is vectorized and usually much faster than iterating over the rows of the dataframe:

df.dot(df.columns + ', ').str.rstrip(', ')


0         three, four
1    one, three, four
2               three
3          one, three
4
dtype: object
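A self-contained sketch of the dot trick follows. The input frame is an assumption, chosen so that it reproduces the output above: multiplying a 0/1 cell by a string either keeps the string or yields an empty one, and the row-wise sum concatenates whatever is kept.

```python
import pandas as pd

# Assumed input: 0/1 indicators, several of which may be hot in one row
df = pd.DataFrame({
    "one":   [0, 1, 0, 1, 0],
    "two":   [0, 0, 0, 0, 0],
    "three": [1, 1, 1, 1, 0],
    "four":  [1, 1, 0, 0, 0],
})

# 1 * "one, " -> "one, " and 0 * "one, " -> "", summed across each row;
# rstrip then removes the trailing ", " left by the last hot column
joined = df.dot(df.columns + ", ").str.rstrip(", ")
print(joined)
```

Note that rstrip(", ") strips any trailing commas and spaces, so this works as long as the column names themselves do not end in those characters.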

Reverse a get_dummies encoding in pandas

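Use set_index and stack. Masking with df[df==1] turns every non-hot cell into NaN, and stack drops NaN entries by default, so only the hot (ID, column) pairs remain: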

df.set_index('ID',inplace=True)

df[df==1].stack().reset_index().drop(0, axis=1)
Out[363]:
     ID level_1
0  1002       2
1  1002       4
2  1004       1
3  1004       2
4  1005       5
5  1006       6
6  1007       1
7  1007       3
8  1009       3
9  1009       7
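Since the result depends on stack dropping NaN, and that default may differ between pandas versions (an assumption worth checking against the version you run), a melt-based sketch that does not rely on it could look like this. The frame is hypothetical and much smaller than the one in the answer.

```python
import pandas as pd

# Hypothetical multi-hot frame: several columns may be 1 in the same row
df = pd.DataFrame({
    "ID": [1002, 1004, 1005],
    "1":  [0, 1, 0],
    "2":  [1, 1, 0],
    "4":  [1, 0, 0],
    "5":  [0, 0, 1],
})

# Long format: one row per (ID, column) pair with its 0/1 flag
long = df.melt(id_vars="ID", var_name="level_1", value_name="flag")

# Keep only the hot pairs and order them by ID, then by column label
pairs = (
    long.loc[long["flag"] == 1, ["ID", "level_1"]]
    .sort_values(["ID", "level_1"])
    .reset_index(drop=True)
)
print(pairs)
```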

How to go back from ONE-HOT-ENCODED labels to single column using sklearn?

Use inverse_transform of LabelEncoder and OneHotEncoder:

import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

s = pd.Series(['a', 'b', 'c'])
le = LabelEncoder()
ohe = OneHotEncoder(sparse_output=False)  # this argument was named sparse before scikit-learn 1.2
s1 = le.fit_transform(s)
s2 = ohe.fit_transform(s.to_numpy().reshape(-1, 1))

What you have:

# s1 from LabelEncoder
array([0, 1, 2])

# s2 from OneHotEncoder
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

What you should do:

inv_s1 = le.inverse_transform(s1)
inv_s2 = ohe.inverse_transform(s2).ravel()

Output:

# inv_s1 == inv_s2 == s
array(['a', 'b', 'c'], dtype=object)
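Conceptually, decoding a one-hot matrix amounts to an argmax lookup into the list of categories. The NumPy-only sketch below illustrates that idea; it is not scikit-learn's actual implementation, and the category order is an assumption.

```python
import numpy as np

# Assumed category order, matching the columns of the one-hot matrix
categories = np.array(["a", "b", "c"])

one_hot = np.array([[1., 0., 0.],
                    [0., 1., 0.],
                    [0., 0., 1.]])

# The position of the 1 in each row indexes into the category list
decoded = categories[one_hot.argmax(axis=1)]
print(decoded)
```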

How to decode 'one hot encoded' column names from bytes to string in pandas dataframe

You can pass the prefix parameter to the get_dummies method as follows; it then adds that prefix to every generated column.

df = pd.DataFrame({'PROD_ID': ['OM', 'RM', 'VL']})
nwdf = pd.get_dummies(df,prefix=['PROD_ID'])
print(nwdf.columns)

Output: Index(['PROD_ID_OM', 'PROD_ID_RM', 'PROD_ID_VL'], dtype='object')
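If the encoding has to be reversed later, it can help to choose a separator that cannot appear in the prefix itself. The sketch below uses the prefix_sep parameter of get_dummies for that; the "__" separator is an illustrative choice, not something from the answer above.

```python
import pandas as pd

df = pd.DataFrame({"PROD_ID": ["OM", "RM", "VL"]})

# "__" does not occur in "PROD_ID", so splitting on it is unambiguous
nwdf = pd.get_dummies(df, prefix=["PROD_ID"], prefix_sep="__")
print(nwdf.columns.tolist())

# Name of the hot column per row, e.g. "PROD_ID__OM"
hot_col = nwdf.idxmax(axis=1)

# Keep only the part after the separator to recover the original value
restored = hot_col.str.split("__", n=1).str[1]
print(restored.tolist())
```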
