Reverse a Get_Dummies Encoding in Pandas

Reverse a get_dummies encoding in pandas

Use set_index and stack; stack drops NaN values by default, so only the cells equal to 1 survive:

df.set_index('ID',inplace=True)

df[df==1].stack().reset_index().drop(0, axis=1)
Out[363]: 
     ID level_1
0  1002       2
1  1002       4
2  1004       1
3  1004       2
4  1005       5
5  1006       6
6  1007       1
7  1007       3
8  1009       3
9  1009       7
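
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The frame below is made up, and the column name "label" is my own choice rather than something from the question. The explicit dropna() is an assumption on my part: it is there so the result does not depend on whether your pandas version's stack() drops NaN rows on its own.

```python
import pandas as pd

# Hypothetical input: one row per ID, one 0/1 indicator column per label.
df = pd.DataFrame(
    {"ID": [1002, 1004, 1005], "1": [0, 1, 0], "2": [1, 1, 0], "5": [0, 0, 1]}
)

indexed = df.set_index("ID")
pairs = (
    indexed[indexed == 1]            # keep the 1s, everything else becomes NaN
    .stack()                         # wide to long; column labels become an index level
    .dropna()                        # explicit, in case stack() keeps the NaN rows
    .rename_axis(["ID", "label"])    # "label" is an illustrative name
    .reset_index()
    .drop(columns=0)                 # the unnamed value column is all 1.0, so drop it
)
print(pairs)
```

Each (ID, label) pair in the result corresponds to one cell that held a 1 in the wide frame.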

Reverse get_dummies()

You can first move the non-dummy columns into the index with DataFrame.set_index:

#https://stackoverflow.com/a/62085741/2901002
df = undummify(df.set_index(['score1','score2'])).reset_index()

Or use an alternative solution with DataFrame.melt: filter the rows with boolean indexing, split the column names with Series.str.split, and finally pivot with DataFrame.pivot:

df1 = df.melt(['score1','score2'])
df1 = df1[df1['value'].eq(1)]
df1[['a','b']] = df1.pop('variable').str.split('_', expand=True)
df1 = df1.pivot(index=['score1','score2'], columns='a', values='b').reset_index()
print (df1)
a  score1  score2 category country
0    0.55    0.54   leader      CN
1    0.89    0.45               AU
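
A runnable version of the melt approach, as a sketch on made-up data. The frame, the names "field" and "label", the .copy() call and the n=1 argument are my additions, not part of the answer above. It also assumes a pandas version whose DataFrame.pivot accepts a list for index.

```python
import pandas as pd

# Hypothetical frame: two plain columns plus dummies named "<field>_<value>".
df = pd.DataFrame({
    "score1": [0.89, 0.55],
    "score2": [0.45, 0.54],
    "category_leader": [0, 1],
    "country_AU": [1, 0],
    "country_CN": [0, 1],
})

long = df.melt(id_vars=["score1", "score2"])
long = long[long["value"].eq(1)].copy()   # keep only the dummies that are set
# n=1 splits on the first underscore only, so a value that itself
# contains an underscore is not cut in two.
long[["field", "label"]] = long.pop("variable").str.split("_", n=1, expand=True)
restored = long.pivot(
    index=["score1", "score2"], columns="field", values="label"
).reset_index()
print(restored)
```

Rows that have no dummy set for a field come back as NaN in that field's column.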

How to reverse dummy variables from a pandas dataframe

We can use wide_to_long, then select the rows whose value is not zero, i.e.:

ndf = pd.wide_to_long(df, stubnames='T_', i='id',j='T')

      T_
id  T     
id1 30   0
id2 30   1
id1 40   1
id2 40   0

not_dummy = ndf[ndf['T_'].ne(0)].reset_index().drop(columns='T_')

   id   T
0  id2  30
1  id1  40

Update based on your edit:

ndf = pd.wide_to_long(df.reset_index(), stubnames='T_',i='index',j='T')

not_dummy = ndf[ndf['T_'].ne(0)].reset_index(level='T').drop(columns='T_')

        T
index    
1      30
0      40
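
A small variation that may read more clearly, sketched on a made-up frame: pass the stub as "T" and the underscore as sep, instead of folding the underscore into the stub name. The input data and the intermediate level name "suffix" are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical input matching the shape used above.
df = pd.DataFrame({"id": ["id1", "id2"], "T_30": [0, 1], "T_40": [1, 0]})

# stubnames="T" with sep="_" matches T_30 and T_40; the numeric suffix
# ends up in the index level named by j.
ndf = pd.wide_to_long(df, stubnames="T", i="id", j="suffix", sep="_")
not_dummy = (
    ndf[ndf["T"].ne(0)]              # keep only the dummies that are set
    .reset_index()
    .drop(columns="T")               # the 0/1 flag is no longer needed
    .rename(columns={"suffix": "T"})
)
print(not_dummy)
```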

Pandas, reverse one hot encoding

IIUC, you can use DataFrame.idxmax along axis=1. If necessary, you can strip the dummy prefix with str.replace:

X_test[filter_col].idxmax(axis=1).str.replace('mycol_', '')
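
One caveat worth sketching: idxmax returns the first column label for a row where every dummy is 0, so such rows decode to the first category by accident. The example below uses a made-up X_test and a filter_col built with startswith (both assumptions on my part) and masks those rows to NaN.

```python
import pandas as pd

# Hypothetical test frame; the last row has no dummy set.
X_test = pd.DataFrame({
    "mycol_a": [1, 0, 0, 0],
    "mycol_b": [0, 1, 0, 0],
    "mycol_c": [0, 0, 1, 0],
    "other": [10, 20, 30, 40],
})
filter_col = [c for c in X_test.columns if c.startswith("mycol_")]

dummies = X_test[filter_col]
decoded = (
    dummies.idxmax(axis=1)                    # label of the first maximum per row
    .str.replace("mycol_", "", regex=False)   # strip the literal prefix
    .where(dummies.any(axis=1))               # all-zero rows become NaN instead of "a"
)
print(decoded)
```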

Pandas Get Dummy Reversal For Prediction

You could use reindex so that the resulting DataFrame has the same columns as the second one:

Dataframe4 = pd.get_dummies(Dataframe3, columns=['feature_x', 'feature_y']
               ).reindex(columns=Dataframe2.columns).fillna(0).astype('int')
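
To make the idea concrete, here is a sketch with two made-up frames, train and new_rows; the names and data are hypothetical. It passes fill_value=0 to reindex, which should be equivalent to the fillna(0) step above for columns missing from the new data.

```python
import pandas as pd

# Hypothetical training data with three categories, and new data with one.
train = pd.DataFrame({"feature_x": ["a", "b", "c"]})
new_rows = pd.DataFrame({"feature_x": ["a", "a"]})

train_dummies = pd.get_dummies(train, columns=["feature_x"])
new_dummies = (
    pd.get_dummies(new_rows, columns=["feature_x"])
    .reindex(columns=train_dummies.columns, fill_value=0)  # add missing columns as 0
    .astype(int)
)
print(new_dummies)
```

The new frame now has exactly the columns the model was trained on, in the same order.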

Reversing 'one-hot' encoding in Pandas

I would use apply to decode the columns:

In [2]: animals = pd.DataFrame({"monkey":[0,1,0,0,0],"rabbit":[1,0,0,0,0],"fox":[0,0,1,0,0]})

In [3]: def get_animal(row):
   ...:     for c in animals.columns:
   ...:         if row[c]==1:
   ...:             return c

In [4]: animals.apply(get_animal, axis=1)
Out[4]: 
0    rabbit
1    monkey
2       fox
3      None
4      None
dtype: object

Python how to inverse back the actual values after using one-hot-encode/pd.get_dummies

You can make use of the inverse_transform method of sklearn.preprocessing.OneHotEncoder to do it. I have illustrated it with an example below:

from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(handle_unknown='ignore')
X = [['Male'], ['Female'], ['Female']]
enc.fit(X)
enc.categories_

[array(['Female', 'Male'], dtype=object)]

enc.transform([['Female'], ['Male']]).toarray()

array([[1., 0.],
       [0., 1.]])

enc.inverse_transform([[0, 1], [1,0], [0, 1]])

array([['Male'],
       ['Female'],
       ['Male']], dtype=object)

To get the category-to-key dictionary you could do this:

A = {}
for i in enc.categories_[0]:
    A[i] = enc.transform([[i]]).toarray()

There may be a better way to do this, though.

Reconstruct a categorical variable from dummies in pandas

In [46]: s = Series(list('aaabbbccddefgh')).astype('category')

In [47]: s
Out[47]: 
0     a
1     a
2     a
3     b
4     b
5     b
6     c
7     c
8     d
9     d
10    e
11    f
12    g
13    h
dtype: category
Categories (8, object): [a < b < c < d < e < f < g < h]

In [48]: df = pd.get_dummies(s)

In [49]: df
Out[49]: 
    a  b  c  d  e  f  g  h
0   1  0  0  0  0  0  0  0
1   1  0  0  0  0  0  0  0
2   1  0  0  0  0  0  0  0
3   0  1  0  0  0  0  0  0
4   0  1  0  0  0  0  0  0
5   0  1  0  0  0  0  0  0
6   0  0  1  0  0  0  0  0
7   0  0  1  0  0  0  0  0
8   0  0  0  1  0  0  0  0
9   0  0  0  1  0  0  0  0
10  0  0  0  0  1  0  0  0
11  0  0  0  0  0  1  0  0
12  0  0  0  0  0  0  1  0
13  0  0  0  0  0  0  0  1

In [50]: x = df.stack()

# I don't think you actually need to specify ALL of the categories here, as by definition
# they are in the dummy matrix to start (and hence the column index)
In [51]: Series(pd.Categorical(x[x!=0].index.get_level_values(1)))
Out[51]: 
0     a
1     a
2     a
3     b
4     b
5     b
6     c
7     c
8     d
9     d
10    e
11    f
12    g
13    h
Name: level_1, dtype: category
Categories (8, object): [a < b < c < d < e < f < g < h]

So I think we need a function to do this, as it seems to be a natural operation. Maybe get_categories().
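
For what it is worth, pandas 1.5 and later ship pd.from_dummies, which covers the simple case. A minimal sketch, assuming pandas >= 1.5 and plain string column labels (I have not used a categorical Series here, unlike the session above):

```python
import pandas as pd

s = pd.Series(list("aaabbbccddefgh"))
dummies = pd.get_dummies(s)

# from_dummies returns a DataFrame; with no prefix separator there is a
# single output column, so take it by position.
restored = pd.from_dummies(dummies).iloc[:, 0]
print(restored.tolist())
```

If the dummy columns carry a prefix such as "col_a", from_dummies also takes a sep argument to split the prefix from the value.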

How to convert (Not-One) Hot Encodings to a Column with Multiple Values on the Same Row

You can use DataFrame.dot, which is much faster than iterating over all the rows in the DataFrame:

df.dot(df.columns + ', ').str.rstrip(', ')

0         three, four
1    one, three, four
2               three
3          one, three
4                    
dtype: object
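
Here is a self-contained sketch of the dot trick. The input frame is made up, chosen so that it reproduces the output shown above. It works because multiplying a string by 0 or 1 yields either an empty string or the string itself, and the sum across columns concatenates the pieces.

```python
import pandas as pd

# Hypothetical multi-hot frame: several columns may be 1 in the same row.
df = pd.DataFrame({
    "one":   [0, 1, 0, 1, 0],
    "two":   [0, 0, 0, 0, 0],
    "three": [1, 1, 1, 1, 0],
    "four":  [1, 1, 0, 0, 0],
})

# Each cell contributes "<column>, " when it is 1 and "" when it is 0.
labels = df.dot(df.columns + ", ").str.rstrip(", ")
print(labels.tolist())
```

Note that rstrip(', ') strips any trailing commas and spaces, so this assumes the column labels themselves do not end with either character.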
