Pandas: Get Dummies

You can try:

df = pd.get_dummies(df, columns=['type'])
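
For example, with a hypothetical frame that has a categorical 'type' column (only the column name comes from the question; the data below is made up):

import pandas as pd

# made-up sample data; only the 'type' column name is taken from the question
df = pd.DataFrame({'type': ['cat', 'dog', 'cat'], 'value': [1, 2, 3]})

# 'type' is replaced by indicator columns type_cat / type_dog
# (0/1 integers or booleans, depending on the pandas version)
df = pd.get_dummies(df, columns=['type'])
print(df)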

Decide which category to drop in pandas get_dummies()

One trick is to replace the unwanted values with NaN - here one value per column is removed:

# values to drop per column
d = {'c1':'b', 'c2':'z'}

d1 = {k:{v: np.nan} for k, v in d.items()}
df = pd.get_dummies(df.replace(d1), columns = ['c1', 'c2'], prefix='', prefix_sep='')
print (df)
   a  c  x  y
0  1  0  1  0
1  0  0  0  1
2  0  1  0  0
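
For reference, a minimal input frame consistent with the outputs in this answer would look like this (reconstructed from the outputs, since the question's data is not shown):

import numpy as np
import pandas as pd

# assumed input; the actual data in the question may differ
df = pd.DataFrame({'c1': ['a', 'b', 'c'], 'c2': ['x', 'y', 'z']})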

If you need to remove multiple values per column, use lists:

d = {'c1':['b','c'], 'c2':['z']}

d1 = {k:{x: np.nan for x in v} for k, v in d.items()}
print (d1)
{'c1': {'b': nan, 'c': nan}, 'c2': {'z': nan}}

df = pd.get_dummies(df.replace(d1), columns = ['c1', 'c2'], prefix='', prefix_sep='')
print (df)
   a  x  y
0  1  1  0
1  0  0  1
2  0  0  0

EDIT:

If the values are unique across columns, it is simpler to drop them in a final step:

df = (pd.get_dummies(df, columns = ['c1', 'c2'], prefix='', prefix_sep='')
.drop(['b','z'], axis=1))
print (df)
   a  c  x  y
0  1  0  1  0
1  0  0  0  1
2  0  1  0  0
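
As a side note: if it does not matter which category is dropped, get_dummies has a built-in drop_first=True parameter that simply drops the first level of each encoded column, leaving k-1 dummies per column. A minimal sketch with the same assumed input:

import pandas as pd

df = pd.DataFrame({'c1': ['a', 'b', 'c'], 'c2': ['x', 'y', 'z']})

# drops 'a' (first level of c1) and 'x' (first level of c2)
print(pd.get_dummies(df, columns=['c1', 'c2'], drop_first=True))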

pandas get_dummies() for multiple columns with a pre-defined list

Based on the post here, here is one answer:

df2 = pd.get_dummies(df[['Q1', 'Q2']].astype(pd.CategoricalDtype(categories=ls)))
df2.insert(0, 'id', df['id'])

Output:

df2
   id  Q1_a  Q1_b  Q1_c  Q2_a  Q2_b  Q2_c
0  01     1     0     0     0     0     1
1  02     0     1     0     0     1     0
2  03     1     0     0     1     0     0
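
For completeness, inputs consistent with the output above would be something like the sketch below (reconstructed, so treat the exact values as an assumption):

import pandas as pd

ls = ['a', 'b', 'c']                      # the pre-defined category list
df = pd.DataFrame({'id': ['01', '02', '03'],
                   'Q1': ['a', 'b', 'a'],
                   'Q2': ['c', 'b', 'a']})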

Getting dummies/encoding using multiple columns in pandas

Use get_dummies on all columns, then aggregate with max over the duplicated column names:

df = pd.get_dummies(df, prefix='', prefix_sep='').groupby(level=0, axis=1).max()
print (df)
         Apple  Banana  Guava  Kiwi  Mango
person1      1       0      0     0      0
person2      1       1      1     0      0
person3      0       0      1     0      0
person4      0       1      0     0      0
person5      1       1      1     1      1
person6      0       0      0     1      1

Or reshape first with DataFrame.stack, then aggregate max by the first level of the index:

df = pd.get_dummies(df.stack()).groupby(level=0).max()
print (df)
         Apple  Banana  Guava  Kiwi  Mango
person1      1       0      0     0      0
person2      1       1      1     0      0
person3      0       0      1     0      0
person4      0       1      0     0      0
person5      1       1      1     1      1
person6      0       0      0     1      1
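
Both variants assume a wide frame where each person has several fruit columns with gaps. A hypothetical input consistent with the outputs above could be the sketch below; note that in recent pandas versions groupby(..., axis=1) is deprecated, so the stack-based variant may be the safer long-term choice:

import numpy as np
import pandas as pd

# assumed input, reconstructed from the outputs shown
df = pd.DataFrame(
    {'fruit1': ['Apple', 'Apple', 'Guava', 'Banana', 'Apple', 'Kiwi'],
     'fruit2': [np.nan, 'Banana', np.nan, np.nan, 'Banana', 'Mango'],
     'fruit3': [np.nan, 'Guava', np.nan, np.nan, 'Guava', np.nan],
     'fruit4': [np.nan, np.nan, np.nan, np.nan, 'Kiwi', np.nan],
     'fruit5': [np.nan, np.nan, np.nan, np.nan, 'Mango', np.nan]},
    index=['person1', 'person2', 'person3', 'person4', 'person5', 'person6'])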

Pandas Group By And Get Dummies

Let us set_index then get_dummies; since there are multiple duplicates per ID, we aggregate with max over level=0:

s = df.set_index('ID')['L2'].str.get_dummies().max(level=0).reset_index()
Out[175]:
  ID  Business  Communications  Firewall  Security  Switches
0  A         0               0         1         1         0
1  B         0               1         0         0         0
2  C         1               0         0         0         1
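
A long-format input consistent with the output above might look like the sketch below (reconstructed, so treat it as an assumption); it also uses the groupby spelling, since in newer pandas .max(level=0) is written as .groupby(level=0).max():

import pandas as pd

# assumed input with duplicate IDs
df = pd.DataFrame({'ID': ['A', 'A', 'B', 'C', 'C', 'C'],
                   'L2': ['Firewall', 'Security', 'Communications',
                          'Business', 'Switches', 'Business']})

s = (df.set_index('ID')['L2']
       .str.get_dummies()
       .groupby(level=0).max()
       .reset_index())
print(s)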

How to specify which column to remove in get_dummies in pandas

IIUC, use get_dummies then drop the 'Human' column:

df['Architecture'].str.get_dummies().drop('Human', axis=1)

Output:

   Bart  Peg
0     1    0
1     1    0
2     0    1
3     0    0
4     0    0
5     0    1
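
A reconstructed input consistent with that output (an assumption, for illustration only):

import pandas as pd

# assumed values in the 'Architecture' column
df = pd.DataFrame({'Architecture': ['Bart', 'Bart', 'Peg',
                                    'Human', 'Human', 'Peg']})

print(df['Architecture'].str.get_dummies().drop('Human', axis=1))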

Pandas get_dummies() for numeric categorical data

You can convert values to strings:

df1 = pd.get_dummies(df.astype(str))
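
This matters because get_dummies leaves numeric columns untouched unless they are converted (or explicitly listed via columns=). A quick sketch with made-up data:

import pandas as pd

# hypothetical numeric-coded categorical column
df = pd.DataFrame({'grade': [1, 2, 1, 3]})

print(pd.get_dummies(df))              # numeric column is left as-is
print(pd.get_dummies(df.astype(str)))  # grade_1, grade_2, grade_3 indicators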

How to get dummies without prefix?

Use get_dummies with the prefix='' and prefix_sep='' parameters. Also, because some of the columns may be numeric, convert them to strings:

df = df.join(pd.get_dummies(df.astype(str), prefix='', prefix_sep=''))
print(df)

     X    Y  123  456  789  AAA  BBB  CCC
0  123  AAA    1    0    0    1    0    0
1  456  BBB    0    1    0    0    1    0
2  123  AAA    1    0    0    1    0    0
3  789  CCC    0    0    1    0    0    1

Pandas - get_dummies with value from another column

Do it in two steps:

# one indicator column per manufacturer
dummies = pd.get_dummies(df['Mfr Number'])
# each row has exactly one nonzero cell; overwrite it with that row's Quantity
dummies.values[dummies != 0] = df['Quantity']
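
A minimal sketch of the same idea with made-up data; here the scaling is written as a row-wise multiply, which is equivalent because each row has exactly one nonzero indicator:

import pandas as pd

# hypothetical data; the column names follow the question
df = pd.DataFrame({'Mfr Number': ['M1', 'M2', 'M1'],
                   'Quantity': [5, 3, 7]})

# one indicator column per manufacturer, scaled row-wise by Quantity
result = pd.get_dummies(df['Mfr Number']).astype(int).mul(df['Quantity'], axis=0)
print(result)
#    M1  M2
# 0   5   0
# 1   0   3
# 2   7   0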

Wanting to get_dummies for the most frequent values in a column - Pandas

I don't have time to generate data and work it all out, but I thought I'd share this idea in case it helps.

The idea is to leverage .isin() to keep only the rows whose values you need to build the dummies, then rely on the index to match back to the source rows.

Something like:

pd.get_dummies(df.loc[df['hashtags'].isin(counts.nlargest(10).index)], columns=['hashtags']) 

You will have to see if the indices will give you what you need.
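
A self-contained sketch of that idea, assuming counts is the per-hashtag frequency of the same column (all data here is made up):

import pandas as pd

# hypothetical data; the real frame has many more rows and columns
df = pd.DataFrame({'hashtags': ['a', 'b', 'a', 'c', 'a', 'b']})

counts = df['hashtags'].value_counts()   # assumed definition of 'counts'
top = counts.nlargest(2).index           # nlargest(10) in the original idea

# keep only rows whose hashtag is among the most frequent, then encode
dummies = pd.get_dummies(df.loc[df['hashtags'].isin(top)], columns=['hashtags'])
print(dummies)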


