Pandas: Get Dummies

You can try:

df = pd.get_dummies(df, columns=['type'])
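
For example, with a hypothetical frame that has a categorical 'type' column (only the column name comes from the question; the data below is made up):

import pandas as pd

# made-up sample data; only the 'type' column name is taken from the question
df = pd.DataFrame({'type': ['cat', 'dog', 'cat'], 'value': [1, 2, 3]})

# 'type' is replaced by indicator columns type_cat / type_dog
# (0/1 integers or booleans, depending on the pandas version)
df = pd.get_dummies(df, columns=['type'])
print(df)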

Decide which category to drop in pandas get_dummies()

One trick is to replace the unwanted values with NaN - here one value per column is removed:

# values to drop per column
d = {'c1':'b', 'c2':'z'}

d1 = {k:{v: np.nan} for k, v in d.items()}
df = pd.get_dummies(df.replace(d1), columns = ['c1', 'c2'], prefix='', prefix_sep='')
print (df)
   a  c  x  y
0  1  0  1  0
1  0  0  0  1
2  0  1  0  0
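
For reference, a minimal input frame consistent with the outputs in this answer would look like this (reconstructed from the outputs, since the question's data is not shown):

import numpy as np
import pandas as pd

# assumed input; the actual data in the question may differ
df = pd.DataFrame({'c1': ['a', 'b', 'c'], 'c2': ['x', 'y', 'z']})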

If you need to remove multiple values per column, use lists:

d = {'c1':['b','c'], 'c2':['z']}

d1 = {k:{x: np.nan for x in v} for k, v in d.items()}
print (d1)
{'c1': {'b': nan, 'c': nan}, 'c2': {'z': nan}}

df = pd.get_dummies(df.replace(d1), columns = ['c1', 'c2'], prefix='', prefix_sep='')
print (df)
   a  x  y
0  1  1  0
1  0  0  1
2  0  0  0

EDIT:

If the values are unique across columns, it is simpler to drop them in a final step:

df = (pd.get_dummies(df, columns = ['c1', 'c2'], prefix='', prefix_sep='')
.drop(['b','z'], axis=1))
print (df)
   a  c  x  y
0  1  0  1  0
1  0  0  0  1
2  0  1  0  0
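
As a side note: if it does not matter which category is dropped, get_dummies has a built-in drop_first=True parameter that simply drops the first level of each encoded column, leaving k-1 dummies per column. A minimal sketch with the same assumed input:

import pandas as pd

df = pd.DataFrame({'c1': ['a', 'b', 'c'], 'c2': ['x', 'y', 'z']})

# drops 'a' (first level of c1) and 'x' (first level of c2)
print(pd.get_dummies(df, columns=['c1', 'c2'], drop_first=True))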

pandas get_dummies() for multiple columns with a pre-defined list

Based on the post here, here is one answer:

df2 = pd.get_dummies(df[['Q1', 'Q2']].astype(pd.CategoricalDtype(categories=ls)))
df2.insert(0, 'id', df['id'])

Output:

df2
   id  Q1_a  Q1_b  Q1_c  Q2_a  Q2_b  Q2_c
0  01     1     0     0     0     0     1
1  02     0     1     0     0     1     0
2  03     1     0     0     1     0     0
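
For completeness, inputs consistent with the output above would be something like the sketch below (reconstructed, so treat the exact values as an assumption):

import pandas as pd

ls = ['a', 'b', 'c']                      # the pre-defined category list
df = pd.DataFrame({'id': ['01', '02', '03'],
                   'Q1': ['a', 'b', 'a'],
                   'Q2': ['c', 'b', 'a']})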

Getting dummies/encoding using multiple columns in pandas

Use get_dummies on all columns, then aggregate with max over the duplicated column names:

df = pd.get_dummies(df, prefix='', prefix_sep='').groupby(level=0, axis=1).max()
print (df)
         Apple  Banana  Guava  Kiwi  Mango
person1      1       0      0     0      0
person2      1       1      1     0      0
person3      0       0      1     0      0
person4      0       1      0     0      0
person5      1       1      1     1      1
person6      0       0      0     1      1

Or reshape first with DataFrame.stack, then aggregate max by the first level of the index:

df = pd.get_dummies(df.stack()).groupby(level=0).max()
print (df)
         Apple  Banana  Guava  Kiwi  Mango
person1      1       0      0     0      0
person2      1       1      1     0      0
person3      0       0      1     0      0
person4      0       1      0     0      0
person5      1       1      1     1      1
person6      0       0      0     1      1
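
Both variants assume a wide frame where each person has several fruit columns with gaps. A hypothetical input consistent with the outputs above could be the sketch below; note that in recent pandas versions groupby(..., axis=1) is deprecated, so the stack-based variant may be the safer long-term choice:

import numpy as np
import pandas as pd

# assumed input, reconstructed from the outputs shown
df = pd.DataFrame(
    {'fruit1': ['Apple', 'Apple', 'Guava', 'Banana', 'Apple', 'Kiwi'],
     'fruit2': [np.nan, 'Banana', np.nan, np.nan, 'Banana', 'Mango'],
     'fruit3': [np.nan, 'Guava', np.nan, np.nan, 'Guava', np.nan],
     'fruit4': [np.nan, np.nan, np.nan, np.nan, 'Kiwi', np.nan],
     'fruit5': [np.nan, np.nan, np.nan, np.nan, 'Mango', np.nan]},
    index=['person1', 'person2', 'person3', 'person4', 'person5', 'person6'])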

Pandas Group By And Get Dummies

Let us set_index then get_dummies; since there are multiple duplicates per ID, we aggregate with max over level=0:

s = df.set_index('ID')['L2'].str.get_dummies().max(level=0).reset_index()
Out[175]:
  ID  Business  Communications  Firewall  Security  Switches
0  A         0               0         1         1         0
1  B         0               1         0         0         0
2  C         1               0         0         0         1
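
A long-format input consistent with the output above might look like the sketch below (reconstructed, so treat it as an assumption); it also uses the groupby spelling, since in newer pandas .max(level=0) is written as .groupby(level=0).max():

import pandas as pd

# assumed input with duplicate IDs
df = pd.DataFrame({'ID': ['A', 'A', 'B', 'C', 'C', 'C'],
                   'L2': ['Firewall', 'Security', 'Communications',
                          'Business', 'Switches', 'Business']})

s = (df.set_index('ID')['L2']
       .str.get_dummies()
       .groupby(level=0).max()
       .reset_index())
print(s)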

How to specify which column to remove in get_dummies in pandas

IIUC, use get_dummies then drop the 'Human' column:

df['Architecture'].str.get_dummies().drop('Human', axis=1)

Output:

   Bart  Peg
0     1    0
1     1    0
2     0    1
3     0    0
4     0    0
5     0    1
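
A reconstructed input consistent with that output (an assumption, for illustration only):

import pandas as pd

# assumed values in the 'Architecture' column
df = pd.DataFrame({'Architecture': ['Bart', 'Bart', 'Peg',
                                    'Human', 'Human', 'Peg']})

print(df['Architecture'].str.get_dummies().drop('Human', axis=1))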

Pandas get_dummies() for numeric categorical data

You can convert values to strings:

df1 = pd.get_dummies(df.astype(str))
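
This matters because get_dummies leaves numeric columns untouched unless they are converted (or explicitly listed via columns=). A quick sketch with made-up data:

import pandas as pd

# hypothetical numeric-coded categorical column
df = pd.DataFrame({'grade': [1, 2, 1, 3]})

print(pd.get_dummies(df))              # numeric column is left as-is
print(pd.get_dummies(df.astype(str)))  # grade_1, grade_2, grade_3 indicators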

How to get dummies without prefix?

Use get_dummies with the prefix='' and prefix_sep='' parameters. Also, because some of the columns may be numeric, convert them to strings:

df = df.join(pd.get_dummies(df.astype(str), prefix='', prefix_sep=''))
print(df)

     X    Y  123  456  789  AAA  BBB  CCC
0  123  AAA    1    0    0    1    0    0
1  456  BBB    0    1    0    0    1    0
2  123  AAA    1    0    0    1    0    0
3  789  CCC    0    0    1    0    0    1

Pandas - get_dummies with value from another column

Do it in two steps:

# one indicator column per manufacturer
dummies = pd.get_dummies(df['Mfr Number'])
# each row has exactly one nonzero cell; overwrite it with that row's Quantity
dummies.values[dummies != 0] = df['Quantity']
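
A minimal sketch of the same idea with made-up data; here the scaling is written as a row-wise multiply, which is equivalent because each row has exactly one nonzero indicator:

import pandas as pd

# hypothetical data; the column names follow the question
df = pd.DataFrame({'Mfr Number': ['M1', 'M2', 'M1'],
                   'Quantity': [5, 3, 7]})

# one indicator column per manufacturer, scaled row-wise by Quantity
result = pd.get_dummies(df['Mfr Number']).astype(int).mul(df['Quantity'], axis=0)
print(result)
#    M1  M2
# 0   5   0
# 1   0   3
# 2   7   0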

Wanting to get_dummies for the most frequent values in a column - Pandas

I don't have time to generate data and work it all out, but I thought I'd share this idea in case it helps.

The idea is to leverage .isin() to keep only the rows whose values you need to build the dummies, then rely on the index to match back to the source rows.

Something like:

pd.get_dummies(df.loc[df['hashtags'].isin(counts.nlargest(10).index)], columns=['hashtags']) 

You will have to see if the indices will give you what you need.
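
A self-contained sketch of that idea, assuming counts is the per-hashtag frequency of the same column (all data here is made up):

import pandas as pd

# hypothetical data; the real frame has many more rows and columns
df = pd.DataFrame({'hashtags': ['a', 'b', 'a', 'c', 'a', 'b']})

counts = df['hashtags'].value_counts()   # assumed definition of 'counts'
top = counts.nlargest(2).index           # nlargest(10) in the original idea

# keep only rows whose hashtag is among the most frequent, then encode
dummies = pd.get_dummies(df.loc[df['hashtags'].isin(top)], columns=['hashtags'])
print(dummies)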


