Pandas: Get Dummies
You can try:
df = pd.get_dummies(df, columns=['type'])
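Here is a minimal self-contained sketch of the call above. It assumes a hypothetical frame with a categorical `type` column and a numeric `value` column. Only the listed column is encoded; the other columns pass through unchanged. `dtype=int` is added so the indicators are 0/1, because recent pandas returns booleans by default.

```python
import pandas as pd

# hypothetical data: 'type' is categorical, 'value' is numeric
df = pd.DataFrame({'type': ['cat', 'dog', 'cat'], 'value': [1, 2, 3]})

# encode only 'type'; 'value' is kept as-is and placed before the dummies
df = pd.get_dummies(df, columns=['type'], dtype=int)
print(df)
```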
Decide which category to drop in pandas get_dummies()
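One trick is to replace the values you want to drop with NaNs, because get_dummies() ignores missing values by default (dummy_na=False). Here one value is removed per column: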
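import numpy as np
import pandas as pd
# columns mapped to the value to leave out of the dummies
d = {'c1':'b', 'c2':'z'}
# nested dict for DataFrame.replace: {column: {value: NaN}}
d1 = {k:{v: np.nan} for k, v in d.items()}
# dtype=int keeps 0/1 output (recent pandas returns bool by default)
df = pd.get_dummies(df.replace(d1), columns = ['c1', 'c2'], prefix='', prefix_sep='', dtype=int)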
print (df)
a c x y
0 1 0 1 0
1 0 0 0 1
2 0 1 0 0
If you need to remove multiple values per column, use lists (starting again from the original df, since the step above overwrote it):
d = {'c1':['b','c'], 'c2':['z']}
d1 = {k:{x: np.nan for x in v} for k, v in d.items()}
print (d1)
{'c1': {'b': nan, 'c': nan}, 'c2': {'z': nan}}
df = pd.get_dummies(df.replace(d1), columns = ['c1', 'c2'], prefix='', prefix_sep='', dtype=int)
print (df)
a x y
0 1 1 0
1 0 0 1
2 0 0 0
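A self-contained sketch of the NaN trick, wrapped in a small helper so both cases above can be run from the same input. The input frame is an assumption reconstructed from the printed outputs, and the helper name `dummies_without` is hypothetical. The helper takes a dict of column to list of values to skip.

```python
import numpy as np
import pandas as pd

# assumed input, reconstructed so that it reproduces the outputs shown above
df = pd.DataFrame({'c1': ['a', 'b', 'c'], 'c2': ['x', 'y', 'z']})

def dummies_without(df, drop):
    """One-hot encode the columns in `drop`, giving no dummy to the listed values.

    `drop` maps column name -> list of values to skip (hypothetical helper).
    The values are replaced by NaN, which get_dummies ignores (dummy_na=False).
    """
    to_nan = {col: {v: np.nan for v in values} for col, values in drop.items()}
    return pd.get_dummies(df.replace(to_nan), columns=list(drop),
                          prefix='', prefix_sep='', dtype=int)

print(dummies_without(df, {'c1': ['b'], 'c2': ['z']}))
print(dummies_without(df, {'c1': ['b', 'c'], 'c2': ['z']}))
```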
EDIT:
If the values are unique across the columns, it is simpler to drop the unwanted dummy columns in the last step:
df = (pd.get_dummies(df, columns = ['c1', 'c2'], prefix='', prefix_sep='')
.drop(['b','z'], axis=1))
print (df)
a c x y
0 1 0 1 0
1 0 0 0 1
2 0 1 0 0
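pandas also has a built-in `drop_first=True`, but it always drops the first category in sorted (or categorical) order. One way to choose which category is dropped is to make the column Categorical with that category listed first, then use `drop_first=True`. This is a sketch of that alternative, using the same assumed `c1`/`c2` data as above.

```python
import pandas as pd

# assumed input, same shape as the examples above
df = pd.DataFrame({'c1': ['a', 'b', 'c'], 'c2': ['x', 'y', 'z']})

# list the category to drop ('b') first so that drop_first removes it
df['c1'] = pd.Categorical(df['c1'], categories=['b', 'a', 'c'])

# get_dummies follows the categorical order, so 'b' is the one dropped
out = pd.get_dummies(df, columns=['c1'], drop_first=True, dtype=int)
print(out)
```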
pandas get_dummies() for multiple columns with a pre-defined list
Based on the post here, here is one answer:
# ls is the pre-defined list of categories, e.g. ['a', 'b', 'c']
df2 = pd.get_dummies(df[['Q1', 'Q2']].astype(pd.CategoricalDtype(categories=ls)))
df2.insert(0, 'id', df['id'])
Output:
df2
id Q1_a Q1_b Q1_c Q2_a Q2_b Q2_c
0 01 1 0 0 0 0 1
1 02 0 1 0 0 1 0
2 03 1 0 0 1 0 0
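A runnable version of this answer. The input data is an assumption reconstructed from the output above. The point of the fixed `CategoricalDtype` is that each column gets one dummy per entry of `ls`, even for categories that never occur (Q1 has no `'c'`). A value not in `ls` becomes NaN and gets an all-zero row.

```python
import pandas as pd

# assumed data, reconstructed from the output above
df = pd.DataFrame({'id': ['01', '02', '03'],
                   'Q1': ['a', 'b', 'a'],
                   'Q2': ['c', 'b', 'a']})
ls = ['a', 'b', 'c']  # the pre-defined list of allowed answers

# fixed categories -> one dummy column per entry of ls for every question
cat_type = pd.CategoricalDtype(categories=ls)
df2 = pd.get_dummies(df[['Q1', 'Q2']].astype(cat_type), dtype=int)
df2.insert(0, 'id', df['id'])
print(df2)
```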
Getting dummies/encoding using multiple columns in pandas
Use get_dummies on all columns, then aggregate with max by the duplicated column names (transposing first, because groupby(axis=1) is deprecated since pandas 2.1):
df = pd.get_dummies(df, prefix='', prefix_sep='').T.groupby(level=0).max().T
print (df)
Apple Banana Guava Kiwi Mango
person1 1 0 0 0 0
person2 1 1 1 0 0
person3 0 0 1 0 0
person4 0 1 0 0 0
person5 1 1 1 1 1
person6 0 0 0 1 1
Or reshape first with DataFrame.stack, then aggregate with max by the first index level:
df = pd.get_dummies(df.stack()).groupby(level=0).max()
print (df)
Apple Banana Guava Kiwi Mango
person1 1 0 0 0 0
person2 1 1 1 0 0
person3 0 0 1 0 0
person4 0 1 0 0 0
person5 1 1 1 1 1
person6 0 0 0 1 1
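A small self-contained check that the two approaches agree. The data is hypothetical, not the question's data: each person lists up to three fruits, with `None` for an empty slot. Missing values produce no dummy in either approach.

```python
import pandas as pd

# hypothetical data: up to three fruits per person (None = empty slot)
df = pd.DataFrame({'f1': ['Apple', 'Banana', 'Kiwi'],
                   'f2': ['Mango', None, 'Apple'],
                   'f3': [None, None, 'Banana']},
                  index=['p1', 'p2', 'p3'])

# approach 1: long form via stack, encode, then collapse per person
by_stack = pd.get_dummies(df.stack(), dtype=int).groupby(level=0).max()

# approach 2: encode every column without prefix, then merge duplicate
# column names by transposing and grouping on the (former) column labels
wide = pd.get_dummies(df, prefix='', prefix_sep='', dtype=int)
by_name = wide.T.groupby(level=0).max().T

print(by_stack)
```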
Pandas Group By And Get Dummies
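Let us set_index, then str.get_dummies. Since there can be multiple rows for each ID, we need to aggregate with max over index level 0 (the level= argument of max() was removed in pandas 2.0, so use groupby):
s = df.set_index('ID')['L2'].str.get_dummies().groupby(level=0).max().reset_index()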
Output:
ID Business Communications Firewall Security Switches
0 A 0 0 1 1 0
1 B 0 1 0 0 0
2 C 1 0 0 0 1
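A runnable sketch of the same steps. The data is hypothetical, chosen to match the output above plus one repeated `(A, Firewall)` row. That repeat shows why `max` is used: it yields a 0/1 flag per ID, while `sum` would count the repeats.

```python
import pandas as pd

# hypothetical data; ('A', 'Firewall') appears twice on purpose
df = pd.DataFrame({'ID': ['A', 'A', 'A', 'B', 'C', 'C'],
                   'L2': ['Firewall', 'Security', 'Firewall',
                          'Communications', 'Business', 'Switches']})

dummies = df.set_index('ID')['L2'].str.get_dummies()

# max -> presence flag per ID; sum -> number of occurrences per ID
flags = dummies.groupby(level=0).max().reset_index()
counts = dummies.groupby(level=0).sum().reset_index()
print(flags)
```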
How to specify which column to remove in get_dummies in pandas
IIUC, try using str.get_dummies, then drop the 'Human' column:
df['Architecture'].str.get_dummies().drop('Human', axis=1)
Output:
Bart Peg
0 1 0
1 1 0
2 0 1
3 0 0
4 0 0
5 0 1
Pandas get dummies() for numeric categorical data
By default get_dummies only encodes object, string and category columns, so convert the numeric values to strings first:
df1 = pd.get_dummies(df.astype(str))
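A sketch comparing the default behaviour with two ways to encode a numeric column. The frame is hypothetical. Besides casting to strings, naming the column in `columns=` also works, because listed columns are encoded regardless of dtype.

```python
import pandas as pd

# hypothetical frame: 'grade' holds numeric category codes
df = pd.DataFrame({'grade': [1, 2, 1, 3], 'name': ['u', 'v', 'w', 'x']})

# default: the numeric 'grade' column passes through untouched
default = pd.get_dummies(df, dtype=int)

# option 1: cast everything to str first, as in the answer above
as_str = pd.get_dummies(df.astype(str), dtype=int)

# option 2: list the column explicitly; other columns are kept as-is
explicit = pd.get_dummies(df, columns=['grade'], dtype=int)
print(explicit)
```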
How to get dummies without prefix?
Use get_dummies with the prefix='' and prefix_sep='' parameters. Also, if some of the columns may be numeric, convert them to strings first:
df = df.join(pd.get_dummies(df.astype(str), prefix='', prefix_sep=''))
print(df)
X Y 123 456 789 AAA BBB CCC
0 123 AAA 1 0 0 1 0 0
1 456 BBB 0 1 0 0 1 0
2 123 AAA 1 0 0 1 0 0
3 789 CCC 0 0 1 0 0 1
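A runnable version of this answer. The input is an assumption reconstructed from the printed output, and `dtype=int` is added so recent pandas prints 0/1 instead of True/False. Without a prefix, the dummy columns are named by the values alone. If the same value appeared in both X and Y, the result would contain duplicate column names.

```python
import pandas as pd

# assumed input, reconstructed from the output above
df = pd.DataFrame({'X': [123, 456, 123, 789], 'Y': ['AAA', 'BBB', 'AAA', 'CCC']})

# cast to str so the numeric X column is encoded too; names are the bare values
df = df.join(pd.get_dummies(df.astype(str), prefix='', prefix_sep='', dtype=int))
print(df)
```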
Pandas - get_dummies with value from another column
Do it in two steps:
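dummies = pd.get_dummies(df['Mfr Number'])
# each row has a single indicator; scale it row-wise by that row's quantity
# (writing through dummies.values fails with bool dummies or copy-on-write)
dummies = dummies.mul(df['Quantity'], axis=0)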
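The same two steps on hypothetical order lines, so the effect of the row-wise multiplication is visible. Every cell except the row's own part column stays 0.

```python
import pandas as pd

# hypothetical order lines
df = pd.DataFrame({'Mfr Number': ['M1', 'M2', 'M1'],
                   'Quantity': [5, 2, 7]})

# step 1: 0/1 indicator per part; step 2: replace each 1 by the row's quantity
dummies = pd.get_dummies(df['Mfr Number'], dtype=int).mul(df['Quantity'], axis=0)
print(dummies)
```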
Wanting to get_dummies for the most frequent values in a column - Pandas
I don't have time to generate data and work it all out, but I thought I'd give you this idea in case it helps.
The idea is to leverage .isin() to select the rows whose values you need to build the dummies, then leverage the index to match them back to the source rows.
Something like:
# counts is assumed to hold the frequency of each hashtag
counts = df['hashtags'].value_counts()
pd.get_dummies(df.loc[df['hashtags'].isin(counts.nlargest(10).index)], columns=['hashtags'])
You will have to see if the indices will give you what you need.
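One variant of this idea that avoids the index matching is to keep every row. Values outside the top N are turned into NaN with `.where()`, and get_dummies ignores NaN, so those rows get all-zero indicators instead of being filtered out. This is a sketch on hypothetical data; `top_n` is a hypothetical cut-off (the answer above uses 10).

```python
import pandas as pd

# hypothetical data
df = pd.DataFrame({'hashtags': ['#a', '#b', '#a', '#c', '#b', '#a', '#d']})
top_n = 2  # hypothetical cut-off

counts = df['hashtags'].value_counts()
top = counts.nlargest(top_n).index

# rare tags -> NaN, which get_dummies skips (dummy_na=False)
kept = df['hashtags'].where(df['hashtags'].isin(top))
dummies = pd.get_dummies(kept, prefix='hashtags', dtype=int)

# same index as df, so join lines the indicators up with the source rows
result = df.join(dummies)
print(result)
```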