Create New Dummy Variable Columns from Categorical Variable

R has a formula "sub-language" that translates a model formula into a design matrix, and in the spirit of the language you can take advantage of it. It's fast and concise. Example: you have a numeric predictor x, a categorical predictor catVar, and a response y.

> binom <- data.frame(y=runif(1e5), x=runif(1e5), catVar=as.factor(sample(0:4,1e5,TRUE)))
> head(binom)
          y          x catVar
1 0.5051653 0.34888390      2
2 0.4868774 0.85005067      2
3 0.3324482 0.58467798      2
4 0.2966733 0.05510749      3
5 0.5695851 0.96237936      1
6 0.8358417 0.06367418      2

Then just call model.matrix with the formula and the data frame:

> A <- model.matrix(y ~ x + catVar, binom)
> head(A)
  (Intercept)          x catVar1 catVar2 catVar3 catVar4
1           1 0.34888390       0       1       0       0
2           1 0.85005067       0       1       0       0
3           1 0.58467798       0       1       0       0
4           1 0.05510749       0       0       1       0
5           1 0.96237936       1       0       0       0
6           1 0.06367418       0       1       0       0

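Done. Note that level 0 of catVar gets no column of its own: under R's default treatment contrasts the first factor level is the baseline and is absorbed by the intercept.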
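For readers working in Python, a rough pandas analogue of the design matrix above might look like the following. This is only a sketch, not a port of model.matrix: the seed, the six-row sample size, and the manual "(Intercept)" column are assumptions made for illustration, and `drop_first=True` is used to imitate R dropping the baseline level.

```python
import numpy as np
import pandas as pd

# Small made-up sample; the seed and size are arbitrary choices for the sketch.
rng = np.random.default_rng(0)
n = 6
binom = pd.DataFrame({
    "y": rng.uniform(size=n),
    "x": rng.uniform(size=n),
    # Declare all five levels so every dummy column exists even if a level is unobserved.
    "catVar": pd.Categorical(rng.integers(0, 5, size=n), categories=list(range(5))),
})

# drop_first=True drops the first level (0), which plays the role of the baseline.
dummies = pd.get_dummies(binom["catVar"], prefix="catVar", prefix_sep="",
                         drop_first=True, dtype=int)

# Assemble intercept, numeric predictor, and dummy columns side by side.
intercept = pd.Series(1, index=binom.index, name="(Intercept)")
A = pd.concat([intercept, binom["x"], dummies], axis=1)
print(A.head())
```

Unlike the R formula interface, nothing here is inferred from a formula, so the intercept and the choice of baseline have to be spelled out by hand.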

Converting categorical column into a single dummy variable column

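Here are a couple of ways to do it:

# df is assumed to be an existing DataFrame with a 'Sex' column
from sklearn.preprocessing import LabelEncoder

# LabelEncoder numbers the classes in sorted order: 'female' -> 0, 'male' -> 1
lbl = LabelEncoder()
df['Sex_encoded'] = lbl.fit_transform(df['Sex'])

# using only pandas, with an explicit mapping (this produces the output below)
df['Sex_encoded'] = df['Sex'].map({'male': 0, 'female': 1})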

   Survived  Pclass     Sex   Age     Fare  Sex_encoded
0         0       3    male  22.0   7.2500            0
1         1       1  female  38.0  71.2833            1
2         1       3  female  26.0   7.9250            1
3         1       1  female  35.0  53.1000            1
4         0       3    male  35.0   8.0500            0
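The direction of the encoding depends on the method, which is easy to miss. The sketch below uses only pandas and a made-up five-row 'Sex' column (taken from the rows shown above) to compare an explicit mapping with automatically assigned category codes, which follow sorted order.

```python
import pandas as pd

# Only the 'Sex' column from the rows shown above; everything else is omitted.
df = pd.DataFrame({"Sex": ["male", "female", "female", "female", "male"]})

# Explicit mapping: you decide which label becomes 1.
mapped = df["Sex"].map({"male": 0, "female": 1})

# Automatic codes: categories are sorted, so 'female' -> 0 and 'male' -> 1.
codes = df["Sex"].astype("category").cat.codes

print(pd.DataFrame({"Sex": df["Sex"], "mapped": mapped, "codes": codes}))
```

An explicit mapping is usually the safer choice when the meaning of 0 and 1 matters downstream; note that `map` yields NaN for any label missing from the dictionary.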

Create dummy variables from all categorical variables in a dataframe

In R, this is also a one-liner with the fastDummies package:

fastDummies::dummy_cols(customers)

  id gender  mood outcome gender_male gender_female mood_happy mood_sad
1 10   male happy       1           1             0          1        0
2 20 female   sad       1           0             1          0        1
3 30 female happy       0           0             1          1        0
4 40   male   sad       0           1             0          0        1
5 50 female happy       0           0             1          1        0
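A comparable result can be sketched in pandas with `pd.get_dummies`. The `customers` frame below is rebuilt from the output shown above, and the list of categorical columns is written out by hand as an assumption; it is not detected automatically as fastDummies does.

```python
import pandas as pd

# Frame reconstructed from the printed output above.
customers = pd.DataFrame({
    "id": [10, 20, 30, 40, 50],
    "gender": ["male", "female", "female", "male", "female"],
    "mood": ["happy", "sad", "happy", "sad", "happy"],
    "outcome": [1, 1, 0, 0, 0],
})

# Columns to encode, listed explicitly for this sketch.
cat_cols = ["gender", "mood"]

# Encode only those columns, then join back so the originals are kept.
dummies = pd.get_dummies(customers[cat_cols], columns=cat_cols, dtype=int)
result = customers.join(dummies)
print(result)
```

The dummy columns come out in sorted label order (gender_female before gender_male), so the column order differs from the fastDummies output even though the values agree.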

Make dummy variable for categorical data, based on ID column with duplicate values in python

Use pd.crosstab to count each value per ID, then cap the counts at 1 with DataFrame.clip:

df1 = (pd.crosstab(df['ID'], df['value'])
         .clip(upper=1)
         .reset_index()
         .rename_axis(None, axis=1))
print(df1)
   ID  A  B  C
0   1  1  1  1
1   2  0  1  0
2   4  1  0  1
3  10  0  0  1
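The snippet above does not show its input, so here is a self-contained sketch. The input frame is hypothetical: it was made up so that some (ID, value) pairs repeat and the result matches the table printed above.

```python
import pandas as pd

# Hypothetical long-format input with repeated (ID, value) pairs.
df = pd.DataFrame({
    "ID":    [1, 1, 1, 1, 2, 4, 4, 4, 10],
    "value": ["A", "A", "B", "C", "B", "A", "C", "C", "C"],
})

df1 = (pd.crosstab(df["ID"], df["value"])  # counts per ID and value
         .clip(upper=1)                     # duplicates would give 2+, cap at 1
         .reset_index()                     # turn ID back into a regular column
         .rename_axis(None, axis=1))        # drop the leftover 'value' axis name
print(df1)
```

Without `clip`, ID 1 would show a 2 under A and ID 4 a 2 under C, which is why the cap is needed to get true 0/1 indicators.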

Create new columns from categorical variables

Use Series.str.get_dummies, which splits each string on a separator and returns one indicator column per distinct token:

https://pandas.pydata.org/docs/reference/api/pandas.Series.str.get_dummies.html#pandas.Series.str.get_dummies

dummy_cols = df['column_factors'].str.get_dummies(sep=',')
df = df.join(dummy_cols).drop(columns='column_factors')
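To make the two lines above concrete, here is a small runnable sketch. The three-row frame and its `id` column are invented for illustration; only the `column_factors` name comes from the snippet above.

```python
import pandas as pd

# Hypothetical input: each cell holds a comma-separated list of factors.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "column_factors": ["a,b", "b", "a,c"],
})

# One 0/1 column per distinct token found after splitting on ','.
dummy_cols = df["column_factors"].str.get_dummies(sep=",")

# Attach the indicators and drop the original string column.
df = df.join(dummy_cols).drop(columns="column_factors")
print(df)
```

The separator must match the data exactly; if the cells were written as "a, b" with a space, `sep=', '` would be needed to avoid tokens with leading spaces.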

