Create New Dummy Variable Columns from Categorical Variable

Create new dummy variable columns from categorical variable

R has a "sub-language" for translating formulas into a design matrix, and in the spirit of the language you can take advantage of it: it's fast and concise. Example: you have a continuous (numeric) predictor x, a categorical predictor catVar, and a response y.

> binom <- data.frame(y=runif(1e5), x=runif(1e5), catVar=as.factor(sample(0:4,1e5,TRUE)))
> head(binom)
          y          x catVar
1 0.5051653 0.34888390      2
2 0.4868774 0.85005067      2
3 0.3324482 0.58467798      2
4 0.2966733 0.05510749      3
5 0.5695851 0.96237936      1
6 0.8358417 0.06367418      2

You just do

> A <- model.matrix(y ~ x + catVar, binom)
> head(A)
  (Intercept)          x catVar1 catVar2 catVar3 catVar4
1           1 0.34888390       0       1       0       0
2           1 0.85005067       0       1       0       0
3           1 0.58467798       0       1       0       0
4           1 0.05510749       0       0       1       0
5           1 0.96237936       1       0       0       0
6           1 0.06367418       0       1       0       0

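Done. Note that there is no catVar0 column: because the formula includes an intercept, model.matrix uses treatment contrasts and drops the first level as the reference category.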
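For readers working in Python, here is a rough pandas analogue of the same design matrix. It is a sketch under assumptions, not R's formula machinery: the data are hypothetical random values shaped like binom above, and pd.get_dummies with drop_first=True stands in for treatment contrasts by dropping level 0 as the baseline.

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the R example: numeric y and x, categorical catVar with levels 0..4
rng = np.random.default_rng(0)
n = 1000
binom = pd.DataFrame({
    "y": rng.random(n),
    "x": rng.random(n),
    "catVar": pd.Categorical(rng.integers(0, 5, n)),
})

# drop_first=True drops the first level (0) so it acts as the baseline,
# similar to R's default treatment contrasts when an intercept is present
dummies = pd.get_dummies(binom["catVar"], prefix="catVar", prefix_sep="",
                         drop_first=True, dtype=int)

# Assemble the design matrix: intercept column, numeric predictor, dummy columns
A = pd.concat([binom[["x"]], dummies], axis=1)
A.insert(0, "(Intercept)", 1)
print(A.head())
```

Unlike model.matrix, this does not parse a formula; the columns are assembled by hand.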

Converting categorical column into a single dummy variable column

Here are a couple of ways to do it:

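# using scikit-learn: codes follow sorted order (female -> 0, male -> 1)
from sklearn.preprocessing import LabelEncoder

lbl = LabelEncoder()
df['Sex_encoded'] = lbl.fit_transform(df['Sex'])

# using only pandas: explicit mapping (male -> 0, female -> 1), as in the output below
df['Sex_encoded'] = df['Sex'].map({'male': 0, 'female': 1})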

   Survived  Pclass     Sex   Age     Fare  Sex_encoded
0         0       3    male  22.0   7.2500            0
1         1       1  female  38.0  71.2833            1
2         1       3  female  26.0   7.9250            1
3         1       1  female  35.0  53.1000            1
4         0       3    male  35.0   8.0500            0
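A self-contained pandas sketch of the same idea, using the five rows from the table above as hypothetical input. It shows two alternatives that are not in the original answer: a boolean comparison cast to int, and pd.get_dummies with drop_first=True. The latter drops the alphabetically first level ("female"), so it yields a Sex_male column with the opposite coding to the mapping above.

```python
import pandas as pd

# Hypothetical input: the Survived and Sex values from the five rows shown above
df = pd.DataFrame({
    "Survived": [0, 1, 1, 1, 0],
    "Sex": ["male", "female", "female", "female", "male"],
})

# Option A: a boolean comparison cast to int gives one 0/1 column (female -> 1)
df["Sex_encoded"] = (df["Sex"] == "female").astype(int)

# Option B: get_dummies with drop_first=True keeps one column per level except
# the first in sorted order ("female"), so only "Sex_male" remains (male -> 1)
sex_male = pd.get_dummies(df["Sex"], prefix="Sex", drop_first=True, dtype=int)

print(df)
print(sex_male)
```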

Create dummy variables from all categorical variables in a dataframe

This is also a one-liner with the fastDummies package:

fastDummies::dummy_cols(customers)

  id gender  mood outcome gender_male gender_female mood_happy mood_sad
1 10   male happy       1           1             0          1        0
2 20 female   sad       1           0             1          0        1
3 30 female happy       0           0             1          1        0
4 40   male   sad       0           1             0          0        1
5 50 female happy       0           0             1          1        0
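For comparison, a pandas sketch that appends one 0/1 column per level while keeping the original columns, roughly like dummy_cols. The customers frame is rebuilt by hand from the table above (an assumption about its original types), and the categorical columns are listed explicitly rather than auto-detected. Unlike the output shown, pd.get_dummies orders levels alphabetically, so gender_female comes before gender_male.

```python
import pandas as pd

# Hypothetical reconstruction of the customers data from the table above
customers = pd.DataFrame({
    "id": [10, 20, 30, 40, 50],
    "gender": ["male", "female", "female", "male", "female"],
    "mood": ["happy", "sad", "happy", "sad", "happy"],
    "outcome": [1, 1, 0, 0, 0],
})

# Categorical columns to expand; the numeric outcome column is left alone
cat_cols = ["gender", "mood"]

# One 0/1 column per level (named column_level), appended after the originals
dummies = pd.get_dummies(customers[cat_cols], dtype=int)
result = pd.concat([customers, dummies], axis=1)
print(result)
```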

Make dummy variable for categorical data, based on ID column with duplicate values in python

Use crosstab, then limit the counts to 1 with DataFrame.clip:

df1 = (pd.crosstab(df['ID'], df['value'])
         .clip(upper=1)
         .reset_index()
         .rename_axis(None, axis=1))
print(df1)
   ID  A  B  C
0   1  1  1  1
1   2  0  1  0
2   4  1  0  1
3  10  0  0  1
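A runnable version of the crosstab approach, assuming a hypothetical long-format df whose ID and value columns were chosen to reproduce the output above (ID 1 contains "A" twice, which is why the clip is needed). It also shows a swapped-in alternative not from the original answer: one-hot encode each row with pd.get_dummies, then take the maximum per ID.

```python
import pandas as pd

# Hypothetical long-format input: IDs repeat, and ID 1 has "A" twice
df = pd.DataFrame({
    "ID":    [1, 1, 1, 1, 2, 4, 4, 10],
    "value": ["A", "B", "A", "C", "B", "A", "C", "C"],
})

# crosstab counts each (ID, value) pair; clip(upper=1) turns counts into 0/1 flags
df1 = (pd.crosstab(df["ID"], df["value"])
         .clip(upper=1)
         .reset_index()
         .rename_axis(None, axis=1))

# Alternative: one-hot encode each row, then keep the max flag per ID
df2 = (pd.get_dummies(df["value"], dtype=int)
         .groupby(df["ID"])
         .max()
         .reset_index())

print(df1)
print(df2)
```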

Create new columns from categorical variables

Use Series.str.get_dummies, which splits each string on sep and returns one 0/1 column per distinct token:

https://pandas.pydata.org/docs/reference/api/pandas.Series.str.get_dummies.html#pandas.Series.str.get_dummies

dummy_cols = df['column_factors'].str.get_dummies(sep=',')
df = df.join(dummy_cols).drop(columns='column_factors')
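A self-contained run of the two lines above, assuming a hypothetical df whose column_factors cells hold comma-separated labels:

```python
import pandas as pd

# Hypothetical input: each cell holds comma-separated factor labels
df = pd.DataFrame({
    "id": [1, 2, 3],
    "column_factors": ["a,b", "b", "a,c"],
})

# One 0/1 column per distinct label (columns come out sorted: a, b, c)
dummy_cols = df["column_factors"].str.get_dummies(sep=",")

# Attach the indicators and drop the original text column
df = df.join(dummy_cols).drop(columns="column_factors")
print(df)
```

str.get_dummies does not strip whitespace, so with values like "a, b" the token " b" would get its own column separate from "b". Normalizing the separator first, e.g. with .str.replace(r"\s*,\s*", ",", regex=True), avoids that.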
