How to Create Dummy Variables

Using model.matrix() to create dummy variables

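One option is to convert the column to character first. If x1 is numeric, model.matrix() treats it as a single numeric column. A character (or factor) column is treated as categorical instead, so each distinct value gets its own indicator column. The - 1 in the formula removes the intercept, so every level gets a column rather than all but one: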

> dataframe1$x1 <- as.character(dataframe1$x1)
> model.matrix(~ x1 - 1, dataframe1)
  x11 x12 x13 x14 x15
1   1   0   0   0   0
2   0   1   0   0   0
3   0   0   1   0   0
4   0   0   0   1   0
5   0   0   0   0   1
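Conceptually, model.matrix(~ x1 - 1) gives every distinct level of x1 its own 0/1 column, named by pasting the variable name and the level. Without the - 1, R keeps an intercept and drops the first level as the baseline (the default treatment contrasts). The plain-Python sketch below reproduces both codings so they can be compared side by side. dummy_matrix is a hypothetical helper, not part of any library, and its level ordering only approximates R's.

```python
def dummy_matrix(values, prefix, drop_first=False):
    """Build 0/1 indicator columns, one per distinct level of `values`.

    Hypothetical helper. Levels are sorted with Python's sorted(); R sorts
    character levels in locale order, which can differ for mixed-case or
    non-ASCII strings. With drop_first=True the first level becomes the
    baseline and a column of 1s named "(Intercept)" is prepended, which
    mimics model.matrix(~ x1) instead of model.matrix(~ x1 - 1).
    """
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels
    names = [f"{prefix}{level}" for level in kept]
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    if drop_first:
        names = ["(Intercept)"] + names
        rows = [[1] + row for row in rows]
    return names, rows


# x1 after conversion to character, as in the R example
x1 = ["1", "2", "3", "4", "5"]

names, rows = dummy_matrix(x1, "x1")  # like ~ x1 - 1: one column per level
print(names)
for i, row in enumerate(rows, start=1):
    print(i, *row)

# like ~ x1: intercept column plus one column per non-baseline level
names_int, rows_int = dummy_matrix(x1, "x1", drop_first=True)
print(names_int)
```

With the intercept kept, a row whose level is the baseline ("1") is all zeros apart from the intercept. That is why dropping the intercept is the usual choice when you want one indicator per category.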

Creating dummy variables as counts using tidyverse/dplyr

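This uses reshape2, but pretty much any package that reshapes data from long to wide format will do. With length as the aggregation function, dcast() counts the rows for each ID and FRUIT combination, and combinations that never occur show up as 0: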

> library(reshape2)
> df <- dcast(fruitData, ID ~ FRUIT, length)
> df
  ID apple banana grape
1  1     2      1     0
2  2     1      0     1
3  3     1      0     0
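Counting with dcast(..., length) amounts to tallying rows per (ID, FRUIT) pair and laying the tallies out with one column per fruit. The following is a minimal stdlib sketch of that idea. fruit_data and count_wide are hypothetical: the original fruitData is not shown, so the input below is simply one dataset that reproduces the counts above.

```python
from collections import Counter

# Hypothetical long-format input (ID, FRUIT), chosen so that the counts
# match the dcast() output above; the real fruitData is not shown.
fruit_data = [
    (1, "apple"), (1, "apple"), (1, "banana"),
    (2, "apple"), (2, "grape"),
    (3, "apple"),
]


def count_wide(pairs):
    """Pivot (id, value) pairs to {id: {value: count}}, filling absent pairs with 0."""
    counts = Counter(pairs)                   # (id, value) -> number of rows
    ids = sorted({i for i, _ in pairs})
    values = sorted({v for _, v in pairs})    # one output column per distinct value
    # Counter returns 0 for missing keys, which plays the role of dcast's fill
    return values, {i: {v: counts[(i, v)] for v in values} for i in ids}


columns, wide = count_wide(fruit_data)
print("ID", *columns)
for i, row in wide.items():
    print(i, *row.values())
```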

How to Create Conditional Dummy Variables (Panel Data) in R?

df <- data.frame(
  stringsAsFactors = FALSE,
  id = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L),
  wave = c(18L, 21L, 18L, 21L, 18L, 21L, 18L, 10L, 18L, 21L),
  EMPLOYMENT_STATUS = c(
    "Employed", "Employed", "unemployed", "Employed", "unemployed",
    "Employed", "Employed", "Employed", "unemployed", "unemployed"
  )
)

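library(tidyverse)
df %>%
  group_by(id) %>%
  # all() is evaluated per id, so dummy is 1 only if every wave of that id
  # is 18 or 21 and every status is "Employed"; +() turns TRUE/FALSE into 1/0
  mutate(dummy = +(all(wave %in% c(18, 21)) &
                     all(EMPLOYMENT_STATUS == "Employed"))) %>%
  ungroup()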
#> # A tibble: 10 x 4
#>       id  wave EMPLOYMENT_STATUS dummy
#>    <int> <int> <chr>             <int>
#>  1     1    18 Employed              1
#>  2     1    21 Employed              1
#>  3     2    18 unemployed            0
#>  4     2    21 Employed              0
#>  5     3    18 unemployed            0
#>  6     3    21 Employed              0
#>  7     4    18 Employed              0
#>  8     4    10 Employed              0
#>  9     5    18 unemployed            0
#> 10     5    21 unemployed            0

Created on 2022-01-23 by the reprex package (v2.0.1)
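The key step in the R code is that all() runs once per id inside group_by(), and mutate() then recycles that single TRUE/FALSE to every row of the id. Here is a stdlib Python sketch of the same group-then-broadcast logic on the same panel. panel_dummy is a hypothetical helper name, and its defaults simply restate the conditions used in the R example.

```python
from collections import defaultdict

# The panel from the R example: (id, wave, employment status) per row.
rows = [
    (1, 18, "Employed"), (1, 21, "Employed"),
    (2, 18, "unemployed"), (2, 21, "Employed"),
    (3, 18, "unemployed"), (3, 21, "Employed"),
    (4, 18, "Employed"), (4, 10, "Employed"),
    (5, 18, "unemployed"), (5, 21, "unemployed"),
]


def panel_dummy(rows, waves=(18, 21), status="Employed"):
    """Return one 0/1 value per row: 1 only if every row of that id is in
    one of `waves` and has `status`. all(A) & all(B) equals all(A & B),
    so checking both conditions per row before all() gives the same result."""
    by_id = defaultdict(list)
    for pid, wave, emp in rows:
        by_id[pid].append(wave in waves and emp == status)
    flag = {pid: int(all(checks)) for pid, checks in by_id.items()}  # one flag per id
    return [flag[pid] for pid, _, _ in rows]                         # broadcast back to rows


dummy = panel_dummy(rows)
print(dummy)
```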

Create dummy variables if value is in list

Create the dummies from the exploded lists, then sum over the duplicated index labels to collapse back to one row per restaurant, keeping only the columns for the top 2 cuisines:

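import pandas as pd
# top2: list of the two cuisine names to keep, e.g. ['North Indian', 'Chinese']
a = pd.get_dummies(sample_df['cuisines_lst'].explode()) \
    .reset_index().groupby('index')[top2].sum().add_suffix('_bin')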

If you want it in alphabetical order (in this case, Chinese followed by North Indian), add an intermediate step to sort columns with a.sort_index(axis=1).

Do the same for the remaining cuisines, but also collapse the columns by calling any(axis=1), so that Other is 1 whenever a restaurant has at least one cuisine outside the top 2:

# not_top2: list of all remaining cuisine names; any(axis=1) merges them into one flag
b = pd.get_dummies(sample_df['cuisines_lst'].explode()) \
    .reset_index().groupby('index')[not_top2].sum() \
    .any(axis=1).astype(int).rename('Other')

Finally, concatenate the pieces column-wise; pd.concat aligns them on the shared row index:

>>> print(pd.concat([sample_df, a, b], axis=1).to_string())
              name                   cuisines_lst  North Indian_bin  Chinese_bin  Other
0            Jalsa        [North Indian, Chinese]                 1            1      0
1   Spice Elephant  [Chinese, North Indian, Thai]                 1            1      1
2  San Churro Cafe       [Cafe, Mexican, Italian]                 0            0      1

If you are working with a lot of data, consider computing the exploded dummies once and storing them in an intermediate data frame. You can then run both group-by reductions on it instead of calling get_dummies() twice.
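Per restaurant, the explode/get_dummies/group-by pipeline reduces to two membership tests: is each top-2 cuisine present, and is any cuisine outside the top 2 present? The stdlib sketch below can serve as a small cross-check of the pandas result. It assumes top2 = ['North Indian', 'Chinese'], which matches the column order in the output above. list_dummies is a hypothetical helper, and the data mirrors the printed data frame.

```python
# Sample data mirroring the printed data frame above.
restaurants = [
    ("Jalsa", ["North Indian", "Chinese"]),
    ("Spice Elephant", ["Chinese", "North Indian", "Thai"]),
    ("San Churro Cafe", ["Cafe", "Mexican", "Italian"]),
]
top2 = ["North Indian", "Chinese"]  # assumed, matching the column order above


def list_dummies(cuisines, keep):
    """One 0/1 flag per cuisine in `keep` (suffixed '_bin'),
    plus Other = 1 if any cuisine in the list is not in `keep`."""
    present = set(cuisines)
    flags = {f"{c}_bin": int(c in present) for c in keep}
    flags["Other"] = int(any(c not in keep for c in cuisines))
    return flags


table = {name: list_dummies(cuisines, top2) for name, cuisines in restaurants}
for name, flags in table.items():
    print(name, flags)
```

One design difference: this version caps each flag at 1, whereas the summed get_dummies() columns would show 2 if a cuisine were listed twice for the same restaurant. For large data the vectorized pandas approach is still the better fit. The loop is only meant to make the logic explicit.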