How to Create Dummy Variables

Using model.matrix() to create dummy variables

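An alternative is to convert x1 to character, so that model.matrix() treats it as categorical. The - 1 in the formula removes the intercept, so each level gets its own column: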

> dataframe1$x1 <- as.character(dataframe1$x1)
> model.matrix(~ x1 - 1, dataframe1)
  x11 x12 x13 x14 x15
1   1   0   0   0   0
2   0   1   0   0   0
3   0   0   1   0   0
4   0   0   0   1   0
5   0   0   0   0   1

Creating dummy variables as counts using tidyverse/dplyr

This uses reshape2, but any package that reshapes data from long to wide would work. Passing length as the aggregation function counts how many times each fruit occurs per ID:

    library(reshape2)
    df <- dcast(fruitData, ID ~ FRUIT, length)

    > df
      ID apple banana grape
    1  1     2      1     0
    2  2     1      0     1
    3  3     1      0     0

How to Create Conditional Dummy Variables (Panel Data) in R?

df <- data.frame(
  stringsAsFactors = FALSE,
  id = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L),
  wave = c(18L, 21L, 18L, 21L, 18L, 21L, 18L, 10L, 18L, 21L),
  EMPLOYMENT_STATUS = c(
    "Employed",
    "Employed",
    "unemployed",
    "Employed",
    "unemployed",
    "Employed",
    "Employed",
    "Employed",
    "unemployed",
    "unemployed"
  )
)

Within each id, the dummy is 1 only if every wave is 18 or 21 and the status is "Employed" in all of them. The unary + converts the logical result to an integer:

library(tidyverse)
df %>%
  group_by(id) %>%
  mutate(dummy = +(all(wave %in% c(18, 21)) &
                     all(EMPLOYMENT_STATUS == "Employed"))) %>%
  ungroup()
#> # A tibble: 10 x 4
#>       id  wave EMPLOYMENT_STATUS dummy
#>    <int> <int> <chr>             <int>
#>  1     1    18 Employed              1
#>  2     1    21 Employed              1
#>  3     2    18 unemployed            0
#>  4     2    21 Employed              0
#>  5     3    18 unemployed            0
#>  6     3    21 Employed              0
#>  7     4    18 Employed              0
#>  8     4    10 Employed              0
#>  9     5    18 unemployed            0
#> 10     5    21 unemployed            0

Created on 2022-01-23 by the reprex package (v2.0.1)

Create dummy variables if value is in list

Explode the lists and create the dummies, then group by the duplicated original index and sum to get the columns for the top 2 (top2 here is the list of the top 2 cuisine names):

a = pd.get_dummies(sample_df['cuisines_lst'].explode()) \
    .reset_index().groupby('index')[top2].sum().add_suffix('_bin')

If you want it in alphabetical order (in this case, Chinese followed by North Indian), add an intermediate step to sort columns with a.sort_index(axis=1).

Do the same for the remaining values (not_top2), but also reduce across columns by calling any with axis=1:

b = pd.get_dummies(sample_df['cuisines_lst'].explode()) \
    .reset_index().groupby('index')[not_top2].sum() \
    .any(axis=1).astype(int).rename('Other')

Concatenating on indices:

>>> print(pd.concat([sample_df, a, b], axis=1).to_string())
              name                   cuisines_lst  North Indian_bin  Chinese_bin  Other
0            Jalsa        [North Indian, Chinese]                 1            1      0
1   Spice Elephant  [Chinese, North Indian, Thai]                 1            1      1
2  San Churro Cafe       [Cafe, Mexican, Italian]                 0            0      1

If you are operating on a lot of data, it may be worth creating an intermediate data frame containing the exploded dummies, so that both group-by operations run on it instead of rebuilding the dummies twice.
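
The intermediate-frame idea could look roughly like the sketch below. It is only an illustration: sample_df is reconstructed from the printed output above, top2 is hard-coded rather than computed, and the names dummies, per_row and result are made up for this example. It also groups with groupby(level=0) instead of reset_index().groupby('index'), which should be equivalent here.

```python
import pandas as pd

# Assumed data, reconstructed from the printed output above.
sample_df = pd.DataFrame({
    "name": ["Jalsa", "Spice Elephant", "San Churro Cafe"],
    "cuisines_lst": [
        ["North Indian", "Chinese"],
        ["Chinese", "North Indian", "Thai"],
        ["Cafe", "Mexican", "Italian"],
    ],
})

# Assumption: the top 2 cuisines are fixed by hand to match that output.
top2 = ["North Indian", "Chinese"]

# Build the exploded dummies once; explode() keeps the original row index,
# so rows belonging to the same restaurant share an index value.
dummies = pd.get_dummies(sample_df["cuisines_lst"].explode(), dtype=int)

# Collapse the duplicated index once and reuse the result for both parts.
per_row = dummies.groupby(level=0).sum()
not_top2 = [c for c in per_row.columns if c not in top2]

a = per_row[top2].add_suffix("_bin")
b = per_row[not_top2].any(axis=1).astype(int).rename("Other")

result = pd.concat([sample_df, a, b], axis=1)
print(result.to_string())
```

Because per_row is computed a single time, the explode, get_dummies and group-by steps are not repeated for the "Other" column.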
