Using model.matrix() to create dummy variables
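We could also convert the column to character first, so that model.matrix() treats x1 as categorical and creates one indicator column per level (the - 1 drops the intercept):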
dataframe1$x1 <- as.character(dataframe1$x1)
> model.matrix(~x1 - 1, dataframe1)
  x11 x12 x13 x14 x15
1   1   0   0   0   0
2   0   1   0   0   0
3   0   0   1   0   0
4   0   0   0   1   0
5   0   0   0   0   1
Creating dummy variables as counts using tidyverse/dplyr
This answer uses reshape2, but pretty much any package that lets you reshape data from long to wide would work.
library(reshape2)
# Cast long to wide: one row per ID, one column per FRUIT, cells hold counts
df <- dcast(fruitData, ID ~ FRUIT, length)
> df
  ID apple banana grape
1  1     2      1     0
2  2     1      0     1
3  3     1      0     0
How to Create Conditional Dummy Variables (Panel Data) in R?
df <- data.frame(
  stringsAsFactors = FALSE,
  id = c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L),
  wave = c(18L, 21L, 18L, 21L, 18L, 21L, 18L, 10L, 18L, 21L),
  EMPLOYMENT_STATUS = c(
    "Employed",
    "Employed",
    "unemployed",
    "Employed",
    "unemployed",
    "Employed",
    "Employed",
    "Employed",
    "unemployed",
    "unemployed"
  )
)
library(tidyverse)
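df %>%
  group_by(id) %>%
  # Unary + turns the logical into 0/1. The dummy is 1 only if every wave
  # for the id is 18 or 21 and the status is "Employed" in all of them.
  mutate(dummy = +(all(wave %in% c(18, 21)) &
                     all(EMPLOYMENT_STATUS == "Employed"))) %>%
  ungroup()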
#> # A tibble: 10 x 4
#>       id  wave EMPLOYMENT_STATUS dummy
#>    <int> <int> <chr>             <int>
#>  1     1    18 Employed              1
#>  2     1    21 Employed              1
#>  3     2    18 unemployed            0
#>  4     2    21 Employed              0
#>  5     3    18 unemployed            0
#>  6     3    21 Employed              0
#>  7     4    18 Employed              0
#>  8     4    10 Employed              0
#>  9     5    18 unemployed            0
#> 10     5    21 unemployed            0
Created on 2022-01-23 by the reprex package (v2.0.1)
Create dummy variables if value is in list
Create the dummies, then reduce by duplicated indices to get your columns for the top 2:
a = pd.get_dummies(sample_df['cuisines_lst'].explode()) \
    .reset_index().groupby('index')[top2].sum().add_suffix('_bin')
If you want it in alphabetical order (in this case, Chinese followed by North Indian), add an intermediate step to sort the columns with a.sort_index(axis=1).
Do the same for the other values, but reduce across columns as well by passing axis=1 to any:
b = pd.get_dummies(sample_df['cuisines_lst'].explode()) \
    .reset_index().groupby('index')[not_top2].sum() \
    .any(axis=1).astype(int).rename('Other')
Concatenating on indices:
>>> print(pd.concat([sample_df, a, b], axis=1).to_string())
              name                   cuisines_lst  North Indian_bin  Chinese_bin  Other
0            Jalsa        [North Indian, Chinese]                 1            1      0
1   Spice Elephant  [Chinese, North Indian, Thai]                 1            1      1
2  San Churro Cafe       [Cafe, Mexican, Italian]                 0            0      1
If you are operating on a lot of data, it may be worth creating an intermediate data frame containing the exploded dummies, on which the group-by operation can then be performed.
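The intermediate-frame idea could be sketched roughly as follows. This is only an illustration under assumptions: sample_df is rebuilt here from the printed output, top2 is hard-coded rather than computed, not_top2 is taken to be every remaining cuisine, and groupby(level=0) is swapped in for the reset_index().groupby('index') pattern used in the answer.

```python
import pandas as pd

# Assumed sample data, reconstructed from the printed output above.
sample_df = pd.DataFrame({
    "name": ["Jalsa", "Spice Elephant", "San Churro Cafe"],
    "cuisines_lst": [
        ["North Indian", "Chinese"],
        ["Chinese", "North Indian", "Thai"],
        ["Cafe", "Mexican", "Italian"],
    ],
})

# Assumption: the two most common cuisines are already known.
top2 = ["North Indian", "Chinese"]

# Intermediate frame: explode and dummy-encode once, then collapse the
# duplicated index so there is one row per original row.
dummies = pd.get_dummies(sample_df["cuisines_lst"].explode()).groupby(level=0).sum()

# Assumption: "other" means every cuisine column that is not in top2.
not_top2 = [col for col in dummies.columns if col not in top2]

# Reuse the same intermediate frame for both pieces.
a = dummies[top2].add_suffix("_bin")
b = dummies[not_top2].any(axis=1).astype(int).rename("Other")

result = pd.concat([sample_df, a, b], axis=1)
print(result.to_string())
```

Computing dummies a single time avoids repeating the explode and get_dummies steps for a and b, which is where most of the work is likely to be on large inputs.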