Creating categorical variables from mutually exclusive dummy variables
Update (2019): Please use dplyr::coalesce(); it works pretty much the same.
My R package kimisc has a convenience function, coalesce.na(), that chooses the first non-NA
value for each element in a list of vectors:
#library(devtools)
#install_github('muelleki/kimisc')
library(kimisc)
df$factor1 <- with(df, coalesce.na(conditionA, conditionB))
(I'm not sure if this works if conditionA and conditionB are factors. If necessary, convert them to numerics beforehand using as.numeric(as.character(...)).)
Otherwise, you could give interaction() a try, combined with recoding the levels of the resulting factor, but to me it looks like you're more interested in the first solution:
df$conditionAB <- with(df, interaction(coalesce.na(conditionA, 0),
                                       coalesce.na(conditionB, 0),
                                       drop = TRUE))  # drop unused combinations so only two levels remain
levels(df$conditionAB) <- c('A', 'B')
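As a minimal sketch of the dplyr::coalesce() route mentioned in the update, assume hypothetical data in which conditionA and conditionB are 1/NA dummies and exactly one of them is set per row (the original data is not shown here). Each dummy is first turned into its label, and coalesce() then keeps the first non-NA label per row:

```r
library(dplyr)

# Hypothetical data: exactly one of the two dummies is 1 in each row.
df <- data.frame(
  conditionA = c(1, NA, 1, NA),
  conditionB = c(NA, 1, NA, 1)
)

# Map each dummy to its label (NA where the dummy is not set),
# then take the first non-NA label for each row.
df$factor1 <- factor(coalesce(
  ifelse(df$conditionA == 1, "A", NA_character_),
  ifelse(df$conditionB == 1, "B", NA_character_)
))

print(df)
```

The labels "A" and "B" are illustrative; any set of distinct labels would work the same way.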
Combining different dummy variables into a single categorical variable based on conditions (mutually exclusive categories)?
We can use a key/value dataset and do a join:
library(dplyr)
keydat <- data.frame(g_kom = 1, v_kom = c(0, 0, 1, 1),
                     a_kom = c(0, 1, 0, 1), kat_kom1 = c(1, 4, 2, 3))
left_join(mydata, keydat) %>%
  mutate(kat_kom1 = replace(kat_kom1, g_kom == 0, 0))
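To see how the lookup join behaves, here is a small sketch with a hypothetical mydata (the question's data is not reproduced here, so the rows below are an assumption). Rows with g_kom == 0 find no match in keydat, come back as NA, and are then set to 0:

```r
library(dplyr)

# Hypothetical input: three 0/1 dummy columns.
mydata <- data.frame(
  g_kom = c(0, 1, 1, 1, 1),
  v_kom = c(0, 0, 0, 1, 1),
  a_kom = c(0, 0, 1, 0, 1)
)

# Lookup table from the answer: one row per dummy combination and its category.
keydat <- data.frame(g_kom = 1, v_kom = c(0, 0, 1, 1),
                     a_kom = c(0, 1, 0, 1), kat_kom1 = c(1, 4, 2, 3))

# Join on the dummy columns (named explicitly here), then fill category 0
# for rows where g_kom is 0.
result <- left_join(mydata, keydat, by = c("g_kom", "v_kom", "a_kom")) %>%
  mutate(kat_kom1 = replace(kat_kom1, g_kom == 0, 0))

print(result)
```

Naming the join columns in `by` avoids the message dplyr prints when it has to guess them.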
Create mutually exclusive dummy variables from categorical variable in R
If your data is
id <- c(1,1,1,1)
time <- c(1,2,3,4)
df <- data.frame(id,time)
you can try
time <- as.character(time)
unique.time <- as.character(unique(df$time))
# Create a dichotomous dummy-variable for each time
x <- sapply(unique.time, function(x) as.numeric(df$time == x))
or
time.f = factor(time)
dummies = model.matrix(~time.f)
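One thing to be aware of with the model.matrix() variant: with the default treatment contrasts, the formula ~time.f produces an intercept column and drops the dummy for the first level. A short sketch of the difference, using the same time values as above (the variable names with_intercept and full_dummies are illustrative):

```r
time <- c(1, 2, 3, 4)
time.f <- factor(time)

# Default: intercept plus dummies for all levels except the first.
with_intercept <- model.matrix(~ time.f)

# Removing the intercept keeps one dummy column per level.
full_dummies <- model.matrix(~ time.f - 1)

print(colnames(with_intercept))
print(colnames(full_dummies))
```

In the intercept-free version every row has exactly one 1, so the columns are mutually exclusive and cover all levels.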
Converting multiple dummy variables that are not mutually exclusive into single categorical variable, adding new rows
library(tidyr)
library(dplyr)
tidyr::pivot_longer(
  data,
  cols = starts_with("strategy"),
  names_prefix = "strategy",
  names_to = "strategy"
) %>%
  filter(value == 1) %>%
  select(-value)
# # A tibble: 7 x 3
#      id  task strategy
#   <dbl> <dbl> <chr>
# 1     1     1 1
# 2     1     2 1
# 3     1     2 3
# 4     2     1 2
# 5     2     2 1
# 6     2     2 2
# 7     2     2 3
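The input data is not shown above. As a sketch, the following data frame is reconstructed from the printed result (so its exact layout is an assumption) and makes the pipeline runnable end to end:

```r
library(tidyr)
library(dplyr)

# Assumed input, inferred from the output above: one row per id/task,
# with one 0/1 column per strategy.
data <- data.frame(
  id        = c(1, 1, 2, 2),
  task      = c(1, 2, 1, 2),
  strategy1 = c(1, 1, 0, 1),
  strategy2 = c(0, 0, 1, 1),
  strategy3 = c(0, 1, 0, 1)
)

# One row per id/task/strategy combination, keeping only the strategies used.
long <- pivot_longer(
  data,
  cols = starts_with("strategy"),
  names_prefix = "strategy",
  names_to = "strategy"
) %>%
  filter(value == 1) %>%
  select(-value)

print(long)
```

Because the dummies are not mutually exclusive, a single id/task pair can expand into several rows.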
Transform dummy variable into categorical variable
With the tidyverse you could also do:
data %>%
  pivot_longer(-ID) %>%
  group_by(ID) %>%
  slice(which.max(as.integer(factor(name)) * value)) %>%
  mutate(name = if_else(value == 0, 'other', name), value = NULL)
# A tibble: 8 x 2
# Groups:   ID [8]
     ID name
  <int> <chr>
1     1 Diag1
2     2 Diag2
3     3 Multiple.Diag
4     4 Multiple.Diag
5     5 Diag1
6     6 Diag3
7     7 Multiple.Diag
8     8 other
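The slice() step is compact, so a small sketch may help. Within each ID, the column names are ranked alphabetically and multiplied by the 0/1 values, so which.max() picks the alphabetically last column that is set; if nothing is set, it falls back to the first row, whose value is 0 and which is then relabelled 'other'. The three-row data frame below is hypothetical:

```r
library(dplyr)
library(tidyr)

# Hypothetical input: ID 1 has one diagnosis, ID 2 has several flags set, ID 3 has none.
data <- data.frame(
  ID            = 1:3,
  Diag1         = c(1, 1, 0),
  Diag2         = c(0, 1, 0),
  Multiple.Diag = c(0, 1, 0)
)

result <- data %>%
  pivot_longer(-ID) %>%
  group_by(ID) %>%
  # rank of the column name (alphabetical) times the 0/1 value
  slice(which.max(as.integer(factor(name)) * value)) %>%
  mutate(name = if_else(value == 0, "other", name), value = NULL) %>%
  ungroup()

print(result)
```

Note that this relies on the column that should win (here Multiple.Diag) sorting after the others alphabetically.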
Convert dummy variables to single categorical in R?
Loop over the selected columns by row (MARGIN = 1), subset the column names where the value is 1, and paste them together:
df$z <- apply(df[c('a', 'b', 'c')], 1, function(x) toString(names(x)[x == 1]))
df$z
#[1] "b" "b, c" "b" "a, b, c" "a" "" "b" "" "a" ""
If we want to change the "" to '0':
df$z[df$z == ''] <- '0'
For a solution with purrr and dplyr:
library(dplyr); library(purrr)
df %>% mutate(z = pmap_chr(select(., a, b, c), ~ {v1 <- c(...); toString(names(v1)[v1 == 1])}))
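The base R steps above can be tried on a small hypothetical data frame (the three rows below are illustrative, not the data from the question):

```r
# Hypothetical input: three 0/1 dummy columns.
df <- data.frame(
  a = c(0, 1, 0),
  b = c(1, 1, 0),
  c = c(0, 1, 0)
)

# For each row, collect the names of the columns equal to 1 into one string.
df$z <- apply(df[c("a", "b", "c")], 1, function(x) toString(names(x)[x == 1]))

# Rows with no dummy set yield an empty string; recode those as "0".
df$z[df$z == ""] <- "0"

print(df$z)
```

Since the dummies need not be mutually exclusive here, a row can end up with a combined label such as "a, b, c".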
Create dummy variables from all categorical variables in a dataframe
There is also a one-liner with the fastDummies package:
fastDummies::dummy_cols(customers)
  id gender  mood outcome gender_male gender_female mood_happy mood_sad
1 10   male happy       1           1             0          1        0
2 20 female   sad       1           0             1          0        1
3 30 female happy       0           0             1          1        0
4 40   male   sad       0           1             0          0        1
5 50 female happy       0           0             1          1        0
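For a runnable version, the customers data frame can be rebuilt from the first four columns of the output above (this reconstruction is an assumption, since the original input is not shown). By default dummy_cols() creates dummies for the character and factor columns and leaves numeric columns such as id and outcome alone:

```r
# Assumed input, reconstructed from the printed output above.
customers <- data.frame(
  id      = c(10, 20, 30, 40, 50),
  gender  = c("male", "female", "female", "male", "female"),
  mood    = c("happy", "sad", "happy", "sad", "happy"),
  outcome = c(1, 1, 0, 0, 0)
)

# One new 0/1 column per level of each character/factor column.
res <- fastDummies::dummy_cols(customers)

print(res)
```

The order of the generated columns may differ between fastDummies versions, so it is safer to refer to them by name than by position.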