Convert various dummy/logical variables into a single categorical variable/factor from their name in R
Try:
library(dplyr)
library(tidyr)
df %>% gather(type, value, -id) %>% na.omit() %>% select(-value) %>% arrange(id)
Which gives:
# id type
#1 1 conditionA
#2 2 conditionB
#3 3 conditionC
#4 4 conditionD
#5 5 conditionA
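The pipeline above uses gather(), which tidyr has since superseded with pivot_longer(). Below is a minimal sketch of the same idea. The input data frame is hypothetical (the original question's data is not shown here) and assumes each condition column holds either 1 or NA:

```r
library(dplyr)
library(tidyr)

# Hypothetical input: one row per id, exactly one non-NA dummy per row
df <- data.frame(
  id = 1:5,
  conditionA = c(1, NA, NA, NA, 1),
  conditionB = c(NA, 1, NA, NA, NA),
  conditionC = c(NA, NA, 1, NA, NA),
  conditionD = c(NA, NA, NA, 1, NA)
)

res <- df %>%
  # values_drop_na = TRUE plays the role of na.omit() in the gather() version
  pivot_longer(-id, names_to = "type", values_to = "value",
               values_drop_na = TRUE) %>%
  select(-value) %>%
  arrange(id)

res
```

If the dummies are coded 0/1 rather than NA/1, a filter(value == 1) step would be needed instead of dropping NAs.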
Update
To handle the case you detailed in the comments, you could do the operation on the desired portion of the data frame and then left_join() the other columns:
df %>%
select(starts_with("condition"), id) %>%
gather(type, value, -id) %>%
na.omit() %>%
select(-value) %>%
left_join(., df %>% select(-starts_with("condition")), by = "id") %>%
arrange(id)
Convert dummy variables to single categorical in R?
Loop over the selected columns by row (MARGIN = 1), subset the column names where the value is 1, and paste them together:
df$z <- apply(df[c('a', 'b', 'c')], 1, function(x) toString(names(x)[x == 1]))
df$z
#[1] "b" "b, c" "b" "a, b, c" "a" "" "b" "" "a" ""
If we want to change the "" to '0':
df$z[df$z == ''] <- '0'
For a solution with purrr and dplyr:
library(dplyr); library(purrr)  # mutate()/select() and pmap_chr()
df %>% mutate(z = pmap_chr(select(., a, b, c), ~ {v1 <- c(...); toString(names(v1)[v1 == 1])}))
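To make the row-wise step concrete, here is a small base R sketch. The data frame is a made-up example (not the one from the question), with rows that have one, several, or no flagged columns:

```r
# Hypothetical dummy columns a, b, c
df <- data.frame(a = c(0, 1, 1, 0),
                 b = c(1, 1, 0, 0),
                 c = c(0, 1, 0, 0))
cols <- c("a", "b", "c")

# For each row, keep the names of the columns equal to 1 and join them
df$z <- apply(df[cols], 1, function(x) toString(names(x)[x == 1]))

# Rows with no flagged column yield "", which is recoded to "0"
df$z[df$z == ""] <- "0"

df$z
# [1] "b"       "a, b, c" "a"       "0"
```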
dummy variables to single categorical variable (factor) in R
A quick solution would be something like
Res <- cbind(df[1], VALUE = factor(max.col(df[-1]), ordered = TRUE))
Res
# Pre VALUE
# 1 1 6
# 2 1 5
# 3 1 5
# 4 1 5
str(Res)
# 'data.frame': 4 obs. of 2 variables:
# $ Pre : int 1 1 1 1
# $ VALUE: Ord.factor w/ 2 levels "5"<"6": 2 1 1 1
OR, if you want the actual names of the columns (as pointed out by @BondedDust), you can use the same methodology to extract them:
factor(names(df)[1 + max.col(df[-1])], ordered = TRUE)
# [1] VALUE_6 VALUE_5 VALUE_5 VALUE_5
# Levels: VALUE_5 < VALUE_6
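OR you can use your own which strategy in the following way (btw, which is vectorized, so there is no need to use apply with a margin of 1 on it). Note that which(..., arr.ind = TRUE) returns its matches in column-major order, so sort them by row before binding:
idx <- which(df[-1] == 1, arr.ind = TRUE)
cbind(df[1], VALUE = factor(idx[order(idx[, "row"]), "col"], ordered = TRUE))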
OR you can do matrix multiplication (contributed by @akrun):
cbind(df[1], VALUE = factor(as.matrix(df[-1]) %*% seq_along(df[-1]), ordered = TRUE))
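The following sketch puts the max.col() and matrix-multiplication variants side by side. The data frame is a reduced, hypothetical version of the example (only two dummy columns), and ties.method = "first" is set explicitly because max.col() breaks ties at random by default, which matters if a row ever contains several 1s or none:

```r
# Hypothetical one-hot data: exactly one 1 per row among the VALUE_* columns
df <- data.frame(Pre = c(1, 1, 1, 1),
                 VALUE_5 = c(0, 1, 1, 1),
                 VALUE_6 = c(1, 0, 0, 0))

# Column index of the row-wise maximum, i.e. the position of the 1
idx <- max.col(df[-1], ties.method = "first")

# Same index via matrix multiplication: each 1 is weighted by its column position
idx_mm <- drop(as.matrix(df[-1]) %*% seq_along(df[-1]))

# Map the index back to the column names
res <- cbind(df[1], VALUE = factor(names(df)[-1][idx]))
res
```

Both variants only give meaningful results when each row has exactly one 1; the matrix product in particular would sum the positions if a row had several.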
Reconstruct a categorical variable from dummies in R
You can do this with data.table
id_cols = c("x1", "x2")
data.table::melt.data.table(data = dt, id.vars = id_cols,
na.rm = TRUE,
measure = patterns("dummy"))
Example:
library(data.table)
dt = data.table(dummy_a = c(1, 0, 0), dummy_b = c(0, 1, 0), dummy_c = c(0, 0, 1), id = c(1, 2, 3))
data.table::melt.data.table(data = dt,
                            id.vars = "id",
                            measure = patterns("dummy_"),
                            na.rm = TRUE)[value == 1, .(id, variable)]
Output
id variable
1: 1 dummy_a
2: 2 dummy_b
3: 3 dummy_c
It's even easier if you replace 0 with NA: na.rm = TRUE in melt will then drop every row with NA, so the value == 1 filter is no longer needed.
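A minimal sketch of the 0-to-NA suggestion, reusing the small example table. The names dummy_cols, long and category are illustrative choices, not part of the original answer:

```r
library(data.table)

dt <- data.table(id = 1:3,
                 dummy_a = c(1, 0, 0),
                 dummy_b = c(0, 1, 0),
                 dummy_c = c(0, 0, 1))
dummy_cols <- grep("^dummy_", names(dt), value = TRUE)

# Recode 0 as NA in place so that na.rm = TRUE drops those rows during melt
dt[, (dummy_cols) := lapply(.SD, function(x) replace(x, x == 0, NA)),
   .SDcols = dummy_cols]

# Only the rows whose dummy was 1 survive; no value == 1 filter is needed
long <- melt(dt, id.vars = "id", measure.vars = dummy_cols,
             variable.name = "category", na.rm = TRUE)[, .(id, category)]
long
```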
Transform dummy variable into categorical variable
With tidyverse you could also do:
data %>%
pivot_longer(-ID) %>%
group_by(ID) %>%
slice(which.max(as.integer(factor(name)) * value)) %>%
mutate(name = if_else(value == 0, 'other', name), value = NULL)
# A tibble: 8 x 2
# Groups: ID [8]
ID name
<int> <chr>
1 1 Diag1
2 2 Diag2
3 3 Multiple.Diag
4 4 Multiple.Diag
5 5 Diag1
6 6 Diag3
7 7 Multiple.Diag
8 8 other
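The which.max() step is compact, so a small worked example may help. Within each ID, factor(name) numbers the column names in sorted order, and multiplying by value zeroes out the unflagged ones; the flagged column that sorts last therefore wins, and a row with no flags falls through to 'other'. The input below is hypothetical and assumes Multiple.Diag sorts after the Diag* names:

```r
library(dplyr)
library(tidyr)

# Hypothetical input: ID 3 has several flags, ID 4 has none
data <- data.frame(
  ID = 1:4,
  Diag1 = c(1, 0, 1, 0),
  Diag2 = c(0, 1, 1, 0),
  Multiple.Diag = c(0, 0, 1, 0)
)

res <- data %>%
  pivot_longer(-ID) %>%
  group_by(ID) %>%
  # level index * value is 0 for unflagged columns; keep the largest product
  slice(which.max(as.integer(factor(name)) * value)) %>%
  # if even the kept row has value 0, no column was flagged for this ID
  mutate(name = if_else(value == 0, "other", name), value = NULL) %>%
  ungroup()

res
```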
Creating categorical variables from mutually exclusive dummy variables
Update (2019): Please use dplyr::coalesce(); it works pretty much the same.
My R package has a convenience function that lets you choose the first non-NA value for each element in a list of vectors:
#library(devtools)
#install_github('kimisc', 'muelleki')
library(kimisc)
df$factor1 <- with(df, coalesce.na(conditionA, conditionB))
(I'm not sure if this works if conditionA and conditionB are factors. If necessary, convert them to numerics beforehand using as.numeric(as.character(...)).)
Otherwise, you could give interaction a try, combined with recoding of the levels of the resulting factor, but to me it looks like you're more interested in the first solution:
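# drop = TRUE discards unused level combinations so that only two levels remain
# (this assumes every row has exactly one of the two conditions set)
df$conditionAB <- with(df, interaction(coalesce.na(conditionA, 0),
                                       coalesce.na(conditionB, 0), drop = TRUE))
levels(df$conditionAB) <- c('A', 'B')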
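For the dplyr::coalesce() route mentioned in the update, a minimal sketch might look like this. It assumes (hypothetically) that the dummies are coded 1/NA, so each column is first turned into its label before the first non-NA value is picked per row:

```r
library(dplyr)

# Hypothetical mutually exclusive dummies coded as 1 / NA
df <- data.frame(id = 1:4,
                 conditionA = c(1, NA, 1, NA),
                 conditionB = c(NA, 1, NA, 1))

# Turn each dummy into its label (NA stays NA), then take the first non-NA
df$factor1 <- factor(coalesce(
  ifelse(df$conditionA == 1, "A", NA_character_),
  ifelse(df$conditionB == 1, "B", NA_character_)
))

df$factor1
# [1] A B A B
# Levels: A B
```

coalesce() expects its arguments to have compatible types, which is why both branches are built as character vectors here.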
Convert data frame with dummy variables into categorical variables
One option with lapply: ignoring the first column (id), we check which columns have the value 1, replace those entries with the corresponding column name, and change the others to NA.
data[-1] <- lapply(names(data[-1]), function(x) ifelse(data[[x]] == 1, x, NA))
data
# id red blue yellow
#1 1 red blue <NA>
#2 2 <NA> blue <NA>
#3 3 red blue <NA>
#4 4 <NA> blue <NA>
#5 5 red <NA> <NA>
#6 6 <NA> blue <NA>
#7 7 <NA> blue <NA>
#8 8 <NA> blue yellow
#9 9 <NA> <NA> yellow
Another approach without using an lapply loop:
data[-1] <- ifelse(data[-1] == 1, names(data[-1])[col(data[-1])], NA)
data
# id red blue yellow
#1 1 red blue <NA>
#2 2 <NA> blue <NA>
#3 3 red blue <NA>
#4 4 <NA> blue <NA>
#5 5 red <NA> <NA>
#6 6 <NA> blue <NA>
#7 7 <NA> blue <NA>
#8 8 <NA> blue yellow
#9 9 <NA> <NA> yellow