Reconstruct a categorical variable from dummies in R
You can do this with data.table:
library(data.table)
id_cols <- c("x1", "x2")
# Reshape every column whose name matches "dummy" into long format
melt(dt, id.vars = id_cols,
     measure.vars = patterns("dummy"),
     na.rm = TRUE)
Example:
dt <- data.table(dummy_a = c(1, 0, 0), dummy_b = c(0, 1, 0), dummy_c = c(0, 0, 1), id = c(1, 2, 3))
# Reshape to long format, then keep only the rows where the dummy is set
melt(dt,
     id.vars = "id",
     measure.vars = patterns("dummy_"),
     na.rm = TRUE)[value == 1, .(id, variable)]
Output
   id variable
1:  1  dummy_a
2:  2  dummy_b
3:  3  dummy_c
It's even easier if you replace 0 with NA: na.rm = TRUE in melt then drops every row with NA, so the filter on value == 1 is no longer needed.
How to reconstruct a categorical variable with multiple choices
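library(dplyr)
library(tidyr)

df_old <- read.table(text = "a1 a2 a3 a4 a5 a6 a7
0 0 1 1 0 1 0
1 1 1 0 0 0 0
0 1 0 0 1 0 1", header = TRUE)

df_old %>%
  mutate(rowid = row_number()) %>%
  pivot_longer(!rowid) %>%
  filter(value != 0) %>%
  group_by(rowid) %>%
  # Number the choices within each row; this also works when rows differ in how many choices they have
  mutate(choice = paste0("choice", row_number())) %>%
  pivot_wider(id_cols = rowid, names_from = choice, values_from = name) %>%
  # rowid is a grouping variable, so select() keeps it; call ungroup() first to drop it
  select(-rowid)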
# A tibble: 3 x 4
# Groups:   rowid [3]
  rowid choice1 choice2 choice3
  <int> <chr>   <chr>   <chr>
1     1 a3      a4      a6
2     2 a1      a2      a3
3     3 a2      a5      a7
Convert various dummy/logical variables into a single categorical variable/factor from their name in R
Try:
library(dplyr)
library(tidyr)
# gather() still works but is superseded by pivot_longer() in current tidyr
df %>%
  gather(type, value, -id) %>%
  na.omit() %>%
  select(-value) %>%
  arrange(id)
Which gives:
# id type
#1 1 conditionA
#2 2 conditionB
#3 3 conditionC
#4 4 conditionD
#5 5 conditionA
Update
To handle the case you detailed in the comments, you could do the operation on the desired portion of the data frame and then left_join() the other columns:
df %>%
select(starts_with("condition"), id) %>%
gather(type, value, -id) %>%
na.omit() %>%
select(-value) %>%
left_join(., df %>% select(-starts_with("condition"))) %>%
arrange(id)
Using dplyr to gather dummy variables
This can be done using the 'tidyverse' library, specifically 'tidyr' and 'dplyr'. The following code produces the output you are after.
library(tidyverse)
type %>% gather(TypeOfCar, Count) %>% filter(Count >= 1) %>% select(TypeOfCar)
Output:
TypeOfCar
<chr>
1 convertible
2 convertible
3 convertible
4 convertible
5 coupe
6 sedan
Hopefully this solves your problem, let me know if any changes are needed! Thanks.
Reconstruct a categorical variable from dummies in pandas
In [46]: s = pd.Series(list('aaabbbccddefgh')).astype('category')
In [47]: s
Out[47]:
0 a
1 a
2 a
3 b
4 b
5 b
6 c
7 c
8 d
9 d
10 e
11 f
12 g
13 h
dtype: category
Categories (8, object): [a < b < c < d < e < f < g < h]
In [48]: df = pd.get_dummies(s)
In [49]: df
Out[49]:
    a  b  c  d  e  f  g  h
0   1  0  0  0  0  0  0  0
1   1  0  0  0  0  0  0  0
2   1  0  0  0  0  0  0  0
3   0  1  0  0  0  0  0  0
4   0  1  0  0  0  0  0  0
5   0  1  0  0  0  0  0  0
6   0  0  1  0  0  0  0  0
7   0  0  1  0  0  0  0  0
8   0  0  0  1  0  0  0  0
9   0  0  0  1  0  0  0  0
10  0  0  0  0  1  0  0  0
11  0  0  0  0  0  1  0  0
12  0  0  0  0  0  0  1  0
13  0  0  0  0  0  0  0  1
In [50]: x = df.stack()
# I don't think you actually need to specify ALL of the categories here, as by definition
# they are in the dummy matrix to start (and hence the column index)
In [51]: pd.Series(pd.Categorical(x[x!=0].index.get_level_values(1)))
Out[51]:
0 a
1 a
2 a
3 b
4 b
5 b
6 c
7 c
8 d
9 d
10 e
11 f
12 g
13 h
Name: level_1, dtype: category
Categories (8, object): [a < b < c < d < e < f < g < h]
So I think we need a function to do this, as it seems to be a natural operation. Maybe get_categories().
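As a rough illustration of what such a function could look like, here is a minimal sketch. The name categories_from_dummies is hypothetical and not part of pandas, and the sketch assumes the dummies are mutually exclusive, with exactly one non-zero entry per row. It uses idxmax along the columns instead of the stack() approach above. Recent pandas versions (1.5 and later) also ship pd.from_dummies, which may be preferable where it is available.

```python
import pandas as pd


def categories_from_dummies(dummies: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: invert pd.get_dummies for mutually exclusive dummies."""
    # Cast to int so the code handles both the 0/1 and the True/False dummy encodings.
    as_int = dummies.astype(int)
    # Assumption: every row has exactly one non-zero dummy; otherwise the result is ambiguous.
    if not ((as_int != 0).sum(axis=1) == 1).all():
        raise ValueError("each row must have exactly one non-zero dummy")
    # idxmax(axis=1) returns, for each row, the column label holding the largest value.
    labels = as_int.idxmax(axis=1)
    # The categories come from the column index, as noted in the session above.
    return pd.Series(
        pd.Categorical(labels, categories=list(dummies.columns)),
        index=dummies.index,
    )


s = pd.Series(list("aaabbbccddefgh")).astype("category")
restored = categories_from_dummies(pd.get_dummies(s))
```

Here restored holds the same labels as s, as a categorical Series with the dummy column names as its categories.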
Creating categorical variables from mutually exclusive dummy variables
Update (2019): Please use dplyr::coalesce(); it works pretty much the same.
My R package has a convenience function that lets you choose the first non-NA value for each element in a list of vectors:
#library(devtools)
#install_github('kimisc', 'muelleki')
library(kimisc)
df$factor1 <- with(df, coalesce.na(conditionA, conditionB))
(I'm not sure whether this works if conditionA and conditionB are factors. If necessary, convert them to numeric with as.numeric(as.character(...)) first.)
Otherwise, you could give interaction() a try, combined with recoding the levels of the resulting factor, but to me it looks like you're more interested in the first solution:
df$conditionAB <- with(df, interaction(coalesce.na(conditionA, 0),
coalesce.na(conditionB, 0)))
levels(df$conditionAB) <- c('A', 'B')
Convert categorical variable into binary columns in R
Try this:
library(dplyr)
library(tidyr)
df %>%
separate_rows(answer_openq, sep = ',') %>%
pivot_wider(names_from = answer_openq, values_from = answer_openq,
values_fn = function(x) 1, values_fill = 0)
# A tibble: 4 × 5
  respondent     a     c     b     d
       <int> <dbl> <dbl> <dbl> <dbl>
1          1     1     0     0     0
2          2     1     1     0     0
3          3     0     0     1     0
4          4     1     0     0     1