Combining a Categorical Variable to Create a New Categorical Variable in R

Creating a New Variable Based on a Categorical Variable Already in the Dataset

The up-votes to the question greatly puzzle me... so an answer is wanted for this question?

The loop-based method, as the OP intended, is:

Y <- numeric(length(X))  ## initialize a numeric vector `Y` of the same length as `X`
## loop through all elements of `X`, using `if`/`else` to assign the value of `Y`
for (i in seq_along(X)) {
  if (X[i] == "A") Y[i] <- 1
  else if (X[i] == "B") Y[i] <- 2
  else if (X[i] == "C") Y[i] <- 3
}

The fully vectorized method is:

Y <- match(X, LETTERS[1:3])

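Here, LETTERS is a built-in R constant holding the 26 upper-case letters, so LETTERS[1:3] is c("A", "B", "C") and match() returns the position of each element of X in that lookup table. R has only a few built-in constants; they are all listed in the documentation under ?Constants.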
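As a quick check of the vectorized approach, here is a small sketch. The sample vectors are made up for illustration and are not from the question. It also shows that match() returns NA for a value that is missing from the lookup table, and that the lookup table can hold any labels, not just letters.

```r
# Hypothetical sample data; "D" is deliberately outside the lookup table
X <- c("A", "C", "B", "A", "D")

# Position of each element of X in the lookup table c("A", "B", "C")
Y <- match(X, LETTERS[1:3])
print(Y)

# The same idea works for arbitrary labels: the order of the lookup
# table decides which code each label receives
sizes <- c("small", "large", "medium")
size_code <- match(sizes, c("small", "medium", "large"))
print(size_code)
```

Note that an unmatched value becomes NA here, whereas the loop leaves the initial 0 in place, so the two methods only agree when every element of X is one of the expected categories.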

Combining different dummy variables into a single categorical variable based on conditions (mutually exclusive categories)?

We can use a key/value dataset and do a join (with dplyr):

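library(dplyr)  ## provides left_join(), mutate() and %>%

keydat <- data.frame(g_kom = 1, v_kom = c(0, 0, 1, 1),
                     a_kom = c(0, 1, 0, 1), kat_kom1 = c(1, 4, 2, 3))

## rows with g_kom == 0 have no match in keydat; set their category to 0
left_join(mydata, keydat) %>%
  mutate(kat_kom1 = replace(kat_kom1, g_kom == 0, 0))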
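To see the join at work, here is a self-contained sketch. The contents of mydata are an assumption made up for illustration (the question's real data is not shown), and the explicit by = argument is added here only to make the join keys visible.

```r
library(dplyr)

# Hypothetical input: g_kom, v_kom and a_kom are 0/1 dummy variables
mydata <- data.frame(g_kom = c(1, 1, 1, 1, 0),
                     v_kom = c(0, 0, 1, 1, 0),
                     a_kom = c(0, 1, 0, 1, 0))

# Key/value table: one row per combination of dummies, with its category
keydat <- data.frame(g_kom = 1, v_kom = c(0, 0, 1, 1),
                     a_kom = c(0, 1, 0, 1), kat_kom1 = c(1, 4, 2, 3))

# left_join() keeps every row of mydata in its original order; rows
# without a matching key (here g_kom == 0) get NA, which is then set to 0
result <- left_join(mydata, keydat, by = c("g_kom", "v_kom", "a_kom")) %>%
  mutate(kat_kom1 = replace(kat_kom1, g_kom == 0, 0))

print(result)
```

Keeping the mapping in a small table means a new category only needs a new row in keydat, not another branch of nested conditions.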

Creating categorical variables from mutually exclusive dummy variables

Update (2019): Please use dplyr::coalesce() instead; it works in much the same way.

My R package has a convenience function that lets you choose the first non-NA value for each element in a list of vectors:

#install_github('kimisc', 'muelleki')

df$factor1 <- with(df,, conditionB))

(I'm not sure if this works if conditionA and conditionB are factors. If necessary, convert them to numerics first with as.numeric(as.character(...)).)
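A minimal sketch of the "first non-NA value" idea, using dplyr::coalesce() as the 2019 update recommends rather than the package function from the original answer. The data frame below is hypothetical and assumes each row has exactly one non-NA condition.

```r
library(dplyr)

# Hypothetical data: each row has exactly one non-NA condition
df <- data.frame(conditionA = c(1, NA, 1, NA),
                 conditionB = c(NA, 2, NA, 2))

# coalesce() takes, element by element, the first value that is not NA
df$factor1 <- with(df, coalesce(conditionA, conditionB))
print(df$factor1)

# If an input is a factor, convert via character first so that the
# original labels, not the internal integer codes, are recovered
f <- factor(c("10", "20", "10"))
f_num <- as.numeric(as.character(f))
print(f_num)
```

Calling as.numeric() directly on a factor would return the level codes (1, 2, 1 in this sketch) instead of the labels, which is why the detour through as.character() is needed.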

Otherwise, you could give interaction a try, combined with recoding of the levels of the resulting factor -- but to me it looks like you're more interested in the first solution:

df$conditionAB <- with(df, interaction(, 0),, 0)))
levels(df$conditionAB) <- c('A', 'B')
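A minimal sketch of the interaction() route, assuming the dummies are coded 0/1 with exactly one of them equal to 1 in each row (an assumption; the original data is not shown). It uses drop = TRUE and a named list for the levels, which is one way to do the recoding and not necessarily the answer's exact code.

```r
# Hypothetical 0/1 dummies, mutually exclusive (exactly one is 1 per row)
df <- data.frame(conditionA = c(1, 0, 1, 0),
                 conditionB = c(0, 1, 0, 1))

# drop = TRUE discards the combinations that never occur ("0.0" and "1.1")
df$conditionAB <- with(df, interaction(conditionA, conditionB, drop = TRUE))

# Rename by label rather than by position, so the mapping is explicit:
# "1.0" means conditionA == 1, "0.1" means conditionB == 1
levels(df$conditionAB) <- list(A = "1.0", B = "0.1")
print(df$conditionAB)
```

Without drop = TRUE the factor would keep all four combinations as levels, and assigning only two new level names would fail.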
