
# Combining a Categorical Variable to Create a New Categorical Variable in R

## Creating a New Variable Based on a Categorical Variable Already in the Dataset

The up-votes to the question greatly puzzle me... so an answer is wanted for this question?

The loop-based method, as the OP intended, is:

```r
Y <- numeric(length(X))  ## initialize a numeric vector `Y`, of the same length as `X`

## loop through all elements of `X`, use `if-else` to allocate value for `Y`
for (i in seq_along(X)) {
  if (X[i] == "A") Y[i] <- 1
  else if (X[i] == "B") Y[i] <- 2
  else if (X[i] == "C") Y[i] <- 3
}
```

The fully vectorized method is:

```r
Y <- match(X, LETTERS[1:3])
```

Here, `LETTERS` is a built-in R constant holding the 26 uppercase letters. R has only a few built-in constants, and they are all listed in the documentation at `?Constants`.
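As a quick check of the vectorized approach, here is a minimal sketch on made-up data. The contents of `X` and the named vector `codes` are assumptions for illustration only. One difference worth noting: `match()` returns `NA` for a value outside `A` to `C`, whereas the loop above leaves the initial `0` in place.

```r
# Hypothetical input; "D" is deliberately outside the A-C range
X <- c("A", "C", "B", "A", "D")

# Vectorized lookup by position in LETTERS[1:3]; unmatched values become NA
Y_match <- match(X, LETTERS[1:3])

# Equivalent lookup through a named vector (hypothetical `codes`),
# handy when the target codes are not simply 1, 2, 3
codes <- c(A = 1, B = 2, C = 3)
Y_lookup <- unname(codes[X])

print(Y_match)
print(Y_lookup)
```

The named-vector form is a common R idiom for recoding and generalizes to arbitrary code values.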

## Combining different dummy variables into a single categorical variable based on conditions (mutually exclusive categories)?

We can use a key/value dataset and do a join:

```r
library(dplyr)

keydat <- data.frame(g_kom = 1, v_kom = c(0, 0, 1, 1),
                     a_kom = c(0, 1, 0, 1), kat_kom1 = c(1, 4, 2, 3))

left_join(mydata, keydat) %>%
  mutate(kat_kom1 = replace(kat_kom1, g_kom == 0, 0))
```
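To see what the key/value lookup produces, here is a minimal base-R sketch on made-up data. The contents of `mydata` are hypothetical, and `match()` on pasted keys stands in for `left_join()` so the example runs without dplyr. It is meant as an illustration of the idea, not a replacement for the join above.

```r
# Hypothetical data: three dummy variables per row
mydata <- data.frame(g_kom = c(1, 1, 1, 1, 0),
                     v_kom = c(0, 0, 1, 1, 0),
                     a_kom = c(0, 1, 0, 1, 0))

# Key/value table: each combination of dummies maps to one category code
keydat <- data.frame(g_kom = 1, v_kom = c(0, 0, 1, 1),
                     a_kom = c(0, 1, 0, 1), kat_kom1 = c(1, 4, 2, 3))

key_cols <- c("g_kom", "v_kom", "a_kom")

# Build one string key per row and look it up in the key table
idx <- match(do.call(paste, mydata[key_cols]),
             do.call(paste, keydat[key_cols]))
mydata$kat_kom1 <- keydat$kat_kom1[idx]

# Rows with g_kom == 0 have no match in the key table; set them to 0
mydata$kat_kom1[mydata$g_kom == 0] <- 0

print(mydata)
```

Keeping the mapping in a small table makes the rule easy to audit and change without touching the lookup code.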

## Creating categorical variables from mutually exclusive dummy variables

Update (2019): Please use `dplyr::coalesce()` instead; it works in much the same way.

My R package has a convenience function that lets you choose the first non-`NA` value for each element in a list of vectors:

```r
# library(devtools)
# install_github('kimisc', 'muelleki')
library(kimisc)
df$factor1 <- with(df, coalesce.na(conditionA, conditionB))
```

(I'm not sure whether this works if `conditionA` and `conditionB` are factors. If necessary, convert them to numerics first with `as.numeric(as.character(...))`.)
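For readers who want to see what coalescing does without installing anything, here is a minimal base-R sketch. The helper `first_non_na()` and the sample data are hypothetical. It is not the implementation of `coalesce.na()` or `dplyr::coalesce()`, only an illustration of the idea, and it assumes plain numeric or character vectors of equal length.

```r
# Hypothetical helper: for each position, take the first non-NA value
# across the supplied vectors (assumes non-factor vectors of equal length)
first_non_na <- function(...) {
  Reduce(function(a, b) ifelse(is.na(a), b, a), list(...))
}

# Hypothetical mutually exclusive conditions; the third row has neither
df <- data.frame(conditionA = c(1, NA, NA, 1),
                 conditionB = c(NA, 2, NA, NA))

df$factor1 <- with(df, first_non_na(conditionA, conditionB))

print(df)
```

Rows where every input is `NA` stay `NA` in the result.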

Otherwise, you could give `interaction` a try, combined with recoding of the levels of the resulting factor -- but to me it looks like you're more interested in the first solution:

```r
df$conditionAB <- with(df, interaction(coalesce.na(conditionA, 0),
                                       coalesce.na(conditionB, 0)))
levels(df$conditionAB) <- c('A', 'B')
```
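A minimal base-R sketch of the `interaction` route on made-up data follows. The sample data and the helper `na_to_zero()` (a stand-in for `coalesce.na(x, 0)`) are assumptions. By default `interaction()` keeps every combination of levels, so this sketch passes `drop = TRUE` to keep only the combinations that occur and relabels the levels by name rather than by position.

```r
# Hypothetical mutually exclusive dummies: one is 1 in each row, the other NA
df <- data.frame(conditionA = c(1, NA, 1, NA),
                 conditionB = c(NA, 1, NA, 1))

# Hypothetical base-R stand-in for coalesce.na(x, 0): replace NA with 0
na_to_zero <- function(x) replace(x, is.na(x), 0)

# drop = TRUE keeps only the combinations present in the data ("1.0", "0.1")
df$conditionAB <- with(df, interaction(na_to_zero(conditionA),
                                       na_to_zero(conditionB),
                                       drop = TRUE))

# Relabel by name so the result does not depend on level order
levels(df$conditionAB)[levels(df$conditionAB) == "1.0"] <- "A"
levels(df$conditionAB)[levels(df$conditionAB) == "0.1"] <- "B"

print(df)
```

If some rows had both conditions or neither, additional levels such as `"1.1"` or `"0.0"` would remain and would need their own labels.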