Applying the Same Factor Levels to Multiple Variables in an R Data Frame

Applying the same factor levels to multiple variables in an R data frame

df[] <- lapply(df, factor,
               levels = c(-9, 0, 1),
               labels = c("Unknown or Missing", "No", "Yes"))
str(df)

This is likely to be faster than apply or sapply, whose results need data.frame to rebuild and reclass them. The trick is that using [] on the left-hand side of the assignment preserves the structure of the target (R "knows" its class and dimensions), so there is no need to call data.frame on the list returned by lapply. To do this with selected columns only, index them on both sides:

df[colnums] <- lapply(df[colnums], factor,
                      levels = c(-9, 0, 1),
                      labels = c("Unknown or Missing", "No", "Yes"))
str(df)
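
To make the role of [] concrete, here is a minimal sketch on made-up data (the column names q1 to q3 are hypothetical, not from the original question). It shows that assigning into recoded[] keeps the data frame, while a bare lapply returns a plain list.

```r
# Hypothetical example data; q1..q3 are made-up column names.
df <- data.frame(q1 = c(1, 0, -9), q2 = c(0, 0, 1), q3 = c(-9, 1, 1))

# Assigning the lapply() result into recoded[] keeps the data.frame shell.
recoded <- df
recoded[] <- lapply(recoded, factor,
                    levels = c(-9, 0, 1),
                    labels = c("Unknown or Missing", "No", "Yes"))

class(recoded)       # still a data.frame
levels(recoded$q2)   # all three labels, even though -9 never occurs in q2

# Without [] on the left-hand side, the result is just a list.
as_list <- lapply(df, factor, levels = c(-9, 0, 1))
class(as_list)
```

Every column ends up with the full set of levels, whether or not each value actually occurs in it.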

Applying same factor levels to multiple variables with differing amount of levels in R

Here is a way with set() called in a for loop.

library(data.table)

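f <- function(x){
  # compare as text so logical, numeric and character input behave alike
  x <- as.character(x)
  i1 <- x %in% c("TRUE", "1")
  i0 <- x %in% c("FALSE", "0")
  # recode TRUE/1 to 2 and FALSE/0 to 1; NA stays NA
  x[which(i1)] <- "2"
  x[which(i0)] <- "1"
  as.integer(x)
}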

for (j in seq_along(dt)) set(dt, j = j, value = f(dt[[j]]))

dt
# region1 region2 region3 region4
#1: 2 2 NA NA
#2: 1 2 1 1
#3: 1 1 2 1
#4: 2 NA NA NA
#5: NA NA 1 1

Thanks to jangorecki's comment, a much simpler way is:

dt[, names(dt) := lapply(dt, f)]
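
To see what f does to a single column, the sketch below repeats its definition (so the block runs on its own, without data.table) and applies it to a logical and a numeric vector. The input vectors are made up for illustration.

```r
# Same helper as above, repeated so this block is self-contained.
f <- function(x) {
  x <- as.character(x)
  i1 <- x %in% c("TRUE", "1")
  i0 <- x %in% c("FALSE", "0")
  x[which(i1)] <- "2"
  x[which(i0)] <- "1"
  as.integer(x)
}

# Hypothetical inputs: one logical column, one 0/1 column.
from_logical <- f(c(TRUE, FALSE, NA))
from_numeric <- f(c(1, 0, NA))

from_logical
from_numeric
```

Both calls return the same integer coding (2 for TRUE or 1, 1 for FALSE or 0, NA unchanged), which is why one function can be applied to every column regardless of how many distinct values it has.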

Recoding factor levels and labels (multiple variables at once) in R

If you have data similar to this:

df <- data.frame(id = 1:3, A_L = c('1 = yes', '2 = no', '1 = yes'), 
B_L = c('2 = no', '1 = yes', '1 = yes'))

You could use mutate_at to apply a function to multiple columns and recode to change the values. (In dplyr 1.0.0 and later, mutate_at is superseded by across, but it still works.)

library(dplyr)
df %>%
mutate_at(vars(contains('_L')),
~recode(., '1 = yes' = '0 = truth', '2 = no' = '1 = lie'))

# id A_L B_L
#1 1 0 = truth 1 = lie
#2 2 1 = lie 0 = truth
#3 3 0 = truth 0 = truth

Or in base R:

cols <- grep('_L', names(df))
df[cols] <- lapply(df[cols], function(x)
ifelse(x == '1 = yes', '0 = truth', '1 = lie'))
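
Note that the ifelse version maps every value other than '1 = yes' to '1 = lie'. If that is not wanted, one possible alternative is a named lookup vector, sketched below on the same example data. The name lookup is made up for this illustration; values missing from it become NA instead of being silently recoded.

```r
df <- data.frame(id = 1:3, A_L = c('1 = yes', '2 = no', '1 = yes'),
                 B_L = c('2 = no', '1 = yes', '1 = yes'))

# Hypothetical lookup table: names are old values, elements are new values.
lookup <- c('1 = yes' = '0 = truth', '2 = no' = '1 = lie')

cols <- grep('_L$', names(df))
# as.character() guards against factor columns (the default before R 4.0).
df[cols] <- lapply(df[cols], function(x) unname(lookup[as.character(x)]))

df
```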

Change the levels of multiple factors that start_with the same pattern in R

Use forcats::fct_inseq, which orders the levels of a factor by their numeric value:

library(dplyr)
library(forcats)

df <- df %>%
  mutate_all(as.factor) %>%
  mutate(col2 = fct_inseq(col2))

Output:

levels(df$col2)
[1] "10.01" "10.02" "10.03" "12.1" "12.2" "12.3" "100.1" "100.2" "100.3"
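
For several columns that share a name prefix, the same numeric reordering can be sketched in base R. The helper reorder_numeric, the prefix "col" and the data are all assumptions made for this illustration; it is not the forcats implementation.

```r
# Hypothetical helper: reorder a factor's levels by their numeric value.
reorder_numeric <- function(f) {
  f <- as.factor(f)
  lv <- levels(f)
  factor(f, levels = lv[order(as.numeric(lv))])
}

# Made-up data: two columns starting with "col" and one unrelated column.
df <- data.frame(col1 = c("100.1", "10.01", "12.1"),
                 col2 = c("12.2", "100.2", "10.02"),
                 other = c("a", "b", "c"))

# Select the columns by prefix and reorder each one.
target <- startsWith(names(df), "col")
df[target] <- lapply(df[target], reorder_numeric)

levels(df$col1)
levels(df$col2)
```

The values themselves are unchanged; only the order of the levels differs from the default string sort.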

How to apply the same factor levels to 2 variables containing similar data

You could do the following, where vec_fac is a factor that already carries the desired levels:

df[] <- lapply(df, function(x) factor(x, levels = levels(vec_fac)))

Out:

> str(df)
'data.frame': 15 obs. of 2 variables:
$ group1: Factor w/ 6 levels "A","B","C","D",..: 2 3 4 5 6 3 4 5 6 4 ...
$ group2: Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 2 2 2 2 3 ..
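
If no reference factor such as vec_fac is available, one possible variation is to build the shared levels from the data itself. This is a minimal sketch on made-up data; the name shared is hypothetical.

```r
# Made-up data: two columns holding similar values.
df <- data.frame(group1 = c("B", "C", "D"),
                 group2 = c("A", "A", "B"))

# Collect every value that occurs in any column and use it as the level set.
shared <- sort(unique(unlist(lapply(df, as.character))))

df[] <- lapply(df, factor, levels = shared)

levels(df$group1)
levels(df$group2)
```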

Creating a new factor variable from multiple factor variables, all with same levels

In dplyr you can specify the conditions in case_when:

library(dplyr)

df %>%
rowwise() %>%
mutate(result = {
vec <- c_across(f1:f3)
case_when(sum(vec %in% 1:2) >= 2 ~ 1,
sum(vec == 3) >= 2 ~ 2,
sum(vec == 4) >= 2 ~ 3,
TRUE ~ 4)
})

# id f1 f2 f3 result
# <int> <fct> <fct> <fct> <dbl>
# 1 1 4 2 1 1
# 2 2 1 1 1 1
# 3 3 4 2 2 1
# 4 4 4 3 1 4
# 5 5 2 2 1 1
# 6 6 3 4 2 4
# 7 7 4 2 4 3
# 8 8 3 2 2 1
# 9 9 3 1 1 1
#10 10 2 1 1 1
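
The same rules can also be sketched without rowwise, by counting matches per row with rowSums. This is an alternative under assumptions, not the answer's method: the data are made up, and the rules are checked in the same order as in the case_when call above.

```r
# Made-up data with the same column layout (f1..f3 are factors).
df <- data.frame(id = 1:6,
                 f1 = factor(c(4, 1, 4, 3, 4, 3)),
                 f2 = factor(c(2, 1, 3, 4, 2, 3)),
                 f3 = factor(c(1, 1, 1, 2, 4, 1)))

# Character matrix of the three factor columns.
m <- as.matrix(df[c("f1", "f2", "f3")])

# Per-row counts for each rule.
n_low <- rowSums(m == "1" | m == "2")
n3 <- rowSums(m == "3")
n4 <- rowSums(m == "4")

# Nested ifelse() keeps the first matching rule, like case_when().
df$result <- ifelse(n_low >= 2, 1,
             ifelse(n3 >= 2, 2,
             ifelse(n4 >= 2, 3, 4)))

df
```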

How to create a factor variable based on the levels of the same variable in a different data frame

One option is to bind the datasets with bind_rows while creating a data identifier ('grp'), convert the character columns to factor, split by 'grp' into a list of data frames with group_split, name the list elements with setNames, and update the original objects with list2env. Because the factors are created on the combined data, both data frames end up with the same levels.

library(dplyr)
bind_rows(main_df, addl_df, .id = 'grp') %>%
mutate(across(where(is.character), factor)) %>%
group_split(grp, .keep = FALSE) %>%
setNames(c('main_df', 'addl_df')) %>%
list2env(.GlobalEnv)

Output:

> str(main_df)
tibble [5 × 4] (S3: tbl_df/tbl/data.frame)
$ id : num [1:5] 1 2 3 4 5
$ age : num [1:5] 10 20 30 40 45
$ gender: Factor w/ 2 levels "F","M": 1 1 2 2 1
$ city : Factor w/ 4 levels "A","B","C","D": 1 2 3 4 4
> str(addl_df)
tibble [2 × 4] (S3: tbl_df/tbl/data.frame)
$ id : num [1:2] 7 8
$ age : num [1:2] 40 45
$ gender: Factor w/ 2 levels "F","M": 1 1
$ city : Factor w/ 4 levels "A","B","C","D": 3 4
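
When only one column is involved, a lighter base R sketch is to reuse the levels of the main data frame directly. The data below are made up to resemble the output above; this assumes main_df$city is already a factor.

```r
# Made-up data: city is already a factor in main_df, plain text in addl_df.
main_df <- data.frame(id = 1:5,
                      city = factor(c("A", "B", "C", "D", "D")))
addl_df <- data.frame(id = 7:8,
                      city = c("C", "D"),
                      stringsAsFactors = FALSE)

# Build the new factor using the levels of the existing one.
addl_df$city <- factor(addl_df$city, levels = levels(main_df$city))

str(addl_df)
```

Any value in addl_df$city that is not a level of main_df$city would become NA, so it is worth checking for new categories first.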

Is there a function in R to change several similar factor levels at once?

If the vectors are the same length, you can put them in a data frame; if their lengths differ, put them in a list. Either way, use lapply to apply the same function to all of them. forcats::fct_collapse collapses multiple levels into one.

list_vec <- list(A, B, C)

list_vec <- lapply(list_vec, function(x) forcats::fct_collapse(x,
"yes"=c("Likely", "y", "Y", "Yes", "yes"),
"no" = c("", "No", "UK", "no", "N", "n", "uk")))

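If forcats is not available, a similar collapse can be sketched in base R by assigning a named list to levels(). The helper collapse_yes_no and the input vectors are made up for this illustration. One difference to be aware of: with this approach, levels not mentioned in the list become NA, whereas fct_collapse keeps them.

```r
# Hypothetical helper: collapse spelling variants into "yes" / "no".
collapse_yes_no <- function(x) {
  x <- factor(x)
  levels(x) <- list(
    yes = c("Likely", "y", "Y", "Yes", "yes"),
    no  = c("", "No", "UK", "no", "N", "n", "uk")
  )
  x
}

# Made-up vectors of different lengths, kept in a list.
A <- c("Y", "no", "Likely", "")
B <- c("yes", "UK", "n")

out <- lapply(list(A = A, B = B), collapse_yes_no)

out$A
out$B
```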
