Applying the same factor levels to multiple variables in an R data frame
df[] <- lapply(df, factor,
               levels = c(-9, 0, 1),
               labels = c("Unknown or Missing", "No", "Yes"))
str(df)
This is likely to be faster than apply or sapply, which need data.frame to reform/reclass their results. The trick here is that using [] on the LHS of the assignment preserves the structure of the target (because R "knows" what its class and dimensions are), so calling data.frame on the list returned by lapply is not needed. If you want to do this only with selected columns, you could do this:
df[colnums] <- lapply(df[colnums], factor,
                      levels = c(-9, 0, 1),
                      labels = c("Unknown or Missing", "No", "Yes"))
str(df)
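As a self-contained illustration of the selected-columns variant, here is a minimal sketch. The data frame and its column names (smoker, diabetic, age) are made up for the example, and colnums is assumed to hold column positions.

```r
# Hypothetical survey data coded as -9 / 0 / 1 (not from the original question)
df <- data.frame(
  smoker   = c(1, 0, -9, 1),
  diabetic = c(0, 0, 1, -9),
  age      = c(34, 51, 47, 29)
)

colnums <- 1:2   # positions of the columns to convert; age is left alone

# Assigning into df[colnums] keeps df a data frame and replaces only those columns
df[colnums] <- lapply(df[colnums], factor,
                      levels = c(-9, 0, 1),
                      labels = c("Unknown or Missing", "No", "Yes"))

str(df)
```

After this runs, smoker and diabetic share the same three labels and age is still numeric.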
Applying same factor levels to multiple variables with differing amount of levels in R
Here is a way with set() called in a for loop.
library(data.table)
f <- function(x){
  x <- as.character(x)
  i1 <- x %in% c("TRUE", "1")
  i0 <- x %in% c("FALSE", "0")
  x[which(i1)] <- "2"
  x[which(i0)] <- "1"
  as.integer(x)
}
for (j in seq_along(dt)) set(dt, j = j, value = f(dt[[j]]))
dt
# region1 region2 region3 region4
#1: 2 2 NA NA
#2: 1 2 1 1
#3: 1 1 2 1
#4: 2 NA NA NA
#5: NA NA 1 1
Thanks to jangorecki's comment, a much simpler way is:
dt[, names(dt) := lapply(dt, f)]
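For a version that can be run end to end, here is a small sketch. The sample dt is invented for illustration (the original data is not shown), and it uses the .SD / .SDcols form of the same update by reference, which also makes it easy to restrict the recoding to a subset of columns.

```r
library(data.table)

# Hypothetical input: columns mixing "TRUE"/"FALSE" and "1"/"0" strings
dt <- data.table(
  region1 = c("TRUE", "0", "FALSE", "1", NA),
  region2 = c("1", "TRUE", "0", NA, NA)
)

# Same recoding function as above: TRUE/1 become 2, FALSE/0 become 1
f <- function(x){
  x <- as.character(x)
  i1 <- x %in% c("TRUE", "1")
  i0 <- x %in% c("FALSE", "0")
  x[which(i1)] <- "2"
  x[which(i0)] <- "1"
  as.integer(x)
}

cols <- names(dt)                                # or any subset of column names
dt[, (cols) := lapply(.SD, f), .SDcols = cols]   # update by reference
dt
```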
Recoding factor levels and labels (multiple variables at once) in R
If you have data similar to this:
df <- data.frame(id = 1:3, A_L = c('1 = yes', '2 = no', '1 = yes'),
B_L = c('2 = no', '1 = yes', '1 = yes'))
You could use mutate_at to apply a function to multiple columns and recode to change the values.
library(dplyr)
df %>%
mutate_at(vars(contains('_L')),
~recode(., '1 = yes' = '0 = truth', '2 = no' = '1 = lie'))
# id A_L B_L
#1 1 0 = truth 1 = lie
#2 2 1 = lie 0 = truth
#3 3 0 = truth 0 = truth
Or in base R:
cols <- grep('_L', names(df))
df[cols] <- lapply(df[cols], function(x)
ifelse(x == '1 = yes', '0 = truth', '1 = lie'))
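The ifelse version only distinguishes two cases. If there are more than two values to recode, one possible base R variant is a named lookup vector. This is a sketch rather than part of the original answer, and the name lookup is introduced here for illustration.

```r
df <- data.frame(id = 1:3,
                 A_L = c('1 = yes', '2 = no', '1 = yes'),
                 B_L = c('2 = no', '1 = yes', '1 = yes'))

# Old value on the left (as the name), new value on the right
lookup <- c('1 = yes' = '0 = truth', '2 = no' = '1 = lie')

cols <- grep('_L$', names(df))

# Index the lookup by the old values; values missing from lookup become NA
df[cols] <- lapply(df[cols], function(x) unname(lookup[as.character(x)]))
df
```

Adding another recoding rule then only means adding one more element to lookup.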
Change the levels of multiple factors that start_with the same pattern in R
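Use forcats::fct_inseq:
library(dplyr)
library(forcats)
df <- df %>%
  mutate_all(as.factor) %>%
  # refer to col2 directly so fct_inseq sees the column already converted to factor
  mutate(col2 = fct_inseq(col2))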
Output:
levels(df$col2)
[1] "10.01" "10.02" "10.03" "12.1" "12.2" "12.3" "100.1" "100.2" "100.3"
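If the goal is to do this for every column whose name starts with the same pattern, a base R sketch along these lines may help. The data frame, the "^col" pattern, and the helper num_factor are assumptions made for the example; it orders the levels by their numeric value, which is the same idea as fct_inseq.

```r
# Hypothetical data: two "col*" columns holding number-like strings
df <- data.frame(
  col1  = c("100.1", "10.02", "12.3"),
  col2  = c("12.1", "100.2", "10.01"),
  other = c("x", "y", "z")
)

# Build a factor whose levels are sorted numerically, not alphabetically
num_factor <- function(x) {
  lv <- unique(as.character(x))
  factor(x, levels = lv[order(as.numeric(lv))])
}

cols <- grep("^col", names(df))        # columns starting with "col"
df[cols] <- lapply(df[cols], num_factor)

levels(df$col2)
# [1] "10.01" "12.1"  "100.2"
```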
How to apply the same factor levels to 2 variables containing similar data
You could do:
df[] <- lapply(df, function(x) factor(x, levels = levels(vec_fac)))
Out:
> str(df)
'data.frame': 15 obs. of 2 variables:
$ group1: Factor w/ 6 levels "A","B","C","D",..: 2 3 4 5 6 3 4 5 6 4 ...
$ group2: Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 2 2 2 2 3 ..
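A reproducible version of the same idea, with made-up data standing in for the original df and vec_fac:

```r
# Hypothetical reference factor carrying the full set of levels
vec_fac <- factor(c("A", "B", "C", "D", "E", "F"))

# Each column only contains some of the values
df <- data.frame(group1 = c("B", "C", "F"),
                 group2 = c("A", "A", "B"))

# Give every column the complete level set, in the same order
df[] <- lapply(df, function(x) factor(x, levels = levels(vec_fac)))

identical(levels(df$group1), levels(df$group2))
# [1] TRUE
```

Because both columns now share one level set, their integer codes are directly comparable.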
Creating a new factor variable from multiple factor variables, all with same levels
In dplyr you can specify the conditions in case_when:
library(dplyr)
df %>%
rowwise() %>%
mutate(result = {
vec <- c_across(f1:f3)
case_when(sum(vec %in% 1:2) >= 2 ~ 1,
sum(vec == 3) >= 2 ~ 2,
sum(vec == 4) >= 2 ~ 3,
TRUE ~ 4)
})
# id f1 f2 f3 result
# <int> <fct> <fct> <fct> <dbl>
# 1 1 4 2 1 1
# 2 2 1 1 1 1
# 3 3 4 2 2 1
# 4 4 4 3 1 4
# 5 5 2 2 1 1
# 6 6 3 4 2 4
# 7 7 4 2 4 3
# 8 8 3 2 2 1
# 9 9 3 1 1 1
#10 10 2 1 1 1
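For larger data, rowwise() can be slow. The same rules can be sketched in base R with rowSums on a character matrix. This is an alternative technique, not the answer's method, and the sample data below is invented for the example.

```r
# Hypothetical data: three factor columns sharing the levels 1 to 4
df <- data.frame(
  id = 1:5,
  f1 = factor(c(4, 1, 4, 3, 4), levels = 1:4),
  f2 = factor(c(2, 1, 3, 3, 4), levels = 1:4),
  f3 = factor(c(1, 1, 1, 4, 2), levels = 1:4)
)

# Character matrix of the factor labels, one row per observation
m <- sapply(df[c("f1", "f2", "f3")], as.character)

n12 <- rowSums(m == "1" | m == "2")   # how many of f1..f3 are 1 or 2
n3  <- rowSums(m == "3")
n4  <- rowSums(m == "4")

# Same priority order as the case_when call: first matching rule wins
df$result <- ifelse(n12 >= 2, 1,
             ifelse(n3  >= 2, 2,
             ifelse(n4  >= 2, 3, 4)))
df$result
# [1] 1 1 4 2 3
```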
How to create a factor variable based on the levels of the same variable in a different data frame
One option is to bind the datasets with bind_rows while creating a data identifier ('grp'), convert the character columns to factor, do a group_split by 'grp' into a list of data frames, then set the names of the list with setNames and update the original objects with list2env:
library(dplyr)
bind_rows(main_df, addl_df, .id = 'grp') %>%
mutate(across(where(is.character), factor)) %>%
group_split(grp, .keep = FALSE) %>%
setNames(c('main_df', 'addl_df')) %>%
list2env(.GlobalEnv)
Output:
> str(main_df)
tibble [5 × 4] (S3: tbl_df/tbl/data.frame)
$ id : num [1:5] 1 2 3 4 5
$ age : num [1:5] 10 20 30 40 45
$ gender: Factor w/ 2 levels "F","M": 1 1 2 2 1
$ city : Factor w/ 4 levels "A","B","C","D": 1 2 3 4 4
> str(addl_df)
tibble [2 × 4] (S3: tbl_df/tbl/data.frame)
$ id : num [1:2] 7 8
$ age : num [1:2] 40 45
$ gender: Factor w/ 2 levels "F","M": 1 1
$ city : Factor w/ 4 levels "A","B","C","D": 3 4
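If only one column needs to line up, a simpler base R sketch is to reuse the levels of the main data frame directly. The data below is a cut-down, hypothetical version of main_df and addl_df with just the id and city columns.

```r
# Hypothetical stand-ins for the two data frames
main_df <- data.frame(id = 1:5, city = c("A", "B", "C", "D", "D"),
                      stringsAsFactors = FALSE)
addl_df <- data.frame(id = 7:8, city = c("C", "D"),
                      stringsAsFactors = FALSE)

# Create the factor in the main data, then borrow its levels for the other one
main_df$city <- factor(main_df$city)
addl_df$city <- factor(addl_df$city, levels = levels(main_df$city))

levels(addl_df$city)
# [1] "A" "B" "C" "D"
```

Values in addl_df that do not occur in main_df would become NA with this approach, so it assumes the main data covers every level.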
Is there a function in R to change several similar factor levels at once?
If the vectors are of the same length, you can put them in a data frame; if they are of different lengths, put them in a list. Either way, you can then use lapply to apply the same function to all of them. You can use forcats::fct_collapse to collapse multiple levels into one.
list_vec <- list(A, B, C)
list_vec <- lapply(list_vec, function(x) forcats::fct_collapse(x,
"yes"=c("Likely", "y", "Y", "Yes", "yes"),
"no" = c("", "No", "UK", "no", "N", "n", "uk")))