R -Apply- Convert Many Columns from Numeric to Factor

Convert particular numeric/int into factors in R using a colname vector

You can also use across() with mutate(), selecting columns by their index position. If you have 100 variables, index numbers are often easier to use than names:

library(dplyr)
# Data
df <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  customer = c("Alice", "Bob", "Carlos", "Chuck", "Craig", "Heidi", "Judy", "Rupert", "Wendy"),
  Balance = c(100, 75, 56, 172, 450, 777, 1001, 25, 968),
  Hour = c(1, 23, 4, 5, 6, 12, 14, 17, 17),
  InDebt = c(1, 1, 1, 1, 0, 0, 0, 1, 1),
  DueDay = c("Mon", "Tue", "Wed", "Fri", "Sun", "Sat", "Thu", "Mon", "Wed"),
  AppBooked = c(1, 1, 1, 0, 0, 1, 0, 1, 1),
  stringsAsFactors = FALSE
)
# Code: convert columns 4, 5 and 7 (Hour, InDebt, AppBooked) to factors
df <- df %>% mutate(across(c(4, 5, 7), factor))
str(df)

Output:

'data.frame':   9 obs. of  7 variables:
$ id : num 1 2 3 4 5 6 7 8 9
$ customer : chr "Alice" "Bob" "Carlos" "Chuck" ...
$ Balance : num 100 75 56 172 450 ...
$ Hour : Factor w/ 8 levels "1","4","5","6",..: 1 8 2 3 4 5 6 7 7
$ InDebt : Factor w/ 2 levels "0","1": 2 2 2 2 1 1 1 2 2
$ DueDay : chr "Mon" "Tue" "Wed" "Fri" ...
$ AppBooked: Factor w/ 2 levels "0","1": 2 2 2 1 1 2 1 2 2
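One thing to keep in mind once a numeric column such as Hour is a factor: as.numeric() on a factor returns the internal level codes, not the original numbers. The short sketch below rebuilds only the Hour column from the data above (the names hour, codes and values are made up for illustration) and shows the usual way back via as.character().

```r
# Hour values from the data above, converted to a factor
hour <- factor(c(1, 23, 4, 5, 6, 12, 14, 17, 17))

# Level codes: positions within levels(hour), not the hours themselves
codes <- as.numeric(hour)

# Original values: go through the level labels first
values <- as.numeric(as.character(hour))

print(codes)
print(values)
```

The codes match the integers shown after "Factor w/ 8 levels" in the str() output, which is why they differ from the hours.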

Or using your character vector of column names, colsasfactors, wrapped in all_of() so that tidyselect treats it as an external vector:

#Code 2
df <- df %>% mutate(across(all_of(colsasfactors), factor))

Output:

'data.frame':   9 obs. of  7 variables:
$ id : num 1 2 3 4 5 6 7 8 9
$ customer : chr "Alice" "Bob" "Carlos" "Chuck" ...
$ Balance : num 100 75 56 172 450 ...
$ Hour : Factor w/ 8 levels "1","4","5","6",..: 1 8 2 3 4 5 6 7 7
$ InDebt : Factor w/ 2 levels "0","1": 2 2 2 2 1 1 1 2 2
$ DueDay : chr "Mon" "Tue" "Wed" "Fri" ...
$ AppBooked: Factor w/ 2 levels "0","1": 2 2 2 1 1 2 1 2 2
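For comparison, the same conversion can be sketched in base R with lapply(), without dplyr. The data frame df_small and the vector cols below are hypothetical stand-ins (a cut-down version of the data above and an assumed set of column names), not part of the original answer.

```r
# Hypothetical cut-down version of the data frame used above
df_small <- data.frame(
  id     = c(1, 2, 3),
  Hour   = c(1, 23, 4),
  InDebt = c(1, 0, 1),
  DueDay = c("Mon", "Tue", "Wed"),
  stringsAsFactors = FALSE
)

# Assumed column-name vector; the positions c(2, 3) would work the same way
cols <- c("Hour", "InDebt")

# lapply() returns a list of factors, which is assigned back to the same columns
df_small[cols] <- lapply(df_small[cols], factor)

str(df_small)
```

Columns not listed in cols keep their original class.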

Converting multiple columns to factors and releveling with mutate(across)

You can pass an anonymous function to across() like this:

dat <- data.frame(Comp1Letter = c("A", "B", "D", "F", "U", "A*", "B", "C"),
Comp2Letter = c("B", "C", "E", "U", "A", "C", "A*", "E"),
Comp3Letter = c("D", "A", "C", "D", "F", "D", "C", "A"))

GradeLevels <- c("A*", "A", "B", "C", "D", "E", "F", "G", "U")

# %>% requires dplyr (or magrittr) to be attached; parse_factor() is exported by readr
dat %>%
  tibble::as_tibble() %>%
  dplyr::mutate(dplyr::across(
    c(Comp1Letter, Comp2Letter, Comp3Letter),
    ~ readr::parse_factor(.x, levels = GradeLevels)
  ))

# # A tibble: 8 × 3
# Comp1Letter Comp2Letter Comp3Letter
# <fct> <fct> <fct>
# 1 A B D
# 2 B C A
# 3 D E C
# 4 F U D
# 5 U A F
# 6 A* C D
# 7 B A* C
# 8 C E A

You were close: all that was left was to wrap the factor call in an anonymous function. That can be done either with ~ and . (or .x) in the tidyverse, or with function(x) and x in base R.
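The two styles can be compared side by side in a small sketch. It uses base factor() in place of parse_factor() to keep dependencies down, and a shortened, made-up version of the grade data; res1 and res2 are illustrative names.

```r
library(dplyr)

# Shortened, hypothetical version of the grade data
dat_small <- data.frame(
  Comp1Letter = c("A", "B", "U", "A*"),
  Comp2Letter = c("B", "C", "A", "E"),
  stringsAsFactors = FALSE
)
GradeLevels <- c("A*", "A", "B", "C", "D", "E", "F", "G", "U")

# Formula-style lambda: .x stands for the current column
res1 <- dat_small %>%
  mutate(across(everything(), ~ factor(.x, levels = GradeLevels)))

# Base R anonymous function: x stands for the current column
res2 <- dat_small %>%
  mutate(across(everything(), function(x) factor(x, levels = GradeLevels)))

# Both versions give the same result with the full set of grade levels
print(identical(res1, res2))
print(levels(res1$Comp1Letter))
```

Passing the bare function name (across(cols, factor)) only works when no extra arguments such as levels are needed.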

How to elegantly recode multiple columns containing multiple values

Try this. We use mutate() and across() twice: first to convert each variable to a factor whose levels are ordered by first appearance (unique()), and then to extract the underlying integer codes with as.numeric(). Here is the code:

library(tidyverse)
#Code
df %>% mutate(across(gender:smoke,~factor(.,levels = unique(.)))) %>%
mutate(across(gender:smoke,~as.numeric(.)))

Output:

  gender education smoke
1 1 1 1
2 2 2 2
3 3 3 3
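Since the original df is not shown here, the sketch below uses a hypothetical data frame (df_demo, with one repeated row so the effect is visible) to illustrate the two-step recode, and adds a one-step alternative based on match(). It uses as.integer() rather than as.numeric(), which is an assumption about the desired output type.

```r
library(dplyr)

# Hypothetical data; the original df is not shown in this answer
df_demo <- data.frame(
  gender    = c("male", "female", "transgender", "female"),
  education = c("high-school", "grad-school", "home-school", "high-school"),
  smoke     = c("yes", "no", "prefer not tell", "yes"),
  stringsAsFactors = FALSE
)

# Two steps: factor with levels in order of first appearance, then integer codes
coded <- df_demo %>%
  mutate(across(gender:smoke, ~ factor(.x, levels = unique(.x)))) %>%
  mutate(across(gender:smoke, as.integer))

# One step: position of each value among the unique values of its column
coded2 <- df_demo %>%
  mutate(across(gender:smoke, ~ match(.x, unique(.x))))

print(coded)
```

Both approaches number the values by order of first appearance, so the fourth row ("female") gets the same code as the second.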

To see which new value is assigned to each original value, you can use this:

#Code 2
df %>% summarise_all(.funs = unique) %>% pivot_longer(everything()) %>%
arrange(name) %>%
group_by(name) %>% mutate(Newval=1:n())

Output:

# A tibble: 9 x 3
# Groups: name [3]
name value Newval
<chr> <fct> <int>
1 education high-school 1
2 education grad-school 2
3 education home-school 3
4 gender male 1
5 gender female 2
6 gender transgender 3
7 smoke yes 1
8 smoke no 2
9 smoke prefer not tell 3

Or maybe for more control:

#Code 3
df %>% mutate(id=1:n()) %>% pivot_longer(-id) %>%
left_join(df %>% summarise_all(.funs = unique) %>% pivot_longer(everything()) %>%
arrange(name) %>%
group_by(name) %>% mutate(Newval=1:n()) %>% ungroup()) %>%
select(-value) %>%
pivot_wider(names_from = name,values_from=Newval) %>%
select(-id)

Output:

# A tibble: 3 x 3
gender education smoke
<int> <int> <int>
1 1 1 1
2 2 2 2
3 3 3 3

If your variables are of class character, you can use this pipeline to convert them to factors, reorder the levels by first appearance, and then make them numeric:

#Code 4
df %>%
mutate(across(gender:smoke,~as.factor(.))) %>%
mutate(across(gender:smoke,~factor(.,levels = unique(.)))) %>%
mutate(across(gender:smoke,~as.numeric(.)))

Output:

  gender education smoke
1 1 1 1
2 2 2 2
3 3 3 3

change a numeric column to a factor and assign labels/levels to the data

You can try the levels() function. For example, take dummy data with a factor that has three levels, 1, 2 and 3:

dummy <- data.frame(
fac = rep(c(1,2,3),4)
)
dummy$fac <- as.factor(dummy$fac)

Each option below is an alternative; apply only one of them to the freshly created factor.

In base R

R-1

levels(dummy$fac) <- c("Petrol", "Hybrid", "Diesel")

R-2

levels(dummy$fac) <- list("Petrol" = 1, "Hybrid" = 2, "Diesel" = 3)

Also, using the dplyr package,

dplyr

dummy$fac <- dplyr::recode_factor(dummy$fac, "1" = "Petrol", "2" = "Hybrid", "3" = "Diesel")

All will give

       fac
1 Petrol
2 Hybrid
3 Diesel
4 Petrol
5 Hybrid
6 Diesel
7 Petrol
8 Hybrid
9 Diesel
10 Petrol
11 Hybrid
12 Diesel

And str(dummy$fac) gives

Factor w/ 3 levels "Petrol","Hybrid",..: 1 2 3 1 2 3 1 2 3 1 ...
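The conversion and the labelling can also be done in a single factor() call through its levels and labels arguments. This is a minimal sketch on a fresh copy of the dummy data (dummy2 is a name made up here), not one of the options from the answer above.

```r
# Fresh numeric column, same shape as the dummy data above
dummy2 <- data.frame(fac = rep(c(1, 2, 3), 4))

# levels lists the values found in the data; labels names them in the same order
dummy2$fac <- factor(dummy2$fac,
                     levels = c(1, 2, 3),
                     labels = c("Petrol", "Hybrid", "Diesel"))

str(dummy2$fac)
table(dummy2$fac)
```

Any value in the data that is not listed in levels would become NA, so this form also acts as a check on unexpected codes.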

Convert Multiple Column Classes

We can use mapply and provide the functions as a list to convert the columns.

df <- as.data.frame(matrix(1:20, 5, 4))

df[] <- mapply(function(x, FUN) FUN(x),
df,
list(as.integer, as.numeric, as.character, as.factor),
SIMPLIFY = FALSE)
str(df)
# 'data.frame': 5 obs. of 4 variables:
# $ V1: int 1 2 3 4 5
# $ V2: num 6 7 8 9 10
# $ V3: chr "11" "12" "13" "14" ...
# $ V4: Factor w/ 5 levels "16","17","18",..: 1 2 3 4 5
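When only some columns need converting, a named list of converters can drive the same idea. The sketch below uses Map(), which behaves like mapply() with SIMPLIFY = FALSE; the converters list and the df2 name are hypothetical and only illustrate the pattern.

```r
df2 <- as.data.frame(matrix(1:20, 5, 4))

# Hypothetical mapping from column name to conversion function
converters <- list(V3 = as.character, V4 = as.factor)

# Apply each function to the column of the same name; other columns are untouched
df2[names(converters)] <- Map(function(col, f) f(col),
                              df2[names(converters)],
                              converters)

str(df2)
```

Keying the functions by column name avoids having to list a converter for every column in positional order.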

