Convert particular numeric/int into factors in R using a colname vector
You can also use across() inside mutate() with the index position of each variable. If you have 100 variables, index numbers are easier to use than names:
library(dplyr)
#Data
df <- data.frame(id = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
customer = c("Alice", "Bob", "Carlos", "Chuck", "Craig", "Heidi", "Judy", "Rupert", "Wendy"),
Balance = c(100, 75, 56, 172, 450, 777, 1001, 25, 968),
Hour = c(1, 23, 4, 5, 6, 12, 14, 17, 17),
InDebt = c(1, 1, 1, 1, 0, 0, 0, 1, 1),
DueDay = c("Mon", "Tue", "Wed", "Fri", "Sun", "Sat", "Thu", "Mon", "Wed"),
AppBooked = c(1, 1, 1, 0, 0, 1, 0, 1, 1), stringsAsFactors = FALSE
)
#Code
df <- df %>% mutate(across(c(4, 5, 7), factor))
str(df)
Output:
'data.frame': 9 obs. of 7 variables:
$ id : num 1 2 3 4 5 6 7 8 9
$ customer : chr "Alice" "Bob" "Carlos" "Chuck" ...
$ Balance : num 100 75 56 172 450 ...
$ Hour : Factor w/ 8 levels "1","4","5","6",..: 1 8 2 3 4 5 6 7 7
$ InDebt : Factor w/ 2 levels "0","1": 2 2 2 2 1 1 1 2 2
$ DueDay : chr "Mon" "Tue" "Wed" "Fri" ...
$ AppBooked: Factor w/ 2 levels "0","1": 2 2 2 1 1 2 1 2 2
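Or using your column-name vector colsasfactors:
#Code 2
# colsasfactors comes from the question; c("Hour", "InDebt", "AppBooked") is assumed here, matching the output below
colsasfactors <- c("Hour", "InDebt", "AppBooked")
# all_of() makes it explicit that the names come from an external character vector
df <- df %>% mutate(across(all_of(colsasfactors), factor))
str(df)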
Output:
'data.frame': 9 obs. of 7 variables:
$ id : num 1 2 3 4 5 6 7 8 9
$ customer : chr "Alice" "Bob" "Carlos" "Chuck" ...
$ Balance : num 100 75 56 172 450 ...
$ Hour : Factor w/ 8 levels "1","4","5","6",..: 1 8 2 3 4 5 6 7 7
$ InDebt : Factor w/ 2 levels "0","1": 2 2 2 2 1 1 1 2 2
$ DueDay : chr "Mon" "Tue" "Wed" "Fri" ...
$ AppBooked: Factor w/ 2 levels "0","1": 2 2 2 1 1 2 1 2 2
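For comparison, here is a minimal base R sketch of the same conversion without dplyr. It assumes a small hypothetical stand-in data frame (df_small, a few rows and columns taken from the data above); lapply() over the selected columns works with either index positions or a character vector of names:

```r
# Hypothetical stand-in for the df above (first three rows, four columns)
df_small <- data.frame(id = c(1, 2, 3),
                       Hour = c(1, 23, 4),
                       InDebt = c(1, 1, 0),
                       AppBooked = c(1, 0, 1))

# Select the columns by position ...
by_index <- df_small
by_index[c(2, 3, 4)] <- lapply(by_index[c(2, 3, 4)], factor)

# ... or by name, using a character vector such as colsasfactors
cols <- c("Hour", "InDebt", "AppBooked")
by_name <- df_small
by_name[cols] <- lapply(by_name[cols], factor)

# Only the selected columns become factors; id stays numeric
sapply(by_name, is.factor)
```

Assigning into df[cols] (rather than df <- lapply(...)) keeps the result a data frame and leaves the other columns untouched.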
Converting multiple columns to factors and releveling with mutate(across)
You can pass the function to across() as an anonymous function, like this:
library(dplyr) # attaches the %>% pipe
dat <- data.frame(Comp1Letter = c("A", "B", "D", "F", "U", "A*", "B", "C"),
Comp2Letter = c("B", "C", "E", "U", "A", "C", "A*", "E"),
Comp3Letter = c("D", "A", "C", "D", "F", "D", "C", "A"))
GradeLevels <- c("A*", "A", "B", "C", "D", "E", "F", "G", "U")
dat %>%
tibble::as_tibble() %>%
dplyr::mutate(dplyr::across(c(Comp1Letter, Comp2Letter, Comp3Letter), ~readr::parse_factor(., levels = GradeLevels)))
# # A tibble: 8 × 3
# Comp1Letter Comp2Letter Comp3Letter
# <fct> <fct> <fct>
# 1 A B D
# 2 B C A
# 3 D E C
# 4 F U D
# 5 U A F
# 6 A* C D
# 7 B A* C
# 8 C E A
You were close; all that was left was to make the factor function anonymous. That can be done either with ~ and . in the tidyverse, or with function(x) and x in base R.
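A minimal sketch of the base R form, using function(x) and x with lapply() instead of across(). One caveat worth checking: factor() does not complain about values that are missing from levels, it silently turns them into NA, so the sketch verifies that no NA was introduced:

```r
dat <- data.frame(Comp1Letter = c("A", "B", "D", "F", "U", "A*", "B", "C"),
                  Comp2Letter = c("B", "C", "E", "U", "A", "C", "A*", "E"),
                  Comp3Letter = c("D", "A", "C", "D", "F", "D", "C", "A"))
GradeLevels <- c("A*", "A", "B", "C", "D", "E", "F", "G", "U")

cols <- c("Comp1Letter", "Comp2Letter", "Comp3Letter")
# Base R anonymous function: function(x) ... x
dat[cols] <- lapply(dat[cols], function(x) factor(x, levels = GradeLevels))

# factor() maps values not listed in `levels` to NA, so check for that
stopifnot(!anyNA(dat))
str(dat)
```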
How to elegantly recode multiple columns containing multiple values
Try this. Note that we use mutate() and across() twice: first to convert the values to factors whose levels are ordered by first appearance in each variable (via unique()), and then to extract the underlying numeric codes with as.numeric(). Here is the code:
library(tidyverse)
#Code
df %>% mutate(across(gender:smoke,~factor(.,levels = unique(.)))) %>%
mutate(across(gender:smoke,~as.numeric(.)))
Output:
gender education smoke
1 1 1 1
2 2 2 2
3 3 3 3
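The question's df is not shown here, so this sketch uses a hypothetical df_q built from the values listed in the mapping table below (the fourth row is invented to show repeated values). In base R, match(x, unique(x)) yields the same order-of-appearance codes as as.numeric(factor(x, levels = unique(x))), without the intermediate factor:

```r
# Hypothetical data: values taken from the mapping table, fourth row invented
df_q <- data.frame(gender = c("male", "female", "transgender", "female"),
                   education = c("high-school", "grad-school", "home-school", "high-school"),
                   smoke = c("yes", "no", "prefer not tell", "yes"))

# Factor-based recoding, as in the answer, written in base R
codes_factor <- lapply(df_q, function(x) as.numeric(factor(x, levels = unique(x))))

# match() against unique() gives the same codes directly (as integers)
codes_match <- lapply(df_q, function(x) match(x, unique(x)))

recoded <- as.data.frame(codes_match)
recoded
```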
To see how the new values will be assigned, you can use this:
#Code 2
df %>% summarise_all(.funs = unique) %>% pivot_longer(everything()) %>%
arrange(name) %>%
group_by(name) %>% mutate(Newval=1:n())
Output:
# A tibble: 9 x 3
# Groups: name [3]
name value Newval
<chr> <fct> <int>
1 education high-school 1
2 education grad-school 2
3 education home-school 3
4 gender male 1
5 gender female 2
6 gender transgender 3
7 smoke yes 1
8 smoke no 2
9 smoke prefer not tell 3
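A base R sketch of the same lookup table, again assuming the hypothetical df_q used above. Building one small data frame per column and stacking them with rbind() does not rely on every column having the same number of distinct values; rows come out in column order rather than sorted by name:

```r
# Hypothetical data, as above
df_q <- data.frame(gender = c("male", "female", "transgender", "female"),
                   education = c("high-school", "grad-school", "home-school", "high-school"),
                   smoke = c("yes", "no", "prefer not tell", "yes"))

# For each column, pair each distinct value (in order of first appearance) with its new code
mapping <- do.call(rbind, lapply(names(df_q), function(nm) {
  vals <- unique(df_q[[nm]])
  data.frame(name = nm, value = vals, Newval = seq_along(vals))
}))
mapping
```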
Or, for more control, join the new codes back onto the original data:
#Code 3
df %>% mutate(id=1:n()) %>% pivot_longer(-id) %>%
left_join(df %>% summarise_all(.funs = unique) %>% pivot_longer(everything()) %>%
arrange(name) %>%
group_by(name) %>% mutate(Newval=1:n()) %>% ungroup()) %>%
select(-value) %>%
pivot_wider(names_from = name,values_from=Newval) %>%
select(-id)
Output:
# A tibble: 3 x 3
gender education smoke
<int> <int> <int>
1 1 1 1
2 2 2 2
3 3 3 3
If your variables are of class character, you can use this pipeline to convert them from character to factor, reorder the factor levels, and then make them numeric:
#Code 4
df %>%
mutate(across(gender:smoke,~as.factor(.))) %>%
mutate(across(gender:smoke,~factor(.,levels = unique(.)))) %>%
mutate(across(gender:smoke,~as.numeric(.)))
Output:
gender education smoke
1 1 1 1
2 2 2 2
3 3 3 3
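A quick base R check, using a hypothetical character vector, suggests the intermediate as.factor() step is optional: factor(x, levels = unique(x)) accepts character input directly and produces the same codes:

```r
x <- c("yes", "no", "prefer not tell", "yes")  # hypothetical character column

# Via an intermediate (alphabetically ordered) factor, as in Code 4
via_as_factor <- as.numeric(factor(as.factor(x), levels = unique(as.factor(x))))

# Directly from the character vector
direct <- as.numeric(factor(x, levels = unique(x)))

identical(via_as_factor, direct)
```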
change a numeric column to a factor and assign labels/levels to the data
You may try using the levels() function. For example, with dummy data containing a factor with the three levels 1, 2 and 3:
dummy <- data.frame(
fac = rep(c(1,2,3),4)
)
dummy$fac <- as.factor(dummy$fac)
In base R:
# Option 1: replace the levels in their current order
levels(dummy$fac) <- c("Petrol", "Hybrid", "Diesel")
# Option 2: map each new label to an old level explicitly
levels(dummy$fac) <- list("Petrol" = 1, "Hybrid" = 2, "Diesel" = 3)
Alternatively, with the dplyr package:
dummy$fac <- dplyr::recode_factor(dummy$fac, "1" = "Petrol", "2" = "Hybrid", "3" = "Diesel")
Each of these, applied to the freshly created factor, gives
fac
1 Petrol
2 Hybrid
3 Diesel
4 Petrol
5 Hybrid
6 Diesel
7 Petrol
8 Hybrid
9 Diesel
10 Petrol
11 Hybrid
12 Diesel
and str(dummy$fac) shows:
Factor w/ 3 levels "Petrol","Hybrid",..: 1 2 3 1 2 3 1 2 3 1 ...
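Base R's factor() can also do this in one step, straight from the numeric column, via its labels argument. A minimal sketch: levels lists the numeric codes to expect and labels the names to display, in matching order, so the mapping is spelled out explicitly:

```r
dummy <- data.frame(fac = rep(c(1, 2, 3), 4))

# levels = codes present in the data, labels = names to show, in the same order
dummy$fac <- factor(dummy$fac, levels = c(1, 2, 3),
                    labels = c("Petrol", "Hybrid", "Diesel"))
str(dummy$fac)
```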
Convert Multiple Column Classes
We can use mapply() and pass the conversion functions as a list, one per column, to convert the columns.
df <- as.data.frame(matrix(1:20, 5, 4))
df[] <- mapply(function(x, FUN) FUN(x),
df,
list(as.integer, as.numeric, as.character, as.factor),
SIMPLIFY = FALSE)
str(df)
# 'data.frame': 5 obs. of 4 variables:
# $ V1: int 1 2 3 4 5
# $ V2: num 6 7 8 9 10
# $ V3: chr "11" "12" "13" "14" ...
# $ V4: Factor w/ 5 levels "16","17","18",..: 1 2 3 4 5
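A variation, sketched under the assumption that you only want to convert some columns: key the converters by column name (the converters list here is hypothetical) and use Map(), base R's wrapper for mapply(..., SIMPLIFY = FALSE). Columns not named in the list are left as they are:

```r
df <- as.data.frame(matrix(1:20, 5, 4))

# Hypothetical: converters keyed by column name instead of by position
converters <- list(V3 = as.character, V4 = as.factor)

# Apply each converter to its matching column; other columns are untouched
df[names(converters)] <- Map(function(x, FUN) FUN(x), df[names(converters)], converters)
str(df)
```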