Changing Factor Levels with dplyr mutate()

Change the levels of multiple factors that start with the same pattern in R

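Use forcats::fct_inseq(), which orders a factor's levels by their numeric value. Refer to the column by its bare name inside mutate(); df$col2 would pick up the original, unconverted column:

library(dplyr)
library(forcats)

df <- df %>%
  mutate(across(everything(), as.factor)) %>%   # convert every column to a factor
  mutate(col2 = fct_inseq(col2))                # order the levels numerically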

Output:

levels(df$col2)
[1] "10.01" "10.02" "10.03" "12.1" "12.2" "12.3" "100.1" "100.2" "100.3"
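To cover several columns that share a prefix, as the question title asks, the same idea can be combined with across() and starts_with(). This is a minimal sketch on made-up data: the tibble and the column names col1, col2 and other are hypothetical, not the asker's data.

```r
library(dplyr)
library(forcats)

# Hypothetical data: two columns sharing the prefix "col", one that does not
df <- tibble(
  col1  = c("100.1", "10.02", "12.1"),
  col2  = c("12.3", "100.2", "10.01"),
  other = c("b", "a", "c")
)

# Convert each matching column to a factor, then order its levels numerically
df <- df %>%
  mutate(across(starts_with("col"), ~ fct_inseq(factor(.x))))

levels(df$col1)
# [1] "10.02" "12.1"  "100.1"
levels(df$col2)
# [1] "10.01" "12.3"  "100.2"
```

Columns that do not match the prefix, such as other here, are left untouched.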

Set factor levels in a specific order

Use the levels = argument.

library(dplyr)

tbl <- tibble(states = c("FL", "NY", "CA", "IN")) %>%
  mutate(states_fct = factor(states, levels = c("CA", "IN", "FL", "NY")))
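A short base R sketch of what the levels argument changes, using the same state codes. The value "TX" is an added, hypothetical input used only to show what happens to a value missing from levels.

```r
states <- c("FL", "NY", "CA", "IN")
lvl_order <- c("CA", "IN", "FL", "NY")

states_fct <- factor(states, levels = lvl_order)

levels(states_fct)       # levels follow the order given, not alphabetical order
# [1] "CA" "IN" "FL" "NY"
as.integer(states_fct)   # each value is stored as its position in the levels
# [1] 3 4 1 2
as.character(sort(states_fct))   # sorting follows the level order
# [1] "CA" "IN" "FL" "NY"

# A value that is not listed in levels becomes NA
is.na(factor("TX", levels = lvl_order))
# [1] TRUE
```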

Confused on factor levels and mutating with dplyr

Other folks have already pointed out some issues:

1) ifelse() takes its attributes from the test, not from the yes/no values, so a factor comes back as its bare integer codes ("de-factoring"):

x <- factor( 1:3 )
# [1] 1 2 3 # Factor
# Levels: 1 2 3

ifelse( is.na(x), x, x ) # Effectively "do nothing"
# [1] 1 2 3 # No longer a factor

2) You defined a factor over numeric values, which coerces them to character. This may be undesirable and lead to unexpected behavior if you later assume that they are still numeric:

levels(factor(1:3))       # Factor defined over numeric values
# [1] "1" "2" "3" # but has character levels
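Both points can be seen side by side in a small sketch. The vector x below is made up for illustration, and dplyr::if_else() is shown as one possible factor-preserving alternative, not as what the original poster used.

```r
library(dplyr)

x <- factor(c("10", NA, "30"))

# Point 1: ifelse() drops the factor class and returns the integer codes
codes <- ifelse(is.na(x), x, x)
class(codes)
# [1] "integer"

# Assigning by index keeps the factor, as long as the value is an existing level
y <- x
y[is.na(y)] <- "10"
class(y)
# [1] "factor"

# dplyr::if_else() also keeps the factor when both branches are factors
z <- if_else(is.na(x), factor("10", levels = levels(x)), x)
class(z)
# [1] "factor"

# Point 2: as.numeric() on a factor gives the codes, not the original numbers
as.numeric(x)
# [1]  1 NA  2
as.numeric(as.character(x))   # go through character to recover the values
# [1] 10 NA 30
```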

With that said, if your goal is to replace NAs in a factor with another value, then forcats::fct_explicit_na() is the function you're looking for (forcats 1.0.0 deprecates it in favor of fct_na_value_to_level()):

mhm <- mtcars2 %>% mutate_if( is.factor, fct_explicit_na, "NO VALUE" )
#        mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# 1 NO VALUE   6 160.0 110 3.90 2.620 16.46  0  1    4    4
# 2 NO VALUE   6 160.0 110 3.90 2.875 17.02  0  1    4    4
# 3     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
# ...

mhm$mpg
# [1] NO VALUE NO VALUE 22.8 21.4 18.7 ...
# 26 Levels: 10.4 13.3 14.3 14.7 15 15.2 ... NO VALUE
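The same replacement can be sketched on a small self-contained data frame. This version assumes forcats 1.0.0 or later, where fct_na_value_to_level() is available, and uses across() in place of mutate_if(). The data frame d and its columns are hypothetical.

```r
library(dplyr)
library(forcats)

# Hypothetical data: one factor column with a missing value, one numeric column
d <- tibble(
  grade = factor(c("a", NA, "b")),
  score = c(1.5, 2.5, NA)
)

# Turn NA into an explicit level, but only in factor columns
filled <- d %>%
  mutate(across(where(is.factor),
                ~ fct_na_value_to_level(.x, level = "NO VALUE")))

levels(filled$grade)
# [1] "a"        "b"        "NO VALUE"
filled$score   # non-factor columns keep their NAs
# [1] 1.5 2.5  NA
```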

Conditionally replace levels of factor variable using dplyr

As @camille mentioned, a factor's set of levels is fixed once it is created: if you assign a value that is not one of those levels, you get NA.

For example:

x <- factor(letters[1:3])
x[3] <- "d"
# Warning message:
# In `[<-.factor`(`*tmp*`, 3, value = "d") :
#   invalid factor level, NA generated
x
# [1] a    b    <NA>
# Levels: a b c

A simple way around this is to convert the factor to character first and then replace the values:

newdata <- dat %>%
  mutate(newanimal = replace(as.character(animal),
                             animal == 'cat' & size == 'big',
                             "fatcat"))
newdata
#   animal  size newanimal
# 1    cat   big    fatcat
# 2    cat   big    fatcat
# 3    dog   big       dog
# 4    cat small       cat

Your new column is now a character vector, but you can always convert it back to a factor if you need to.

str(newdata)
# 'data.frame': 4 obs. of 3 variables:
#  $ animal   : Factor w/ 2 levels "cat","dog": 1 1 2 1
#  $ size     : Factor w/ 2 levels "big","small": 1 1 1 2
#  $ newanimal: chr "fatcat" "fatcat" "dog" "cat"
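If the new column should stay a factor throughout, another option is to add the new level before replacing. The sketch below rebuilds dat from the str() output above and uses forcats::fct_expand(); treat it as one possible variant rather than the answer's own method.

```r
library(dplyr)
library(forcats)

# dat reconstructed from the str() output shown above
dat <- data.frame(
  animal = factor(c("cat", "cat", "dog", "cat")),
  size   = factor(c("big", "big", "big", "small"))
)

# Add "fatcat" as a level first, so the replacement is a valid level
newdata <- dat %>%
  mutate(newanimal = replace(fct_expand(animal, "fatcat"),
                             animal == "cat" & size == "big",
                             "fatcat"))

levels(newdata$newanimal)
# [1] "cat"    "dog"    "fatcat"
as.character(newdata$newanimal)
# [1] "fatcat" "fatcat" "dog"    "cat"
```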

dplyr: mutate a factor column with 3 levels to 3 logical columns with TRUE and FALSE

Use any of these:

iris %>% cbind(sapply(levels(.$Species), `==`, .$Species))

iris %>% cbind(model.matrix(~ Species + 0, .) == 1)

iris %>% cbind(outer(.$Species, setNames(levels(.$Species), levels(.$Species)), "=="))

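expand_factor <- function(f) {
  # Start from an all-zero matrix with one column per level
  m <- matrix(0, length(f), nlevels(f), dimnames = list(NULL, levels(f)))
  # Set the cell (row i, level of f[i]) to 1
  replace(m, cbind(seq_along(f), f), 1)
}
iris %>% cbind(expand_factor(.$Species) == 1)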

library(nnet)
iris %>% cbind(class.ind(.$Species) == 1)
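For a version that stays within mutate(), one more variant is to build a named list of logical vectors and splice it in with !!!. This is a sketch, not one of the original answers; the helper names lvls and flags are made up.

```r
library(dplyr)

lvls <- levels(iris$Species)

# One logical vector per level, named after that level
flags <- lapply(setNames(lvls, lvls), function(lvl) iris$Species == lvl)

# Splice the named list in as three new columns
result <- iris %>% mutate(!!!flags)

names(result)[6:8]
# [1] "setosa"     "versicolor" "virginica"

# Sanity check: every row is TRUE in exactly one of the new columns
all(rowSums(result[lvls]) == 1)
# [1] TRUE
```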

R: Manually Specifying Factor Levels

Is this helpful?

library(dplyr)

problem_data %>%
group_by(types) %>%
count(dates)
#> # A tibble: 25 × 3
#> # Groups: types [5]
#> types dates n
#> <fct> <fct> <int>
#> 1 A 2010-01 188
#> 2 A 2010-02 77
#> 3 A 2010-03 35
#> 4 A 2010-04 32
#> 5 A 2010-05 31
#> 6 B 2010-01 137
#> 7 B 2010-02 64
#> 8 B 2010-03 27
#> 9 B 2010-04 28
#> 10 B 2010-05 20
#> # … with 15 more rows

Created on 2022-01-23 by the reprex package (v2.0.1)

data:

set.seed(111)
v1 <- c("2010-01", "2010-02", "2010-03", "2010-04", "2010-05")
v2 <- c("A", "B", "C", "D", "E")
dates <- as.factor(sample(v1, 1000, replace = TRUE, prob = c(0.5, 0.2, 0.1, 0.1, 0.1)))
types <- as.factor(sample(v2, 1000, replace = TRUE, prob = c(0.3, 0.2, 0.1, 0.1, 0.1)))
var <- rnorm(1000, 10, 10)
problem_data <- data.frame(var, dates, types)
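If the aim is to fix the set of levels by hand, including levels that never occur in the data, a sketch along these lines may help. The small data frame d is hypothetical, and .drop = FALSE is the count() argument that keeps empty levels in the result.

```r
library(dplyr)

# Hypothetical data: "2010-03" is a declared level but never appears
d <- data.frame(
  dates = factor(c("2010-02", "2010-01", "2010-02"),
                 levels = c("2010-01", "2010-02", "2010-03"))
)

# Rows come out in level order, and the unused level is kept with n = 0
counts <- d %>% count(dates, .drop = FALSE)

as.character(counts$dates)
# [1] "2010-01" "2010-02" "2010-03"
counts$n
# [1] 1 2 0
```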

Creating a new factor variable based off the levels of an old one in dplyr

I agree with @nyk: don't convert payfreq to a factor at the start. Do the transformation on character values first, then convert both final_payfreq and payfreq to factors at the end.

library(dplyr)

result <- wage_data %>%
  mutate(
    # Impute a pay frequency from the number of days
    imp_payfreq = case_when(
      between(days, 6, 8) ~ "Weekly",
      between(days, 28, 32) ~ "Monthly",
      between(days, 362, 368) ~ "Annual",
      TRUE ~ NA_character_
    ),
    # Use the imputed value only where payfreq is missing
    final_payfreq = ifelse(is.na(payfreq), imp_payfreq, payfreq),
    across(c(final_payfreq, payfreq),
           ~ factor(.x, levels = c("Weekly", "Monthly", "Annual")))
  )

result

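To correct the OP's code, give imp_payfreq the same levels as payfreq before combining them. Note that ifelse() still returns the underlying integer codes rather than a factor:

wage_data %>%
  mutate(final_payfreq = ifelse(is.na(payfreq),
                                factor(imp_payfreq, levels(payfreq)),
                                payfreq))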
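The recommended character-first approach can be tried end to end on a few made-up rows. The contents of wage_data below are hypothetical, and coalesce() is swapped in for ifelse(is.na(...)) as an equivalent way to fill missing values.

```r
library(dplyr)

# Hypothetical input: payfreq is character, with some values missing
wage_data <- tibble(
  days    = c(7, 30, 365, 100, 14),
  payfreq = c("Weekly", NA, NA, NA, "Monthly")
)
freq_levels <- c("Weekly", "Monthly", "Annual")

result <- wage_data %>%
  mutate(
    imp_payfreq = case_when(
      between(days, 6, 8) ~ "Weekly",
      between(days, 28, 32) ~ "Monthly",
      between(days, 362, 368) ~ "Annual",
      TRUE ~ NA_character_
    ),
    # Take payfreq where present, otherwise the imputed value
    final_payfreq = coalesce(payfreq, imp_payfreq)
  ) %>%
  # Convert to factors only at the very end
  mutate(across(c(final_payfreq, payfreq), ~ factor(.x, levels = freq_levels)))

as.character(result$final_payfreq)
# [1] "Weekly"  "Monthly" "Annual"  NA        "Monthly"
levels(result$final_payfreq)
# [1] "Weekly"  "Monthly" "Annual"
```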

