Changing Factor Levels with Dplyr Mutate

Change the levels of multiple factors that start_with the same pattern in R

Use forcats::fct_inseq:

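library(dplyr)
library(forcats)
df <- df %>% 
  mutate(across(everything(), as.factor)) %>% 
  # Refer to col2 inside the pipe; df$col2 would be the original, unconverted column
  mutate(col2 = fct_inseq(col2))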

Output:

levels(df$col2)
[1] "10.01" "10.02" "10.03" "12.1"  "12.2"  "12.3"  "100.1" "100.2" "100.3"
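As a minimal sketch of what fct_inseq() changes, the snippet below uses a small hypothetical vector (not the data from the question). By default factor() sorts levels as text; fct_inseq() reorders them by their numeric value.

```r
library(forcats)

# Hypothetical values stored as text, as they would be after as.factor()
x <- factor(c("12.5", "100", "3"))

levels(x)             # text order:    "100" "12.5" "3"
levels(fct_inseq(x))  # numeric order: "3" "12.5" "100"
```

Note that fct_inseq() expects a factor whose levels can all be parsed as numbers.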

Set factor levels in a specific order

Use the levels argument of factor().

library(dplyr)

tbl <- tibble(states = c("FL", "NY", "CA", "IN")) %>%
  mutate(states_fct = factor(states, levels = c("CA", "IN", "FL", "NY")))
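To see the effect of the levels argument, a base R sketch with the same state codes may help. It shows that the level order drives both the internal integer codes and the sort order.

```r
states <- c("FL", "NY", "CA", "IN")
states_fct <- factor(states, levels = c("CA", "IN", "FL", "NY"))

levels(states_fct)              # "CA" "IN" "FL" "NY"
as.integer(states_fct)          # 3 4 1 2 (position of each value in the levels)
as.character(sort(states_fct))  # "CA" "IN" "FL" "NY" (sorted by level, not alphabetically)
```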

Confused on factor levels and mutating with dplyr

Other folks have already pointed out some issues:

1) ifelse() does not keep the attributes of its yes/no arguments, so a factor is "de-factored" down to its integer codes:

x <- factor( 1:3 )
# [1] 1 2 3               # Factor
# Levels: 1 2 3

ifelse( is.na(x), x, x )  # Effectively "do nothing"
# [1] 1 2 3               # No longer a factor

2) You defined a factor over numeric values, which stores them as character levels. This may be undesirable and can lead to unexpected behavior if you later assume that they are still numeric:

levels(factor(1:3))       # Factor defined over numeric values
# [1] "1" "2" "3"         #  but has character levels
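One place where this bites is converting such a factor back to numbers. A short sketch with hypothetical values:

```r
f <- factor(c(10, 20, 30))

as.numeric(f)                # 1 2 3: the internal integer codes, not the values
as.numeric(as.character(f))  # 10 20 30: go through the character levels first
```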

With that said, if your goal is to replace NAs in a factor with another value, then forcats::fct_explicit_na() is the function you're looking for:

mhm <- mtcars2 %>% mutate_if( is.factor, fct_explicit_na, "NO VALUE" )
#         mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# 1  NO VALUE   6 160.0 110 3.90 2.620 16.46  0  1    4    4
# 2  NO VALUE   6 160.0 110 3.90 2.875 17.02  0  1    4    4
# 3      22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
# ...

mhm$mpg
# [1] NO VALUE NO VALUE 22.8     21.4     18.7    ...
# 26 Levels: 10.4 13.3 14.3 14.7 15 15.2 ... NO VALUE
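In forcats 1.0.0 and later, fct_explicit_na() is deprecated in favour of fct_na_value_to_level(). A minimal sketch on a hypothetical factor, assuming a recent forcats is installed:

```r
library(forcats)

# Hypothetical factor with one missing value
f <- factor(c("a", NA, "b"))

f2 <- fct_na_value_to_level(f, level = "NO VALUE")
as.character(f2)  # "a" "NO VALUE" "b"
anyNA(f2)         # FALSE
```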

Conditionally replace levels of factor variable using dplyr

As @camille mentioned, once you have a factor, its set of levels is fixed; if you assign a value that is not one of the existing levels, it becomes NA.

For example:

x <- factor(letters[1:3])
x[3] = "d"
Warning message:
In `[<-.factor`(`*tmp*`, 3, value = "d") :
  invalid factor level, NA generated
x
[1] a    b    <NA>
Levels: a b c

The simplest way around this is to convert the column to character first and then replace:

newdata <- dat %>% mutate(newanimal=replace(as.character(animal), animal=='cat' & size=='big', "fatcat"))
newdata
  animal  size newanimal
1    cat   big    fatcat
2    cat   big    fatcat
3    dog   big       dog
4    cat small       cat

Your new column is now a character vector, but you can always convert it back to a factor if you need to.

str(newdata)
'data.frame':   4 obs. of  3 variables:
 $ animal   : Factor w/ 2 levels "cat","dog": 1 1 2 1
 $ size     : Factor w/ 2 levels "big","small": 1 1 1 2
 $ newanimal: chr  "fatcat" "fatcat" "dog" "cat"
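If the result should stay a factor throughout, another option is to add the new level before assigning to it. The sketch below uses base R; dat is a hypothetical reconstruction of the data shown in the output above.

```r
# Hypothetical reconstruction of the example data
dat <- data.frame(
  animal = factor(c("cat", "cat", "dog", "cat")),
  size   = factor(c("big", "big", "big", "small"))
)

newanimal <- dat$animal
levels(newanimal) <- c(levels(newanimal), "fatcat")  # register the new level first
newanimal[dat$animal == "cat" & dat$size == "big"] <- "fatcat"
dat$newanimal <- newanimal

levels(dat$newanimal)        # "cat" "dog" "fatcat"
as.character(dat$newanimal)  # "fatcat" "fatcat" "dog" "cat"
```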

dplyr: mutate a factor column with 3 levels to 3 logical columns with TRUE and FALSE

Use any of these:

iris %>% cbind(sapply(levels(.$Species), `==`, .$Species))

iris %>% cbind(model.matrix(~ Species + 0, .) == 1)

iris %>% cbind(outer(.$Species, setNames(levels(.$Species), levels(.$Species)), "=="))

expand_factor <- function(f) {
  m <- matrix(0, length(f), nlevels(f), dimnames = list(NULL, levels(f)))
  replace(m, cbind(seq_along(f), f), 1)
}
iris %>% cbind(expand_factor(.$Species) == 1)

library(nnet)
iris %>% cbind(class.ind(.$Species) == 1)
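To make clear what these expressions bind onto the data, here is the first approach applied to a small hypothetical factor. The result is a logical matrix with one column per level.

```r
f <- factor(c("a", "b", "a"))

# One logical column per level; each column is TRUE where f equals that level
m <- sapply(levels(f), `==`, f)

colnames(m)  # "a" "b"
m[, "a"]     # TRUE FALSE TRUE
m[, "b"]     # FALSE TRUE FALSE
```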

R: Manually Specifying Factor Levels

Is this helpful?

library(dplyr)

problem_data %>%
  group_by(types) %>%
  count(dates)
#> # A tibble: 25 × 3
#> # Groups:   types [5]
#>    types dates       n
#>    <fct> <fct>   <int>
#>  1 A     2010-01   188
#>  2 A     2010-02    77
#>  3 A     2010-03    35
#>  4 A     2010-04    32
#>  5 A     2010-05    31
#>  6 B     2010-01   137
#>  7 B     2010-02    64
#>  8 B     2010-03    27
#>  9 B     2010-04    28
#> 10 B     2010-05    20
#> # … with 15 more rows

Created on 2022-01-23 by the reprex package (v2.0.1)

data:

set.seed(111)
v1 <- c("2010-01", "2010-02", "2010-03", "2010-04", "2010-05")
v2 <- c("A", "B", "C", "D", "E")
dates <- as.factor(sample(v1, 1000, replace = TRUE, prob = c(0.5, 0.2, 0.1, 0.1, 0.1)))
types <- as.factor(sample(v2, 1000, replace = TRUE, prob = c(0.3, 0.2, 0.1, 0.1, 0.1)))
var <- rnorm(1000, 10, 10)
problem_data <- data.frame(var, dates, types)
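If the aim is to fix the set of levels by hand, for example so that a month with no observations still shows up in the counts, factor() accepts the full list through levels. A base R sketch with hypothetical values (not the data above):

```r
# Hypothetical observations: no rows for 2010-02
v <- c("2010-03", "2010-01", "2010-03")
all_months <- c("2010-01", "2010-02", "2010-03")

f <- factor(v, levels = all_months)

levels(f)            # "2010-01" "2010-02" "2010-03"
as.vector(table(f))  # 1 0 2: the empty level is kept with a count of 0
```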

Creating a new factor variable based off the levels of an old one in dplyr

I agree with @nyk: don't convert payfreq to a factor initially. Perform the transformation you want first, then convert both final_payfreq and payfreq to factors at the end.

library(dplyr)

wage_data %>%
  mutate(
    imp_payfreq = case_when(
      between(days, 6, 8) ~ "Weekly",
      between(days, 28, 32) ~ "Monthly",
      between(days, 362, 368) ~ "Annual",
      TRUE ~ NA_character_),
    final_payfreq = ifelse(is.na(payfreq), imp_payfreq, payfreq),
    # Convert to factor only at the end, with an explicit level order
    across(c(final_payfreq, payfreq),
           ~ factor(.x, levels = c("Weekly", "Monthly", "Annual")))) -> result

result
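For a runnable illustration of the same idea, the sketch below builds a small hypothetical wage_data (the values are made up for illustration) and keeps payfreq as character until the last step. It uses dplyr::coalesce() in place of ifelse(is.na(...)), which is a swapped-in helper and not part of the answer above.

```r
library(dplyr)

# Hypothetical data: payfreq is character, with one missing value
wage_data <- tibble(
  days    = c(7, 30, 365, 7),
  payfreq = c("Weekly", NA, "Annual", "Weekly")
)
freq_levels <- c("Weekly", "Monthly", "Annual")

result <- wage_data %>%
  mutate(
    imp_payfreq = case_when(
      between(days, 6, 8)     ~ "Weekly",
      between(days, 28, 32)   ~ "Monthly",
      between(days, 362, 368) ~ "Annual",
      TRUE                    ~ NA_character_
    ),
    # Fill missing payfreq from the imputed value, then convert to factor
    final_payfreq = factor(coalesce(payfreq, imp_payfreq), levels = freq_levels)
  )

as.character(result$final_payfreq)  # "Weekly" "Monthly" "Annual" "Weekly"
```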

To correct the OP's code, we can do:

wage_data %>%
  mutate(final_payfreq = ifelse(is.na(payfreq), 
                         factor(imp_payfreq, levels(payfreq)), payfreq)