Change the levels of multiple factors that start_with the same pattern in R
Use forcats::fct_inseq():
library(dplyr)
library(forcats)
df <- df %>%
  mutate_all(as.factor) %>%
  mutate(col2 = fct_inseq(col2))  # refer to the piped column, not df$col2
Output:
levels(df$col2)
[1] "10.01" "10.02" "10.03" "12.1" "12.2" "12.3" "100.1" "100.2" "100.3"
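Since the question is about several columns that share a name prefix, the same idea can probably be applied to all of them at once with across() and starts_with(). This is only a minimal sketch: the data frame and its column names (col1, col2, other) are made up for illustration and are not the OP's data.

```r
library(dplyr)
library(forcats)

# Hypothetical data: two "col*" columns holding numeric-looking strings,
# plus one unrelated column that should be left alone.
df <- data.frame(
  col1  = c("100.1", "10.02", "12.1"),
  col2  = c("12.3", "100.2", "10.03"),
  other = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Convert every column whose name starts with "col" to a factor and
# order its levels by their numeric value.
out <- df %>%
  mutate(across(starts_with("col"), ~ fct_inseq(factor(.x))))

levels(out$col1)
# [1] "10.02" "12.1"  "100.1"
```

Columns that do not match the prefix (here, other) keep their original type.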
Set factor levels in a specific order
Use the levels = argument of factor().
library(dplyr)
tbl <- tibble(states = c("FL", "NY", "CA", "IN")) %>%
  mutate(states_fct = factor(states, levels = c("CA", "IN", "FL", "NY")))
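To see what the levels = argument actually changes, here is a small base R sketch using the same four state codes. The "TX" value is an extra, hypothetical entry added only to show what happens to a value that is not listed in levels.

```r
states <- c("FL", "NY", "CA", "IN")
lv <- c("CA", "IN", "FL", "NY")

f <- factor(states, levels = lv)

# The integer codes follow the position in `lv`, not alphabetical order.
as.integer(f)
# [1] 3 4 1 2

# Sorting a factor sorts by level order.
as.character(sort(f))
# [1] "CA" "IN" "FL" "NY"

# A value that is not in `levels` silently becomes NA.
g <- factor(c("FL", "TX"), levels = lv)
is.na(g)
# [1] FALSE  TRUE
```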
Confused on factor levels and mutating with dplyr
Other folks have already pointed out some issues:
1) ifelse() works on the underlying atomic values and drops the factor attributes, which results in "de-factoring":
x <- factor( 1:3 )
# [1] 1 2 3 # Factor
# Levels: 1 2 3
ifelse( is.na(x), x, x ) # Effectively "do nothing"
# [1] 1 2 3 # No longer a factor
2) You defined a factor over numeric values, which coerces them to character. This may be undesirable and lead to unexpected behavior if you later assume that they are still numeric:
levels(factor(1:3)) # Factor defined over numeric values
# [1] "1" "2" "3" # but has character levels
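Point 2 tends to bite when converting such a factor back to numbers. A short base R sketch (the values 10, 20, 30 are arbitrary examples):

```r
f <- factor(c(10, 20, 20, 30))

# as.numeric() on a factor returns the integer codes, not the values.
codes <- as.numeric(f)
codes
# [1] 1 2 2 3

# Going through the character labels recovers the original numbers.
vals <- as.numeric(as.character(f))
vals
# [1] 10 20 20 30

# Equivalent: convert the levels once, then index by the codes.
vals2 <- as.numeric(levels(f))[f]
```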
With that said, if your goal is to replace NAs in a factor with another value, then forcats::fct_explicit_na() is the function you're looking for (as of forcats 1.0.0 it is deprecated in favor of fct_na_value_to_level()):
mhm <- mtcars2 %>% mutate_if( is.factor, fct_explicit_na, "NO VALUE" )
# mpg cyl disp hp drat wt qsec vs am gear carb
# 1 NO VALUE 6 160.0 110 3.90 2.620 16.46 0 1 4 4
# 2 NO VALUE 6 160.0 110 3.90 2.875 17.02 0 1 4 4
# 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
# ...
mhm$mpg
# [1] NO VALUE NO VALUE 22.8 21.4 18.7 ...
# 26 Levels: 10.4 13.3 14.3 14.7 15 15.2 ... NO VALUE
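If forcats is not available, roughly the same effect can be had in base R. This is a sketch on a toy factor, not a drop-in replacement for the forcats function: addNA() turns NA into a level, and that level is then relabelled.

```r
f <- factor(c("a", NA, "b"))

# Make NA an explicit level, then give that level a readable label.
f2 <- addNA(f)
levels(f2)[is.na(levels(f2))] <- "NO VALUE"

as.character(f2)
# [1] "a"        "NO VALUE" "b"
levels(f2)
# [1] "a"        "b"        "NO VALUE"
```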
Conditionally replace levels of factor variable using dplyr
As @camille mentioned, once you have a factor, its set of levels is fixed, and assigning a value that is not one of those levels produces NA.
For example:
x <- factor(letters[1:3])
x[3] = "d"
Warning message:
In `[<-.factor`(`*tmp*`, 3, value = "d") :
invalid factor level, NA generated
x
[1] a b <NA>
Levels: a b c
One straightforward way out of this is to convert the column to character first and then replace:
newdata <- dat %>% mutate(newanimal=replace(as.character(animal), animal=='cat' & size=='big', "fatcat"))
newdata
animal size newanimal
1 cat big fatcat
2 cat big fatcat
3 dog big dog
4 cat small cat
Your new column is a character vector now, but you can always convert it back to a factor if you need that.
str(newdata)
'data.frame': 4 obs. of 3 variables:
$ animal : Factor w/ 2 levels "cat","dog": 1 1 2 1
$ size : Factor w/ 2 levels "big","small": 1 1 1 2
$ newanimal: chr "fatcat" "fatcat" "dog" "cat"
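Another option that stays a factor throughout is to add the new level before assigning to it. A base R sketch; dat here is reconstructed from the printed output above, so treat its exact contents as an assumption:

```r
# Assumed reconstruction of `dat` from the output shown above.
dat <- data.frame(
  animal = factor(c("cat", "cat", "dog", "cat")),
  size   = factor(c("big", "big", "big", "small"))
)

newanimal <- dat$animal

# Register "fatcat" as a valid level first, so the assignment
# no longer generates NA.
levels(newanimal) <- c(levels(newanimal), "fatcat")
newanimal[dat$animal == "cat" & dat$size == "big"] <- "fatcat"

newanimal
# [1] fatcat fatcat dog    cat
# Levels: cat dog fatcat
```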
dplyr: mutate a factor column with 3 levels to 3 logical columns with TRUE and FALSE
Use any of these:
iris %>% cbind(sapply(levels(.$Species), `==`, .$Species))
iris %>% cbind(model.matrix(~ Species + 0, .) == 1)
iris %>% cbind(outer(.$Species, setNames(levels(.$Species), levels(.$Species)), "=="))
expand_factor <- function(f) {
m <- matrix(0, length(f), nlevels(f), dimnames = list(NULL, levels(f)))
replace(m, cbind(seq_along(f), f), 1)
}
iris %>% cbind(expand_factor(.$Species) == 1)
library(nnet)
iris %>% cbind(class.ind(.$Species) == 1)
R: Manually Specifying Factor Levels
Is this helpful?
library(dplyr)
problem_data %>%
group_by(types) %>%
count(dates)
#> # A tibble: 25 × 3
#> # Groups: types [5]
#> types dates n
#> <fct> <fct> <int>
#> 1 A 2010-01 188
#> 2 A 2010-02 77
#> 3 A 2010-03 35
#> 4 A 2010-04 32
#> 5 A 2010-05 31
#> 6 B 2010-01 137
#> 7 B 2010-02 64
#> 8 B 2010-03 27
#> 9 B 2010-04 28
#> 10 B 2010-05 20
#> # … with 15 more rows
Created on 2022-01-23 by the reprex package (v2.0.1)
data:
set.seed(111)
v1 <- c("2010-01", "2010-02", "2010-03", "2010-04", "2010-05")
v2 <- c("A", "B", "C", "D", "E")
dates <- as.factor(sample(v1, 1000, replace = TRUE, prob = c(0.5, 0.2, 0.1, 0.1, 0.1)))
types <- as.factor(sample(v2, 1000, replace = TRUE, prob = c(0.3, 0.2, 0.1, 0.1, 0.1)))
var <- rnorm(1000, 10, 10)
problem_data <- data.frame(var, dates, types)
Creating a new factor variable based off the levels of an old one in dplyr
Agree with @nyk, don't convert payfreq to factor initially. Perform the transformation that you want to do and finally convert both final_payfreq and payfreq to factor.
library(dplyr)
wage_data %>%
mutate(imp_payfreq = case_when(
between(days, 6, 8) ~ "Weekly",
between(days, 28, 32) ~ "Monthly",
between(days, 362, 368) ~ "Annual",
TRUE ~ NA_character_),
final_payfreq=ifelse(is.na(payfreq), imp_payfreq, payfreq),
across(c(final_payfreq, payfreq), factor, c('Weekly', 'Monthly','Annual'))) -> result
result
To correct OP's code we can do:
wage_data %>%
  mutate(final_payfreq = ifelse(is.na(payfreq),
                                factor(imp_payfreq, levels(payfreq)), payfreq))
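Because ifelse() on a factor hands back integer codes rather than labels, it may be easier to do the fill-in on character values and rebuild the factor at the end. A small base R sketch; the three-element vectors are hypothetical stand-ins for the payfreq and imp_payfreq columns, not the OP's data:

```r
lv <- c("Weekly", "Monthly", "Annual")

# Hypothetical stand-ins for the two columns.
payfreq     <- factor(c("Weekly", NA, "Annual"), levels = lv)
imp_payfreq <- c("Weekly", "Monthly", "Annual")

# Compare and fill in as character, so labels (not codes) are kept ...
final_chr <- ifelse(is.na(payfreq), imp_payfreq, as.character(payfreq))

# ... then convert back to a factor with the original level order.
final_payfreq <- factor(final_chr, levels = lv)

as.character(final_payfreq)
# [1] "Weekly"  "Monthly" "Annual"
```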