Grouping 2 Levels of a Factor in R

Grouping 2 levels of a factor in R

Assign to levels(x) to rename levels; giving two or more old levels the same new name merges them. For example:

f <- factor(LETTERS[c(1:3, 3:1)])
f
[1] A B C C B A
Levels: A B C

Now combine "A" and "B" into a single level:

levels(f) <- c("A", "A", "C")
f
[1] A A C C A A
Levels: A C
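
The positional assignment above relies on knowing the order of the levels. As a small sketch, the same merge can be done by name, which may be safer when the level order is not known in advance. The label "AB" is only an illustrative choice:

```r
f <- factor(LETTERS[c(1:3, 3:1)])

# Pick the levels to merge by name and give them one shared label.
levels(f)[levels(f) %in% c("A", "B")] <- "AB"

levels(f)        # the merged level set
as.character(f)  # the recoded values
```

Because duplicated labels are collapsed on assignment, no separate step is needed to drop the old levels.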

How do you group factor levels in R?

Factor levels can be grouped by assigning a named list to levels(): each name is a new level, and each element lists the old levels it absorbs. Here is an example with toy data:

levels(mydata$value)
# [1] "not likely" "slightly likely" "likely" "very likely"

levels(mydata$value) <- list("unlikely"=c("not likely", "slightly likely"),
"likely"=c("likely", "very likely"))
levels(mydata$value)
# [1] "unlikely" "likely"
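
For a version that runs without mydata, here is a minimal sketch on a standalone factor. The vector likelihood and its values are hypothetical, made up only to illustrate the list form:

```r
# Hypothetical responses using the same four labels as above.
likelihood <- factor(
  c("likely", "not likely", "very likely", "slightly likely", "likely"),
  levels = c("not likely", "slightly likely", "likely", "very likely")
)

# Each list name is a new level; its element lists the old levels it absorbs.
levels(likelihood) <- list(
  unlikely = c("not likely", "slightly likely"),
  likely   = c("likely", "very likely")
)

levels(likelihood)
table(likelihood)
```

The new levels appear in the order in which they are named in the list.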

With the levels regrouped, you can run the analysis, for example an ANOVA on the integer codes of the factor:

(Statistical_Testing.aov <- aov(as.integer(value) ~ question, data = mydata))
# Call:
# aov(formula = as.integer(value) ~ question, data = mydata)
#
# Terms:
# question Residuals
# Sum of Squares 0.18 5.82
# Deg. of Freedom 1 23
#
# Residual standard error: 0.5030343
# Estimated effects may be unbalanced

(Statistical_Testing.anova <- anova(Statistical_Testing.aov))
# Analysis of Variance Table
#
# Response: as.integer(value)
# Df Sum Sq Mean Sq F value Pr(>F)
# question 1 0.18 0.18000 0.7113 0.4077
# Residuals 23 5.82 0.25304

Toy data:

set.seed(42)
# 5 questions x 5 respondents, each with a random response on a 4-point scale
mydata <- transform(expand.grid(question = 1:5, id = 1:5),
                    value = factor(sample(1:4, 25, replace = TRUE),
                                   labels = c("not likely", "slightly likely",
                                              "likely", "very likely")))

How to group by factor levels from two columns and output new column that shows sum of each level in R?

Instead of grouping by 'RawDate', group by 'ID' and 'YEAR', then sum a logical vector for each outcome (each TRUE counts as 1):

library(dplyr)
complete_df %>%
  group_by(ID, YEAR) %>%
  mutate(TotalWon = sum(Renewal == 'WON'),
         TotalLost = sum(Renewal == 'LOST'))

If we need a summarised output (one row per ID and YEAR), use summarise instead of mutate.
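
A minimal sketch of the summarised variant follows. The contents of complete_df here are hypothetical, since the original data is not shown; only the column names ID, YEAR, and Renewal come from the answer above:

```r
library(dplyr)

# Hypothetical stand-in for the asker's data.
complete_df <- data.frame(
  ID      = c(1, 1, 1, 2, 2),
  YEAR    = c(2019, 2019, 2020, 2019, 2019),
  Renewal = c("WON", "LOST", "WON", "LOST", "LOST")
)

# One row per (ID, YEAR); summing a logical vector counts its TRUE values.
totals <- complete_df %>%
  group_by(ID, YEAR) %>%
  summarise(TotalWon  = sum(Renewal == "WON"),
            TotalLost = sum(Renewal == "LOST"),
            .groups   = "drop")

totals
```

Passing .groups = "drop" returns an ungrouped result, which avoids surprises in later steps.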

Factor levels by group

A data.table solution:

dt[, height_cat := cut(Height, breaks = c(0, 165, 180, 300), right = FALSE)]
dt[, height_f :=
factor(
paste(Sex, height_cat, sep = ":"),
levels = dt[, CJ(Sex, height_cat, unique = TRUE)][, paste(Sex, height_cat, sep = ":")]
)]

table(dt$height_f)
# F:[0,165) F:[165,180) F:[180,300) M:[0,165) M:[165,180) M:[180,300)
# 2 2 0 0 2 2
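
As a possible base R alternative, interaction() also builds a factor containing every combination of its inputs' levels, including empty ones. This is only a sketch: the data frame people and its values are hypothetical, and only the column names Sex and Height and the breaks are taken from the answer above:

```r
# Hypothetical data with the same columns as dt.
people <- data.frame(
  Sex    = c("F", "F", "F", "F", "M", "M", "M", "M"),
  Height = c(150, 160, 170, 175, 170, 178, 185, 190)
)

# Left-closed height bins, as in the data.table code.
people$height_cat <- cut(people$Height,
                         breaks = c(0, 165, 180, 300), right = FALSE)

# lex.order = TRUE makes Sex vary slowest, so all F levels precede all M levels.
people$height_f <- interaction(people$Sex, people$height_cat,
                               sep = ":", lex.order = TRUE)

table(people$height_f)
```

Unlike paste(), interaction() keeps unobserved combinations as levels by default, so no cross join is needed to enumerate them.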

Group by two factors with dplyr

You need to "reshape" or "pivot" the data. Since you're already using dplyr, you can use tidyr::pivot_wider. (Alternatively, reshape2::dcast works similarly, though I believe pivot_wider has more features.)

library(dplyr)
test <- df %>%
  group_by(factor1, factor2) %>%
  summarise(z = sum(values))
# name id_cols explicitly; recent tidyr versions no longer accept it by position
tidyr::pivot_wider(test, id_cols = factor1, names_from = "factor2",
                   values_from = "z", values_fill = 0)
# # A tibble: 3 x 4
# # Groups: factor1 [3]
# factor1 `1` `3` `2`
# <chr> <dbl> <dbl> <dbl>
# 1 A 57 78 0
# 2 B 0 0 32
# 3 C 0 5 15
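
If a quick cross-tabulated sum is all that is needed, base R's xtabs() might be enough. The sketch below uses a hypothetical df with the column names from the answer above; the values are made up for illustration:

```r
# Hypothetical data; only the column names come from the answer above.
df <- data.frame(
  factor1 = c("A", "A", "A", "B", "C", "C"),
  factor2 = c(1, 1, 3, 2, 3, 2),
  values  = c(50, 7, 78, 32, 5, 15)
)

# Sum `values` within each factor1 x factor2 cell; empty cells become 0.
wide <- xtabs(values ~ factor1 + factor2, data = df)
wide

# Convert to a plain data frame if a table object is inconvenient.
wide_df <- as.data.frame.matrix(wide)
```

The result is a table object rather than a tibble, so this fits best when the output is only inspected or printed.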

Combining factor level in R

One option is recode() from the car package:

library(car)
recode(x, "c('A', 'B')='A+B';c('D', 'E') = 'D+E'")
#[1] A+B A+B A+B C D+E D+E A+B D+E C
#Levels: A+B C D+E

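It also works inside a dplyr pipeline, provided you call car::recode explicitly, because dplyr exports its own recode() with a different interface:

library(dplyr)
df %>%
  mutate(x = car::recode(x, "c('A', 'B')='A+B';c('D', 'E') = 'D+E'"))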
# x
#1 A+B
#2 A+B
#3 A+B
#4 C
#5 D+E
#6 D+E
#7 A+B
#8 D+E
#9 C

Data:

df <- data.frame(x)
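
Another option, sketched here on the assumption that forcats is available, is fct_collapse(), which takes the same mapping as named arguments instead of a recode string. The factor x below is hypothetical, since the original x is not shown:

```r
library(forcats)

# Hypothetical input factor with levels A to E.
x <- factor(c("A", "B", "C", "D", "E", "A"))

# Each argument name is a new level; its value lists the old levels to merge.
collapsed <- fct_collapse(x,
                          "A+B" = c("A", "B"),
                          "D+E" = c("D", "E"))

as.character(collapsed)
levels(collapsed)
```

Levels not mentioned in the call, such as "C" here, are left unchanged.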

Write a function in R to group factor levels by frequency, then keep the 2 largest categories and pool the rest in other

forcats::fct_lump_n() exists for precisely this:

library(forcats)
library(dplyr)

df %>%
  mutate(across(everything(), ~ fct_lump_n(.x, n = 2)))

var1 var2
1 square orange
2 square orange
3 square orange
4 circle orange
5 square blue
6 square orange
7 circle blue
8 square blue
9 circle orange
10 circle blue
11 circle blue
12 circle blue
13 square orange
14 circle orange
15 Other orange
16 circle orange
17 circle Other
18 Other Other
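
If adding forcats is not an option, the same idea can be sketched as a small base R function. The name lump_top_n and its arguments are hypothetical, and ties in frequency are resolved by whatever order sort() returns, which may differ from fct_lump_n():

```r
# Keep the n most frequent levels of a factor and pool the rest into `other`.
lump_top_n <- function(f, n = 2, other = "Other") {
  f <- as.factor(f)
  counts <- sort(table(f), decreasing = TRUE)
  keep <- names(counts)[seq_len(min(n, length(counts)))]
  # Relabel every level outside the top n; duplicated labels are merged.
  levels(f)[!levels(f) %in% keep] <- other
  f
}

shapes <- c("a", "a", "a", "b", "b", "c", "d")
lumped <- lump_top_n(shapes, n = 2)
table(lumped)
```

To apply it to every column of a data frame, something like df[] <- lapply(df, lump_top_n) should work, assuming all columns are categorical.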

