Grouping 2 levels of a factor in R
Use levels(x) <- ... to specify new levels and to combine some of the existing ones. Giving several old levels the same new name merges them. For example:
f <- factor(LETTERS[c(1:3, 3:1)])
f
# [1] A B C C B A
# Levels: A B C
Now combine "A" and "B" into a single level:
levels(f) <- c("A", "A", "C")
f
# [1] A A C C A A
# Levels: A C
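The new labels are matched to the old levels by position, so repeating a label merges the corresponding levels. A minimal sketch merging B and C instead of A and B (the label "BC" is just an illustrative name):

```r
f <- factor(LETTERS[c(1:3, 3:1)])
g <- f
# Labels are matched to the old levels A, B, C in order:
#   A -> "A", B -> "BC", C -> "BC"
levels(g) <- c("A", "BC", "BC")
print(g)
# Merging only relabels values; the total count is unchanged
print(table(g))
```

The result has levels A and BC, and every value that was B or C is now BC.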
How do you group factor levels in R?
Grouping factor levels can easily be done by assigning a named list to levels(): each name becomes a new level, and each list element gives the old levels it absorbs. Here is an example with toy data:
levels(mydata$value)
# [1] "not likely" "slightly likely" "likely" "very likely"
levels(mydata$value) <- list("unlikely" = c("not likely", "slightly likely"),
                             "likely"   = c("likely", "very likely"))
levels(mydata$value)
# [1] "unlikely" "likely"
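A small self-contained sketch of the same named-list assignment, on a hypothetical vector x, checking that grouping only relabels values: the new levels follow the order of the list, and the per-group counts add up to the original total.

```r
# Hypothetical responses (not the toy data used in this answer)
x <- factor(c("not likely", "likely", "very likely", "slightly likely", "likely"),
            levels = c("not likely", "slightly likely", "likely", "very likely"))
before <- table(x)
# Each name is a new level; each element lists the old levels merged into it
levels(x) <- list(unlikely = c("not likely", "slightly likely"),
                  likely   = c("likely", "very likely"))
after <- table(x)
print(after)
```

Here "unlikely" ends up with 2 values and "likely" with 3, for the same total of 5.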
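After that you probably want to analyse the grouped variable, for example with an ANOVA on its integer codes. Note that question is numeric in the toy data, so it enters the model as a single linear term (1 degree of freedom); wrap it in factor(question) to compare the five questions as separate groups.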
(Statistical_Testing.aov <- aov(as.integer(value) ~ question, data = mydata))
# Call:
# aov(formula = as.integer(value) ~ question, data = mydata)
#
# Terms:
#                 question Residuals
# Sum of Squares      0.18      5.82
# Deg. of Freedom        1        23
#
# Residual standard error: 0.5030343
# Estimated effects may be unbalanced
(Statistical_Testing.anova <- anova(Statistical_Testing.aov))
# Analysis of Variance Table
#
# Response: as.integer(value)
#           Df Sum Sq Mean Sq F value Pr(>F)
# question   1   0.18 0.18000  0.7113 0.4077
# Residuals 23   5.82 0.25304
Toy data:
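set.seed(42)
# levels = 1:4 keeps all four labels valid even if a value is never drawn
mydata <- transform(expand.grid(question = 1:5, id = 1:5),
                    value = factor(sample(1:4, 25, replace = TRUE),
                                   levels = 1:4,
                                   labels = c("not likely", "slightly likely",
                                              "likely", "very likely")))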
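If the goal is to compare the five questions as groups rather than fit a linear trend in question, a minimal sketch is to treat question as a factor. It rebuilds the toy data and the grouping so it runs on its own; no particular F value or p-value is claimed here, since those depend on the random draw.

```r
set.seed(42)
mydata <- transform(expand.grid(question = 1:5, id = 1:5),
                    value = factor(sample(1:4, 25, replace = TRUE),
                                   levels = 1:4,
                                   labels = c("not likely", "slightly likely",
                                              "likely", "very likely")))
# Group the four answers into two levels, as above
levels(mydata$value) <- list(unlikely = c("not likely", "slightly likely"),
                             likely   = c("likely", "very likely"))
# factor(question): one group per question, so 5 groups -> 4 Df
fit <- aov(as.integer(value) ~ factor(question), data = mydata)
tab <- anova(fit)
print(tab)
```

With 25 observations in 5 groups, the table has 4 Df for factor(question) and 20 Df for the residuals, instead of the 1 and 23 shown above.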
How to group by factor levels from two columns and output new column that shows sum of each level in R?
Instead of grouping by 'RawDate', group by 'ID' and 'YEAR', and take the sum of a logical vector (each TRUE counts as 1, each FALSE as 0):
library(dplyr)
complete_df %>%
group_by(ID, YEAR) %>%
mutate(TotalWon = sum(Renewal == 'WON'), TotalLost = sum(Renewal == 'LOST'))
If you need a summarised output (one row per ID/YEAR combination instead of the totals repeated on every row), use summarise instead of mutate.
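A minimal sketch of the summarise version. The complete_df below is a hypothetical stand-in with only the three columns the answer uses, and .groups = "drop" assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical stand-in for complete_df
complete_df <- data.frame(
  ID      = c(1, 1, 1, 2, 2),
  YEAR    = c(2020, 2020, 2021, 2020, 2020),
  Renewal = c("WON", "LOST", "WON", "LOST", "LOST")
)

# One row per ID/YEAR; sum() of a logical counts the TRUE values
totals <- complete_df %>%
  group_by(ID, YEAR) %>%
  summarise(TotalWon  = sum(Renewal == "WON"),
            TotalLost = sum(Renewal == "LOST"),
            .groups = "drop")
print(totals)
```

The result has three rows, (1, 2020), (1, 2021) and (2, 2020), with TotalWon 1, 1, 0 and TotalLost 1, 0, 2.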
Factor levels by group
A data.table solution:
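library(data.table)
# dt is assumed to be a data.table with columns Sex and Height
dt[, height_cat := cut(Height, breaks = c(0, 165, 180, 300), right = FALSE)]
# Declare every Sex x height band combination as a level, so that
# empty combinations still appear (with count 0) in table()
dt[, height_f :=
     factor(
       paste(Sex, height_cat, sep = ":"),
       levels = dt[, CJ(Sex, height_cat, unique = TRUE)][, paste(Sex, height_cat, sep = ":")]
     )]
table(dt$height_f)
#   F:[0,165) F:[165,180) F:[180,300)   M:[0,165) M:[165,180) M:[180,300)
#           2           2           0           0           2           2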
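For comparison, base R's interaction() builds the same kind of combined factor, keeping unused combinations as levels by default (drop = FALSE). This is a swapped-in base-R alternative, not part of the data.table answer; the data frame below is hypothetical, chosen only so the counts have the same shape as the output above.

```r
# Hypothetical data with the two columns the answer assumes
df <- data.frame(
  Sex    = c("F", "F", "F", "F", "M", "M", "M", "M"),
  Height = c(150, 160, 170, 175, 170, 172, 185, 190)
)
df$height_cat <- cut(df$Height, breaks = c(0, 165, 180, 300), right = FALSE)
# lex.order = TRUE makes Sex vary slowest, giving F:... levels before M:...
df$height_f <- interaction(df$Sex, df$height_cat, sep = ":", lex.order = TRUE)
print(table(df$height_f))
```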
Group by two factors with dplyr
You need to "reshape" or "pivot" the data. Since you're already using dplyr, you can use tidyr::pivot_wider. (Alternatively, reshape2::dcast will work similarly, though frankly I believe pivot_wider is more full-featured.)
library(dplyr)
test <- df %>%
  group_by(factor1, factor2) %>%
  summarise(z = sum(values))
# One column per factor2 value; combinations with no rows are filled with 0
tidyr::pivot_wider(test, id_cols = factor1, names_from = "factor2", values_from = "z",
                   values_fill = 0)
# # A tibble: 3 x 4
# # Groups:   factor1 [3]
#   factor1   `1`   `3`   `2`
#   <chr>   <dbl> <dbl> <dbl>
# 1 A          57    78     0
# 2 B           0     0    32
# 3 C           0     5    15
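If the packages are not needed otherwise, base R's xtabs() sums a value over every combination of two factors in one step, with 0 for combinations that never occur. This is a swapped-in alternative to the dplyr/tidyr route; the df below is hypothetical, with several rows per group so that the summing is visible.

```r
# Hypothetical long data: several rows can share a factor1/factor2 pair
df <- data.frame(
  factor1 = c("A", "A", "A", "B", "C", "C", "C"),
  factor2 = c(1, 1, 3, 2, 3, 2, 2),
  values  = c(50, 7, 78, 32, 5, 10, 5)
)
# Sum `values` for each factor1 x factor2 cell; empty cells become 0
tab <- xtabs(values ~ factor1 + factor2, data = df)
print(tab)
# Same numbers as a wide data frame (rows A, B, C; columns "1", "2", "3")
wide <- as.data.frame.matrix(tab)
```

Unlike pivot_wider, the columns come out in sorted order ("1", "2", "3") rather than in order of first appearance.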
Combining factor levels in R
One option is recode from the car package:
library(car)
recode(x, "c('A', 'B')='A+B';c('D', 'E') = 'D+E'")
#[1] A+B A+B A+B C D+E D+E A+B D+E C
#Levels: A+B C D+E
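It also works inside a dplyr pipeline, but call car::recode explicitly: dplyr has its own recode(), which masks car's version once dplyr is loaded and does not accept this string syntax.
library(dplyr)
df %>%
  mutate(x = car::recode(x, "c('A', 'B')='A+B';c('D', 'E') = 'D+E'"))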
# x
#1 A+B
#2 A+B
#3 A+B
#4 C
#5 D+E
#6 D+E
#7 A+B
#8 D+E
#9 C
data
df <- data.frame(x)
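Without the car dependency, the same merge can be written with the named-list form of levels<- described earlier on this page. A minimal sketch on a hypothetical x (not the answer's data): every old level has to appear in the list, since levels left out end up as NA, which is why "C" maps to itself.

```r
# Hypothetical input vector
x <- factor(c("B", "D", "A", "C", "E", "A"))
x2 <- x
# Names are the new levels; each element lists the old levels merged into it
levels(x2) <- list("A+B" = c("A", "B"), "C" = "C", "D+E" = c("D", "E"))
print(x2)
```

The new levels come out in list order: A+B, C, D+E.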
Write a function in R to group factor levels by frequency, then keep the 2 largest categories and pool the rest in other
forcats::fct_lump_n() exists for precisely this:
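library(forcats)
library(dplyr)
# Keep the 2 most frequent levels of every column; pool the rest into "Other"
df %>%
  mutate(across(everything(), ~ fct_lump_n(.x, n = 2)))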
#      var1   var2
# 1  square orange
# 2  square orange
# 3  square orange
# 4  circle orange
# 5  square   blue
# 6  square orange
# 7  circle   blue
# 8  square   blue
# 9  circle orange
# 10 circle   blue
# 11 circle   blue
# 12 circle   blue
# 13 square orange
# 14 circle orange
# 15  Other orange
# 16 circle orange
# 17 circle  Other
# 18  Other  Other
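Since the question asks for a function, here is a minimal base-R sketch of the same idea. The name lump_top_n and its arguments are hypothetical, not part of forcats. One difference worth knowing: by default fct_lump_n keeps all levels tied at the cut-off, so it can return more than n levels, whereas this sketch always keeps exactly n and does not treat ties specially.

```r
# Hypothetical helper: keep the n most frequent levels, pool the rest
lump_top_n <- function(f, n = 2, other = "Other") {
  f <- as.factor(f)
  counts <- sort(table(f), decreasing = TRUE)
  keep <- names(counts)[seq_len(min(n, length(counts)))]
  new_levels <- levels(f)
  # Relabel every level that is not kept; duplicate labels merge levels
  new_levels[!new_levels %in% keep] <- other
  levels(f) <- new_levels
  f
}

x <- factor(c("a", "a", "a", "b", "b", "c", "d"))
y <- lump_top_n(x, n = 2)
print(y)
```

To apply it to every column of a data frame, use df[] <- lapply(df, lump_top_n, n = 2).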