Grouping factor levels in a data.table
Update:
I recently learned of a much simpler way to re-associate factor levels from this question and a closer reading of ?levels. No merges, correspondence table, etc. necessary, just pass a named list to levels:
levels(DT$ind) = list(A = c(1, 3, 8), B = c(2, 4), C = 5:7)
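A minimal sketch of the named-list form, using a hypothetical factor ind in place of DT$ind (the real DT is not shown here). It also illustrates one thing to watch for: any existing level that is not mentioned in the list becomes NA.

```r
# Hypothetical stand-in for DT$ind: a factor with levels "1" to "12"
ind <- factor(1:12)

# Every old level is mapped to a new one
full <- ind
levels(full) <- list(A = c(1, 3, 8), B = c(2, 4), C = 5:7,
                     D = c(9, 12), E = 10, F = 11)
as.character(full)

# Only levels 1 to 8 are mapped; values with levels 9 to 12 become NA
partial <- ind
levels(partial) <- list(A = c(1, 3, 8), B = c(2, 4), C = 5:7)
sum(is.na(partial))
```

The new levels appear in the order in which the names are given in the list.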
Original Answer:
As suggested by @Arun, we have the option of creating the correspondence as a separate data.table, then joining it to the original:
match_dt = data.table(ind = as.factor(1:12),
                      grp = as.factor(c("A", "B", "A", "B", "C", "C",
                                        "C", "A", "D", "E", "F", "D")))
setkey(DT, ind)
setkey(match_dt, ind)
DT = match_dt[DT]
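Assuming a reasonably recent data.table, the same lookup can also be sketched as an update join, which adds grp to DT by reference without setting keys or reordering rows. The DT below is a made-up stand-in, since the original table is not shown.

```r
library(data.table)

# Hypothetical DT: 'ind' is a factor with levels "1" to "12"
DT <- data.table(ind = factor(c(3, 1, 9, 5, 12, 2), levels = 1:12),
                 val = 1:6)

# Correspondence table, as in the answer above
match_dt <- data.table(ind = factor(1:12),
                       grp = factor(c("A", "B", "A", "B", "C", "C",
                                      "C", "A", "D", "E", "F", "D")))

# Update join: for each row of DT, look up the matching grp in match_dt
DT[match_dt, on = "ind", grp := i.grp]
DT
```

Unlike match_dt[DT], this keeps the original row order of DT and leaves its other columns in place.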
We can also do this in (what I consider to be) the more readable fashion like so (with marginal speed costs):
# Named new_levels rather than levels to avoid masking base::levels()
new_levels <- character(12)
new_levels[c(1, 3, 8)] <- "A"
new_levels[c(2, 4)] <- "B"
new_levels[5:7] <- "C"
new_levels[c(9, 12)] <- "D"
new_levels[10] <- "E"
new_levels[11] <- "F"
match_dt <- data.table(ind = as.factor(1:12),
                       grp = as.factor(new_levels))
setkey(DT, ind)
setkey(match_dt, ind)
DT = match_dt[DT]
Factor levels by group
A data.table solution:
dt[, height_cat := cut(Height, breaks = c(0, 165, 180, 300), right = FALSE)]
dt[, height_f :=
     factor(
       paste(Sex, height_cat, sep = ":"),
       levels = dt[, CJ(Sex, height_cat, unique = TRUE)][, paste(Sex, height_cat, sep = ":")]
     )]
table(dt$height_f)
#   F:[0,165) F:[165,180) F:[180,300)   M:[0,165) M:[165,180) M:[180,300)
#           2           2           0           0           2           2
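For comparison, a base R sketch that builds the same combined factor with interaction() instead of CJ(). This is a swapped-in technique, not the method of the answer above, and the Sex and Height vectors are invented so that the counts match the table shown.

```r
# Hypothetical data chosen to reproduce the counts above
Sex    <- c("F", "F", "F", "F", "M", "M", "M", "M")
Height <- c(150, 160, 170, 175, 170, 178, 185, 190)

# Same binning as in the data.table answer
height_cat <- cut(Height, breaks = c(0, 165, 180, 300), right = FALSE)

# lex.order = TRUE makes Sex vary slowest, so all F levels come before M;
# unused combinations are kept as empty levels by default
height_f <- interaction(Sex, height_cat, sep = ":", lex.order = TRUE)

table(height_f)
```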
How do you group factor levels in R?
Grouping factor levels can easily be done by assigning the grouping in a list. Here is an example with toy data:
levels(mydata$value)
# [1] "not likely" "slightly likely" "likely" "very likely"
levels(mydata$value) <- list("unlikely" = c("not likely", "slightly likely"),
                             "likely" = c("likely", "very likely"))
levels(mydata$value)
# [1] "unlikely" "likely"
After that you probably want to do this:
(Statistical_Testing.aov <- aov(as.integer(value) ~ question, data = mydata))
# Call:
# aov(formula = as.integer(value) ~ question, data = mydata)
#
# Terms:
#                 question Residuals
# Sum of Squares      0.18      5.82
# Deg. of Freedom        1        23
#
# Residual standard error: 0.5030343
# Estimated effects may be unbalanced
(Statistical_Testing.anova <- anova(Statistical_Testing.aov))
# Analysis of Variance Table
#
# Response: as.integer(value)
#           Df Sum Sq Mean Sq F value Pr(>F)
# question   1   0.18 0.18000  0.7113 0.4077
# Residuals 23   5.82 0.25304
Toy data:
set.seed(42)
mydata <- transform(expand.grid(question = 1:5, id = 1:5),
                    value = factor(sample(1:4, 25, replace = TRUE),
                                   labels = c("not likely", "slightly likely",
                                              "likely", "very likely")))
Updating factor levels from another table by data.table
You could do the following:
subset_table[,
(nonnumeric_column) :=
lapply(nonnumeric_column, \(x) factor(get(x), levels = unique(bigger_table[[x]])))
]
Resulting in
> lapply(subset_table, levels)
$region
[1] "region_1" "region_3" "region_2" "region_4"
$factor_column
[1] "C" "B" "A"
$numeric_column
NULL
The problem in your original solution is that x is not returning the name of the column but the actual column. You can see this with:
subset_table[, lapply(.SD, \(x) print(x)), .SDcols=nonnumeric_column]
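One way to sidestep the name-versus-column confusion is to loop over the column names explicitly and use set(). The sketch below assumes made-up contents for bigger_table and subset_table, chosen to be consistent with the levels printed above; only the table and column names come from the answer.

```r
library(data.table)

# Hypothetical tables; only their names appear in the original answer
bigger_table <- data.table(
  region         = c("region_1", "region_3", "region_2", "region_4"),
  factor_column  = c("C", "B", "A", "C"),
  numeric_column = 1:4
)
subset_table <- bigger_table[region %in% c("region_1", "region_2")]
nonnumeric_column <- c("region", "factor_column")

# nm is always a column name here, so there is no ambiguity about
# whether we hold the name or the column itself
for (nm in nonnumeric_column) {
  set(subset_table, j = nm,
      value = factor(subset_table[[nm]],
                     levels = unique(bigger_table[[nm]])))
}

lapply(subset_table, levels)
```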
How to group factor levels?
I made up an example character vector with all of the abbreviations:
my_example <- c("C","OG","OT","TE","DT","DE","CB","WR","FS",
"FB","ILB","OLB","P","QB","RB","SS","WR")
class(my_example)
[1] "character"
Then I substituted the desired levels for their abbreviations (you could also use gsub here or any of many, many different approaches):
my_example[my_example %in% c("C","OG","OT","TE","DT","DE")] <- "Linemen"
my_example[my_example %in% c("CB","WR","FS")] <- "Small Backs"
my_example[my_example %in% c("FB","ILB","OLB","P",
                             "QB","RB","SS","WR")] <- "Big Backs"
Then I made it into a factor:
my_example <- as.factor(my_example)
head(my_example)
[1] Linemen Linemen Linemen Linemen Linemen Linemen
Levels: Big Backs Linemen Small Backs
tail(my_example)
[1] Big Backs Big Backs Big Backs Big Backs Big Backs Small Backs
Levels: Big Backs Linemen Small Backs
class(my_example)
[1] "factor"
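As one of those other approaches, here is a sketch using a named lookup vector. The names groups, lookup, pos and grouped are hypothetical. "WR" is listed only under "Small Backs", which matches what the code above actually produces, since the first replacement has already renamed "WR" before the "Big Backs" line runs.

```r
# Group definitions: group name -> abbreviations belonging to it
groups <- list(
  "Linemen"     = c("C", "OG", "OT", "TE", "DT", "DE"),
  "Small Backs" = c("CB", "WR", "FS"),
  "Big Backs"   = c("FB", "ILB", "OLB", "P", "QB", "RB", "SS")
)

# Lookup vector: names are abbreviations, values are group names
lookup <- setNames(rep(names(groups), lengths(groups)),
                   unlist(groups, use.names = FALSE))

# Translate a vector of abbreviations and fix the level order explicitly
pos <- c("C", "WR", "QB", "DE")
grouped <- factor(unname(lookup[pos]), levels = names(groups))
grouped
```

Keeping the mapping in one list makes it harder to list an abbreviation twice by accident, and the level order is set explicitly rather than alphabetically.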
Grouping 2 levels of a factor in R
Use levels(x) <- ... to specify new levels, and to combine some previous levels. For example:
f <- factor(LETTERS[c(1:3, 3:1)])
f
[1] A B C C B A
Levels: A B C
Now combine "A" and "B" into a single level:
levels(f) <- c("A", "A", "C")
f
[1] A A C C A A
Levels: A C
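When a factor has many levels, retyping the whole vector is error-prone. A small sketch of the same merge that touches only the level being renamed:

```r
f <- factor(LETTERS[c(1:3, 3:1)])

# Rename only level "B" to "A"; the duplicated level is merged automatically
levels(f)[levels(f) == "B"] <- "A"

f
levels(f)
```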
R grouping data with factors and levels
You first need to transform the vector so that all values above 5 share a single entry; then you can add the missing levels in the factor() function:
X <- c(1,2,3,4,3,9,20)
X <- ifelse(X>5,">5",X)
X <- factor(X,levels=c(0:5,">5"))
This results in:
X
[1] 1 2 3 4 3 >5 >5
Levels: 0 1 2 3 4 5 >5
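An alternative sketch with cut(), which bins and labels in one step. It assumes X contains only non-negative whole numbers; fractional values would be rounded up into the next bin, which is probably not what you want.

```r
X <- c(1, 2, 3, 4, 3, 9, 20)

# Intervals (-Inf,0], (0,1], ..., (4,5], (5,Inf] labelled 0, 1, ..., 5, ">5"
X_f <- cut(X, breaks = c(-Inf, 0:5, Inf), labels = c(0:5, ">5"))

X_f
levels(X_f)
```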
Reorder factor levels within group
To reorder the factor levels you can use forcats (part of the tidyverse), and do something like this...
library(dplyr)    # provides %>% and mutate()
library(forcats)
df2 <- df %>% mutate(a_factor = fct_reorder(a_factor,
                                            value * (-1 + 2 * (group == "group1"))))
levels(df2$a_factor)
[1] "f" "e" "d" "a" "b" "c"
This does not rearrange the dataframe itself...
df2
  a_factor  group value
1        a group1     1
2        b group1     2
3        c group1     3
4        d group2     4
5        e group2     5
6        f group2     6
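If you prefer not to depend on forcats, the same ordering can be sketched in base R by computing one summary value per level and ordering the levels by it. The df below is reconstructed from the printed output, and key, med and new_order are hypothetical names.

```r
# Data frame reconstructed from the output shown above
df <- data.frame(a_factor = factor(letters[1:6]),
                 group    = rep(c("group1", "group2"), each = 3),
                 value    = 1:6)

# Same sort key as in the fct_reorder() call: +value in group1, -value otherwise
key <- df$value * (-1 + 2 * (df$group == "group1"))

# One median per level (the summary fct_reorder() uses by default),
# then order the level names by it
med <- vapply(split(key, df$a_factor), median, numeric(1))
new_order <- names(med)[order(med)]

df$a_factor <- factor(df$a_factor, levels = new_order)
levels(df$a_factor)
```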