Cleaning up factor levels (collapsing multiple levels/labels)
UPDATE 2: See Uwe's answer which shows the new "tidyverse" way of doing this, which is quickly becoming the standard.
UPDATE 1: Duplicated labels (but not levels!) are now indeed allowed (per my comment above); see Tim's answer.
ORIGINAL ANSWER, BUT STILL USEFUL AND OF INTEREST:
There is a little-known option to pass a named list to the levels function, for exactly this purpose. The names of the list should be the desired names of the levels, and the elements should be the current names that should be renamed. Some (including the OP, see Ricardo's comment to Tim's answer) prefer this for ease of reading.
x <- c("Y", "Y", "Yes", "N", "No", "H", NA)
x <- factor(x)
levels(x) <- list("Yes"=c("Y", "Yes"), "No"=c("N", "No"))
x
## [1] Yes Yes Yes No No <NA> <NA>
## Levels: Yes No
As mentioned in the levels documentation (also see the examples there):
value: For the 'factor' method, a
vector of character strings with length at least the number
of levels of 'x', or a named list specifying how to rename
the levels.
This can also be done in one line, as Marek does here: https://stackoverflow.com/a/10432263/210673; the levels<- sorcery is explained here: https://stackoverflow.com/a/10491881/210673.
> `levels<-`(factor(x), list(Yes=c("Y", "Yes"), No=c("N", "No")))
[1] Yes Yes Yes No No <NA> <NA>
Levels: Yes No
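One consequence of the named-list form, visible in the output above, is that any level not mentioned in the list (here "H") becomes NA. A minimal sketch, assuming you want to keep such a level, is to list it explicitly; the name x2 is only illustrative and not part of the original answer:

```r
# Same input as above; "H" is a level we want to keep rather than drop.
x2 <- factor(c("Y", "Y", "Yes", "N", "No", "H", NA))

# Every level that should survive must appear in the list,
# even if it is only mapped to itself.
levels(x2) <- list(Yes = c("Y", "Yes"), No = c("N", "No"), H = "H")

levels(x2)
as.character(x2)
```

The order of the list also determines the order of the resulting levels, which can be handy when the factor is later used for plotting or modelling.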
Collapsing multiple factor levels of (messy) character variable in R
A friend of mine actually provided the answer. It's nothing to do with the data structure.
This does the job:
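library(forcats)  # provides fct_collapse()

# Each argument name is the new level; its vector lists the old levels to merge.
dt$x <- fct_collapse(dt$x,
  No = c(
    "I don't allow anything",
    "..."),
  Yes = c(
    "Number of visitors ,annual sales, sales growth",
    "number of customers",
    "Net sales",
    "..."),
  Maybe = c(
    "The CEO's approval is needed.",
    "To be discussed")
)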
I still don't know why the first option I posted above doesn't work though (it did perfectly well with another variable).
Problem collapsing levels of a factor in R
Have you tried making Nationality a factor first?
library(dplyr)    # provides %>%, mutate() and across()
library(forcats)  # provides fct_collapse()

df <- data.frame(ID = 1:10,
                 Nationality = c("espanol", "spaniol", "ESPANOL",
                                 "spanish", "colombia", "Colombian",
                                 "British", "brit", "ESPanol", "UK"))

# convert to a factor first, then collapse the spelling variants
df2 <- df %>%
  mutate(Nationality = factor(Nationality)) %>%
  mutate(Nationality = fct_collapse(Nationality,
    Spanish   = c("espanol", "spaniol", "ESPANOL", "spanish", "ESPanol"),
    Colombian = c("colombia", "Colombian"),
    British   = c("British", "brit", "UK")))

# more concise
df2 <- df %>%
  mutate(across(Nationality, ~ fct_collapse(factor(.),
    Spanish   = c("espanol", "spaniol", "ESPANOL", "spanish", "ESPanol"),
    Colombian = c("colombia", "Colombian"),
    British   = c("British", "brit", "UK")
  )))
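Several of the variants above differ only in letter case, so lower-casing the column before collapsing shortens the mapping. The following is a base R sketch of that idea, not part of the original answer; it swaps fct_collapse for the named-list form of levels<-, and the names nat and f are illustrative:

```r
nat <- c("espanol", "spaniol", "ESPANOL", "spanish", "colombia",
         "Colombian", "British", "brit", "ESPanol", "UK")

# Normalise case first so "ESPANOL", "ESPanol" and "espanol" become one level.
f <- factor(tolower(nat))

# Map the remaining (lower-case) variants to the desired labels.
levels(f) <- list(
  Spanish   = c("espanol", "spaniol", "spanish"),
  Colombian = c("colombia", "colombian"),
  British   = c("british", "brit", "uk")
)

table(f)
```

Any lower-cased value that is missing from the list would become NA, so it is worth checking anyNA(f) afterwards.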
Only certain values of column as levels in factor
Yes. Use the labels option:
x <- c("a","a","b","b","happy", "sad", "angry")
levels = c("a", "b", "happy", "sad", "angry")
labels = c("letter", "letter", "happy", "sad", "angry")
y <- factor(x, levels, labels = labels)
y
https://rdrr.io/r/base/factor.html
"Duplicated values in labels can be used to map different values of x to the same factor level."
EDIT: Your mistake in the above code example is the nested vector.
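To make the effect of duplicated labels concrete, here is a small sketch that inspects the result of the call above. It assumes R 3.4.0 or later, where duplicated labels are accepted. The second part, with the illustrative name z, shows what happens when only some values are listed in levels: the rest become NA.

```r
x <- c("a", "a", "b", "b", "happy", "sad", "angry")

# "a" and "b" share the label "letter", so they end up in one level.
y <- factor(x,
            levels = c("a", "b", "happy", "sad", "angry"),
            labels = c("letter", "letter", "happy", "sad", "angry"))
levels(y)
as.character(y)

# Values of x that are not listed in levels are turned into NA.
z <- factor(x, levels = c("happy", "sad", "angry"))
is.na(z)
```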
Combine factor levels
Just do
levels(data2)[2:3] <- '(1,4]'
data2
#[1] (0,1] (1,4] (0,1] (0,1] (1,4] (1,4] (1,4] (1,4] (1,4] (1,4] (1,4] (1,4]
#[13] (1,4]
#Levels: (0,1] (1,4]
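The data2 object comes from the original question and is not reproduced here. As a self-contained sketch of the same technique, the following builds a hypothetical factor with cut() (the values and break points are assumptions chosen for illustration) and merges its second and third levels by assigning them the same name:

```r
# Hypothetical data: three intervals produced by cut().
f <- cut(c(0.5, 2, 3.5, 1.5), breaks = c(0, 1, 2, 4))
levels(f)  # "(0,1]" "(1,2]" "(2,4]"

# Assigning one name to several positions merges those levels.
levels(f)[2:3] <- "(1,4]"

levels(f)
as.character(f)
```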