Categorize numeric variable with mutate
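library(dplyr)
set.seed(123)
df <- data.frame(a = rnorm(10), b = rnorm(10))
# Bin a into quintiles using its own quantiles as break points
df %>% mutate(a = cut(a, breaks = quantile(a, probs = seq(0, 1, 0.2))))
giving (row 8 is NA because cut() excludes the lowest break by default; add include.lowest = TRUE to keep the minimum):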
                  a          b
1  (-0.586,-0.316]  1.2240818
2   (-0.316,0.094]  0.3598138
3      (0.68,1.72]  0.4007715
4   (-0.316,0.094]  0.1106827
5     (0.094,0.68] -0.5558411
6      (0.68,1.72]  1.7869131
7     (0.094,0.68]  0.4978505
8             <NA> -1.9666172
9   (-1.27,-0.586]  0.7013559
10 (-0.586,-0.316] -0.4727914
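The NA in row 8 comes from cut() using right-closed intervals, so the smallest value sits exactly on the first break and belongs to no bin. A minimal sketch of the usual remedy, using the same seed and data as above; it is written in base R, but the same cut() call can be placed inside mutate() unchanged:

```r
set.seed(123)
df <- data.frame(a = rnorm(10), b = rnorm(10))

# Quintile break points; the first break equals min(df$a).
breaks <- quantile(df$a, probs = seq(0, 1, 0.2))

# include.lowest = TRUE closes the first interval on the left,
# so the minimum is assigned to the first bin instead of NA.
a_bin <- cut(df$a, breaks = breaks, include.lowest = TRUE)

anyNA(a_bin)   # check that no value was left unbinned
table(a_bin)   # number of observations per bin
```

If the data contain many ties, quantile() can return duplicated breaks, which cut() rejects; that case is not handled in this sketch.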
R categorize numeric value using case_when
We could use the cut() function:
library(dplyr)
labels <- c("1 km", "10 km", "20 km", "50 km")
data %>%
  mutate(within_km = cut(distance_km,
                         breaks = c(0, 1, 10, 20, 50),
                         labels = labels))
  id    distance_km within_km
  <chr>       <dbl> <fct>
1 1             0.5 1 km
2 2             1.5 10 km
3 3            10.5 20 km
4 4            43   50 km
5 5            20.7 50 km
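Since the question asks about case_when, here is a rough equivalent as a sketch. The data frame is reconstructed from the printed output above, which is an assumption about the original input. case_when() evaluates its conditions in order, so each branch only needs an upper bound:

```r
library(dplyr)

# Sample data reconstructed from the output above (an assumption).
data <- data.frame(
  id = as.character(1:5),
  distance_km = c(0.5, 1.5, 10.5, 43, 20.7),
  stringsAsFactors = FALSE
)

result <- data %>%
  mutate(within_km = case_when(
    distance_km <= 1  ~ "1 km",
    distance_km <= 10 ~ "10 km",
    distance_km <= 20 ~ "20 km",
    distance_km <= 50 ~ "50 km"
    # anything above 50 matches no branch and becomes NA
  ))

result
```

Two differences from the cut() version are worth keeping in mind: case_when() returns a character column rather than a factor, and a distance of 0 or below lands in "1 km" here, whereas cut() with a lowest break of 0 would return NA for it.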
Categorize numeric variable into group/ bins/ breaks
I would use findInterval() here:
First, make up some sample data:
set.seed(1)
ages <- floor(runif(20, min = 20, max = 50))
ages
# [1] 27 31 37 47 26 46 48 39 38 21 26 25 40 31 43 34 41 49 31 43
Use findInterval() to categorize your "ages" vector:
findInterval(ages, c(20, 30, 40))
# [1] 1 2 2 3 1 3 3 2 2 1 1 1 3 2 3 2 3 3 2 3
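Alternatively, as recommended in the comments, cut() is also useful here:
# Returns a factor with interval labels such as [20,30)
cut(ages, breaks = c(20, 30, 40, 50), right = FALSE)
# Returns integer bin codes instead of labels
cut(ages, breaks = c(20, 30, 40, 50), right = FALSE, labels = FALSE)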
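The integer codes from findInterval() can also be turned into readable groups by indexing a label vector. A small sketch, where the label names are hypothetical and chosen only for illustration:

```r
set.seed(1)
ages <- floor(runif(20, min = 20, max = 50))

# Hypothetical labels, one per interval: [20,30), [30,40), [40,Inf)
age_labels <- c("20-29", "30-39", "40+")

# findInterval() returns 1, 2 or 3 for these ages; use it as an index.
idx <- findInterval(ages, c(20, 30, 40))
age_group <- factor(age_labels[idx], levels = age_labels)

# For this data, cut() with right = FALSE gives the same bin codes.
codes <- cut(ages, breaks = c(20, 30, 40, 50), right = FALSE, labels = FALSE)
all(idx == codes)

table(age_group)
```

Indexing works here because every age is at least 20; a value below the first break would give an index of 0 and be silently dropped, so that case needs separate handling.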
R: Convert all columns to numeric with mutate while maintaining character columns
A possible solution:
# as.is = TRUE keeps non-numeric columns as character instead of converting them to factors
df <- type.convert(df, as.is = TRUE)
str(df)
#> 'data.frame': 4 obs. of 4 variables:
#> $ Col1: int 647 237 863 236
#> $ Col2: int 125 623 854 234
#> $ Col3: chr "ABC" "BCA" "DFL" "KFD"
#> $ Col4: chr "PWD" "CDL" "QOW" "DKC"
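For a self-contained run, the input can be rebuilt from the str() output above. This assumes every column started out as character, which the original answer does not show:

```r
# Input reconstructed from the str() output above (an assumption):
# all four columns stored as character.
df <- data.frame(
  Col1 = c("647", "237", "863", "236"),
  Col2 = c("125", "623", "854", "234"),
  Col3 = c("ABC", "BCA", "DFL", "KFD"),
  Col4 = c("PWD", "CDL", "QOW", "DKC"),
  stringsAsFactors = FALSE
)

# type.convert() inspects each column and converts it only when
# every value can be parsed as a logical, integer or numeric.
converted <- type.convert(df, as.is = TRUE)

sapply(converted, class)
```

Columns that contain even one non-numeric string stay character, which is what keeps Col3 and Col4 intact.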
Categorize a continuous variable based on groups of n in R
You can use the integer division operator %/% to get the whole number part of dividing x by 10, then add 1 to it. This will give you the correct step number. Add this into a paste0() call to glue "step_" onto the front and you've got it:
df %>% mutate(z = paste0("step_", (x %/% 10 + 1)))
#> # A tibble: 13 x 3
#>        x       y z
#>    <dbl>   <dbl> <chr>
#>  1     0  0.595  step_1
#>  2     2  1.44   step_1
#>  3     6 -0.375  step_1
#>  4     9 -0.808  step_1
#>  5    10 -0.298  step_2
#>  6    13 -0.774  step_2
#>  7    14 -0.769  step_2
#>  8    17  0.335  step_2
#>  9    20  0.696  step_3
#> 10    21  0.284  step_3
#> 11    24 -0.568  step_3
#> 12    28 -0.0942 step_3
#> 13    29 -0.547  step_3
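The arithmetic can be checked on a plain vector. A minimal sketch using the x values from the output above; the name width is a hypothetical stand-in for the constant 10:

```r
# x values copied from the output above
x <- c(0, 2, 6, 9, 10, 13, 14, 17, 20, 21, 24, 28, 29)
width <- 10  # hypothetical name for the bin width

# Integer division drops the remainder: 9 %/% 10 is 0, 10 %/% 10 is 1.
step <- x %/% width + 1
z <- paste0("step_", step)

data.frame(x, step, z)
```

Each bin covers width consecutive values starting at a multiple of width, so 0 to 9 is step_1 and 10 to 19 is step_2. In R, %/% rounds toward negative infinity, so a negative x would produce step_0 or lower.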
Recoding continuous variable into categorical with specific categories, in R using Tidyverse
A tidyverse approach would make use of dplyr::case_when() to recode the variable like so:
data %>%
  mutate(age = case_when(
    `Age(Self-report)` < 35 ~ "18-34",
    `Age(Self-report)` >= 35 & `Age(Self-report)` < 55 ~ "35-54",
    # >= rather than > so that an age of exactly 55 is not left as NA
    `Age(Self-report)` >= 55 ~ "55+"
  ))
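As a base R alternative, cut() with right = FALSE expresses the same three bands and makes the boundary handling explicit. This is a sketch on made-up ages, including the boundary values, rather than the original data:

```r
# Hypothetical ages, including the boundary values 34, 35, 54 and 55
age <- c(18, 34, 35, 54, 55, 70)

# right = FALSE makes each interval closed on the left: [18,35), [35,55), [55,Inf)
age_group <- cut(
  age,
  breaks = c(18, 35, 55, Inf),
  labels = c("18-34", "35-54", "55+"),
  right = FALSE
)

data.frame(age, age_group)
```

Unlike the first case_when() branch, this version returns NA for ages below 18, which may or may not be wanted depending on the data.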