Categorize Numeric Variable with Mutate

library(dplyr)

set.seed(123)
df <- data.frame(a = rnorm(10), b = rnorm(10))

# Bin a into quintiles: the breaks are the 0%, 20%, ..., 100% quantiles of a
df %>% mutate(a = cut(a, breaks = quantile(a, probs = seq(0, 1, 0.2))))

giving:

                 a          b
1  (-0.586,-0.316]  1.2240818
2   (-0.316,0.094]  0.3598138
3      (0.68,1.72]  0.4007715
4   (-0.316,0.094]  0.1106827
5     (0.094,0.68] -0.5558411
6      (0.68,1.72]  1.7869131
7     (0.094,0.68]  0.4978505
8             <NA> -1.9666172
9   (-1.27,-0.586]  0.7013559
10 (-0.586,-0.316] -0.4727914
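
Row 8 is NA because cut() builds left-open intervals by default, so the smallest value of a, which equals the lowest break, falls outside every bin. A minimal base-R sketch, assuming the same seed and assuming the minimum should belong to the first bin, passes include.lowest = TRUE (the names binned_default and binned_closed are only for illustration):

```r
set.seed(123)
a <- rnorm(10)
breaks <- quantile(a, probs = seq(0, 1, 0.2))

# Default: intervals are (lo, hi], so min(a) matches no interval
binned_default <- cut(a, breaks = breaks)

# include.lowest = TRUE closes the first interval on the left
binned_closed <- cut(a, breaks = breaks, include.lowest = TRUE)

sum(is.na(binned_default))  # 1
sum(is.na(binned_closed))   # 0
table(binned_closed)        # two values in each of the five bins
```

With this option the first level is printed as [-1.27,-0.586] rather than (-1.27,-0.586].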

R categorize numeric value using case_when

We could use the cut() function:

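library(dplyr)

# Sample data reconstructed from the output shown below
data <- tibble(id = as.character(1:5),
               distance_km = c(0.5, 1.5, 10.5, 43, 20.7))

labels <- c("1 km", "10 km", "20 km", "50 km")

# Intervals are right-closed, e.g. (1, 10] is labelled "10 km"
data %>%
  mutate(within_km = cut(distance_km,
                         breaks = c(0, 1, 10, 20, 50),
                         labels = labels))

  id    distance_km within_km
  <chr>       <dbl> <fct>
1 1             0.5 1 km
2 2             1.5 10 km
3 3            10.5 20 km
4 4            43   50 km
5 5            20.7 50 km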
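
Because cut() uses right-closed intervals by default, a distance that sits exactly on a break lands in the lower bin, and anything outside the range of the breaks becomes NA. A small base-R sketch with made-up distances (the vector d is an assumption, not part of the original data) illustrates both effects:

```r
labels <- c("1 km", "10 km", "20 km", "50 km")

# Hypothetical distances chosen to hit the edges of the bins
d <- c(0, 1, 10, 20.7, 50, 75)

within_km <- cut(d, breaks = c(0, 1, 10, 20, 50), labels = labels)

as.character(within_km)
# NA, "1 km", "10 km", "50 km", "50 km", NA
# 0 is excluded because the first interval is (0, 1]; 75 lies above the last break
```

If a distance of exactly 0 should count as within 1 km, include.lowest = TRUE would be one way to get that.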

Categorize numeric variable into groups/bins/breaks

I would use findInterval() here:

First, make up some sample data:

set.seed(1)
ages <- floor(runif(20, min = 20, max = 50))
ages
# [1] 27 31 37 47 26 46 48 39 38 21 26 25 40 31 43 34 41 49 31 43

Use findInterval() to categorize your "ages" vector.

findInterval(ages, c(20, 30, 40))
# [1] 1 2 2 3 1 3 3 2 2 1 1 1 3 2 3 2 3 3 2 3

Alternatively, as recommended in the comments, cut() is also useful here:

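# right = FALSE makes the intervals left-closed: [20,30), [30,40), [40,50)
cut(ages, breaks = c(20, 30, 40, 50), right = FALSE)
# labels = FALSE returns integer bin codes instead of a factor
cut(ages, breaks = c(20, 30, 40, 50), right = FALSE, labels = FALSE)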
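
To show how the two approaches line up, here is a short sketch that checks the integer codes from cut() against those from findInterval() and then attaches readable labels. The label strings and the variable names are illustrative assumptions.

```r
set.seed(1)
ages <- floor(runif(20, min = 20, max = 50))

# Both calls assign bin 1 to [20,30), bin 2 to [30,40), bin 3 to [40,50)
codes_fi  <- findInterval(ages, c(20, 30, 40))
codes_cut <- cut(ages, breaks = c(20, 30, 40, 50), right = FALSE, labels = FALSE)
all(codes_fi == codes_cut)  # TRUE

# Same bins, returned as a factor with custom labels
age_group <- cut(ages, breaks = c(20, 30, 40, 50), right = FALSE,
                 labels = c("20-29", "30-39", "40-49"))
table(age_group)  # counts of 5, 7 and 8 for the ages shown above
```

One difference worth keeping in mind: findInterval() has no upper limit on its last bin, while cut() returns NA for values at or above the final break when right = FALSE.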

R: Convert all columns to numeric with mutate while maintaining character columns

A possible solution:

# as.is = TRUE keeps non-numeric columns as character instead of converting them to factors
df <- type.convert(df, as.is = TRUE)
str(df)

#> 'data.frame': 4 obs. of 4 variables:
#> $ Col1: int 647 237 863 236
#> $ Col2: int 125 623 854 234
#> $ Col3: chr "ABC" "BCA" "DFL" "KFD"
#> $ Col4: chr "PWD" "CDL" "QOW" "DKC"
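
As a self-contained sketch, the data frame below is rebuilt from the str() output above, assuming every column starts out as character. The name converted is only for illustration.

```r
# Assumed input: all four columns stored as character
df <- data.frame(
  Col1 = c("647", "237", "863", "236"),
  Col2 = c("125", "623", "854", "234"),
  Col3 = c("ABC", "BCA", "DFL", "KFD"),
  Col4 = c("PWD", "CDL", "QOW", "DKC"),
  stringsAsFactors = FALSE
)

# Columns that parse as numbers become integer; the rest stay character
converted <- type.convert(df, as.is = TRUE)
sapply(converted, class)
# Col1 and Col2 are "integer", Col3 and Col4 are "character"
```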

Categorize a continuous variable based on groups of n in R

You can use the integer division operator %/% to get the whole number part of dividing x by 10, then add 1 to it. This will give you the correct step number. Add this into a paste0 call to glue "step_" onto the front and you've got it:

df %>% mutate(z = paste0("step_", (x %/% 10 + 1)))
#> # A tibble: 13 x 3
#>        x       y z
#>    <dbl>   <dbl> <chr>
#>  1     0  0.595  step_1
#>  2     2  1.44   step_1
#>  3     6 -0.375  step_1
#>  4     9 -0.808  step_1
#>  5    10 -0.298  step_2
#>  6    13 -0.774  step_2
#>  7    14 -0.769  step_2
#>  8    17  0.335  step_2
#>  9    20  0.696  step_3
#> 10    21  0.284  step_3
#> 11    24 -0.568  step_3
#> 12    28 -0.0942 step_3
#> 13    29 -0.547  step_3
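
The arithmetic can be checked in isolation. This base-R sketch uses the x values from the output above and the bin width of 10 from the answer; the intermediate name step is an assumption added for readability.

```r
x <- c(0, 2, 6, 9, 10, 13, 14, 17, 20, 21, 24, 28, 29)

# Integer division drops the remainder: 0-9 -> 0, 10-19 -> 1, 20-29 -> 2
step <- x %/% 10 + 1

z <- paste0("step_", step)
unique(z)  # "step_1" "step_2" "step_3"
```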

Recoding continuous variable into categorical with specific categories, in R using Tidyverse

A tidyverse approach would make use of dplyr::case_when to recode the variable like so:

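data %>%
  mutate(age = case_when(
    # Conditions are checked in order, so each row gets the first match
    `Age(Self-report)` < 35 ~ "18-34",
    `Age(Self-report)` < 55 ~ "35-54",
    # >= so that an age of exactly 55 is included rather than left as NA
    `Age(Self-report)` >= 55 ~ "55+"
  ))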
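
For contiguous ranges like these, the same recode could also be sketched with cut(). The sample ages are made up, and the lower bound of 18 is an assumption read off the "18-34" label:

```r
# Hypothetical ages chosen to sit on the category boundaries
age_raw <- c(18, 34, 35, 54, 55, 70)

# right = FALSE gives [18,35), [35,55), [55,Inf)
age <- cut(age_raw, breaks = c(18, 35, 55, Inf), right = FALSE,
           labels = c("18-34", "35-54", "55+"))

as.character(age)
# "18-34" "18-34" "35-54" "35-54" "55+" "55+"
```

Unlike the case_when() version, this returns NA for any age below 18 instead of placing it in the first category.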

