Categorize Continuous Variable with Dplyr

Categorize numeric variable with mutate

library(dplyr)

set.seed(123)
df <- data.frame(a = rnorm(10), b = rnorm(10))

# Bin a into quintiles: the breaks are the 0%, 20%, ..., 100% quantiles of a
df %>% mutate(a = cut(a, breaks = quantile(a, probs = seq(0, 1, 0.2))))

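giving the following (row 8 is NA because it holds the minimum of a, which sits exactly on the lowest break, and cut() leaves that break open unless include.lowest = TRUE is set):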

                 a          b
1  (-0.586,-0.316]  1.2240818
2   (-0.316,0.094]  0.3598138
3      (0.68,1.72]  0.4007715
4   (-0.316,0.094]  0.1106827
5     (0.094,0.68] -0.5558411
6      (0.68,1.72]  1.7869131
7     (0.094,0.68]  0.4978505
8             <NA> -1.9666172
9   (-1.27,-0.586]  0.7013559
10 (-0.586,-0.316] -0.4727914
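The NA in row 8 comes from cut() treating the lowest break as open by default, so the minimum of a falls outside every bin. Here is a minimal sketch of one way around it, using base R on the same seeded data (the names breaks, binned_default, and binned_closed are only for illustration). The same cut() call should drop into mutate() unchanged.

```r
set.seed(123)
a <- rnorm(10)

# Quintile breaks: the 0%, 20%, ..., 100% quantiles of a
breaks <- quantile(a, probs = seq(0, 1, 0.2))

# Default: intervals are (lo, hi], so the minimum matches no interval
binned_default <- cut(a, breaks = breaks)

# include.lowest = TRUE closes the first interval on the left: [lo, hi]
binned_closed <- cut(a, breaks = breaks, include.lowest = TRUE)

sum(is.na(binned_default))  # one value (the minimum) is left unbinned
sum(is.na(binned_closed))   # no values are left unbinned
```

If you would rather keep half-open intervals everywhere, another option is to extend the first break slightly below the minimum, but include.lowest = TRUE is usually the simpler choice.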

Categorize a continuous variable based on groups of n in R

You can use the integer division operator %/% to take the whole-number part of x divided by 10, then add 1 to get the step number. Wrap that in a paste0() call to glue "step_" onto the front and you've got it:

df %>% mutate(z = paste0("step_", (x %/% 10 + 1)))
#> # A tibble: 13 x 3
#>        x       y z     
#>    <dbl>   <dbl> <chr> 
#>  1     0  0.595  step_1
#>  2     2  1.44   step_1
#>  3     6 -0.375  step_1
#>  4     9 -0.808  step_1
#>  5    10 -0.298  step_2
#>  6    13 -0.774  step_2
#>  7    14 -0.769  step_2
#>  8    17  0.335  step_2
#>  9    20  0.696  step_3
#> 10    21  0.284  step_3
#> 11    24 -0.568  step_3
#> 12    28 -0.0942 step_3
#> 13    29 -0.547  step_3
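To make the arithmetic easy to check by hand, here is a small base R sketch using the x values shown in the output above. The variable width is a hypothetical name I introduced so the bin size is not hard-coded; it is not part of the original answer.

```r
# x values copied from the output above
x <- c(0, 2, 6, 9, 10, 13, 14, 17, 20, 21, 24, 28, 29)

# Bin width; 10 matches the answer, change it for other group sizes
width <- 10

# Integer division gives 0 for 0-9, 1 for 10-19, 2 for 20-29; add 1 to start at step_1
z <- paste0("step_", x %/% width + 1)

table(z)
```

Note that %/% rounds toward negative infinity, so a negative x would land in "step_0" or lower. That is worth guarding against if your data can go below zero.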

Categorize numeric variable into groups/bins/breaks

I would use findInterval() here:

First, make up some sample data:

set.seed(1)
ages <- floor(runif(20, min = 20, max = 50))
ages
# [1] 27 31 37 47 26 46 48 39 38 21 26 25 40 31 43 34 41 49 31 43

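Use findInterval() to categorize your "ages" vector. It returns, for each value, the index of the interval it falls into, here [20,30), [30,40), and 40 or above: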

findInterval(ages, c(20, 30, 40))
# [1] 1 2 2 3 1 3 3 2 2 1 1 1 3 2 3 2 3 3 2 3

Alternatively, as recommended in the comments, cut() is also useful here:

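# right = FALSE makes the intervals closed on the left: [20,30), [30,40), [40,50)
cut(ages, breaks = c(20, 30, 40, 50), right = FALSE)
# labels = FALSE returns integer bin codes instead of a factor
cut(ages, breaks = c(20, 30, 40, 50), right = FALSE, labels = FALSE)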
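As a quick check that the two approaches agree, here is a sketch using the ages printed above. The label strings "20-29", "30-39", and "40-49" are my own choice for illustration, not something the original answer specifies.

```r
# ages copied from the sample output above
ages <- c(27, 31, 37, 47, 26, 46, 48, 39, 38, 21,
          26, 25, 40, 31, 43, 34, 41, 49, 31, 43)

# Integer codes from both functions
codes_fi  <- findInterval(ages, c(20, 30, 40))
codes_cut <- cut(ages, breaks = c(20, 30, 40, 50), right = FALSE, labels = FALSE)

# For this data both give the same bin index for every age
all(codes_fi == codes_cut)

# Readable labels instead of codes (label text is an assumption)
age_group <- cut(ages, breaks = c(20, 30, 40, 50), right = FALSE,
                 labels = c("20-29", "30-39", "40-49"))
table(age_group)
```

The two only agree while every value lies inside the cut() breaks. An age of 50 or more would get code 3 from findInterval() but NA from cut(), so pick whichever behaviour suits your data.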

Recode continuous data into categorical data using is.na() and if_else() in R

I have provided a toy example to stand in for the code that you described:

df <- data.frame(x = c(1,2,3,4,NA,NA,NA,NA))

Here we have a data frame with continuous and NA values. Using dplyr, we can use your functions to categorize "x":

library(dplyr)
df <- df %>% 
  mutate(new_data = if_else(is.na(x), "is NA", "is not NA"))

This creates a new column, new_data, that labels NA values as "is NA" and every other value as "is not NA".

Recoding continuous variable into categorical with specific categories, in R using Tidyverse

A tidyverse approach would make use of dplyr::case_when to recode the variable like so:

data %>% 
  mutate(age = case_when(
    `Age(Self-report)` < 35 ~ "18-34",
    `Age(Self-report)` >= 35 & `Age(Self-report)` < 55 ~ "35-54",
    `Age(Self-report)` >= 55 ~ "55+"
  ))
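With range conditions like these, the boundary values are the easiest place to slip: every age should match exactly one condition, or case_when() will return NA for it. As a cross-check, here is a sketch of the same three bands built with base R's cut() instead of case_when(). This is a swapped-in alternative, not the method from the answer, and the test_ages vector is made up to hit the boundaries.

```r
# Hypothetical ages chosen to sit on and around the band edges
test_ages <- c(34, 35, 54, 55, 70)

# right = FALSE gives [-Inf,35), [35,55), [55,Inf), so 35 and 55 start a new band
age_band <- cut(test_ages,
                breaks = c(-Inf, 35, 55, Inf),
                right = FALSE,
                labels = c("18-34", "35-54", "55+"))

data.frame(test_ages, age_band)
```

Note that the open lower bound means anyone under 18 would also be labelled "18-34", which mirrors the first case_when() condition. Replace -Inf with 18 if such values should become NA instead.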