Categorize Numeric Variable into Group/ Bins/ Breaks

I would use findInterval() here:

First, make up some sample data:

set.seed(1)
ages <- floor(runif(20, min = 20, max = 50))
ages
# [1] 27 31 37 47 26 46 48 39 38 21 26 25 40 31 43 34 41 49 31 43

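Use findInterval() to categorize your "ages" vector. Each value gets the index of the interval it falls into: 1 for [20, 30), 2 for [30, 40), and 3 for 40 and above.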

findInterval(ages, c(20, 30, 40))
# [1] 1 2 2 3 1 3 3 2 2 1 1 1 3 2 3 2 3 3 2 3

Alternatively, as recommended in the comments, cut() is also useful here:

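# Returns a factor with interval labels such as [20,30)
cut(ages, breaks=c(20, 30, 40, 50), right = FALSE)
# Returns integer codes (1, 2, 3) instead of a factor
cut(ages, breaks=c(20, 30, 40, 50), right = FALSE, labels = FALSE)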
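
As a quick check, the two approaches can be compared directly. The sketch below rebuilds the same ages vector and confirms that findInterval() and cut(..., labels = FALSE) return the same codes for this data; the label names in the last call are made up for illustration.

```r
set.seed(1)
ages <- floor(runif(20, min = 20, max = 50))

# Interval index from findInterval(): intervals are closed on the left
idx_fi <- findInterval(ages, c(20, 30, 40))

# Same intervals with cut(); right = FALSE also closes them on the left
idx_cut <- cut(ages, breaks = c(20, 30, 40, 50), right = FALSE, labels = FALSE)

# TRUE when both approaches assign every age to the same bin
all(idx_fi == idx_cut)

# cut() can also attach readable labels (these label names are hypothetical)
age_group <- cut(ages, breaks = c(20, 30, 40, 50), right = FALSE,
                 labels = c("20s", "30s", "40s"))
table(age_group)
```

One difference worth keeping in mind: findInterval() has no upper bound unless you add one, so an age of 50 or more would still get code 3, whereas cut() with a top break of 50 and right = FALSE would return NA for it.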

Categorize numeric variable with mutate

library(dplyr)

set.seed(123)
df <- data.frame(a = rnorm(10), b = rnorm(10))

df %>% mutate(a = cut(a, breaks = quantile(a, probs = seq(0, 1, 0.2))))

giving:

                 a          b
1  (-0.586,-0.316]  1.2240818
2   (-0.316,0.094]  0.3598138
3      (0.68,1.72]  0.4007715
4   (-0.316,0.094]  0.1106827
5     (0.094,0.68] -0.5558411
6      (0.68,1.72]  1.7869131
7     (0.094,0.68]  0.4978505
8             <NA> -1.9666172
9   (-1.27,-0.586]  0.7013559
10 (-0.586,-0.316] -0.4727914
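
The <NA> in row 8 arises because cut() builds intervals that are open on the left by default, so the smallest value, which sits exactly on the lowest break, falls outside every bin. A minimal sketch of one way to avoid this, assuming the minimum should belong to the first bin, is to pass include.lowest = TRUE:

```r
set.seed(123)
a <- rnorm(10)

# Quintile breaks, as in the mutate() call above
breaks <- quantile(a, probs = seq(0, 1, 0.2))

# Default behaviour: the minimum equals the lowest break and becomes NA
sum(is.na(cut(a, breaks = breaks)))

# include.lowest = TRUE closes the first interval on the left as well
binned <- cut(a, breaks = breaks, include.lowest = TRUE)
sum(is.na(binned))
table(binned)
```

The same argument can be added inside the mutate() call shown earlier.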

R categorize numeric value using case_when

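We could use the cut() function:

library(dplyr)

# Sample data matching the output shown below
data <- tibble(id = as.character(1:5),
               distance_km = c(0.5, 1.5, 10.5, 43, 20.7))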

labels <- c("1 km", "10 km", "20 km", "50 km")

data %>% 
  mutate(within_km =  cut(distance_km, 
                          breaks = c(0, 1, 10, 20, 50), 
                          labels = labels))

  id    distance_km within_km
  <chr>       <dbl> <fct>    
1 1             0.5 1 km     
2 2             1.5 10 km    
3 3            10.5 20 km    
4 4            43   50 km    
5 5            20.7 50 km
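
Since the question title mentions case_when(), here is a sketch of the equivalent logic with that function. The data frame is reconstructed from the output above, and the conditions assume the same right-closed intervals that cut() uses by default.

```r
library(dplyr)

# Data reconstructed from the printed output (an assumption)
data <- tibble(id = as.character(1:5),
               distance_km = c(0.5, 1.5, 10.5, 43, 20.7))

# Conditions are checked in order; the first one that is TRUE wins
result <- data %>%
  mutate(within_km = case_when(
    distance_km <= 1  ~ "1 km",
    distance_km <= 10 ~ "10 km",
    distance_km <= 20 ~ "20 km",
    distance_km <= 50 ~ "50 km"
  ))
result
```

Unlike cut(), this produces a character column rather than a factor, and a distance of 0 or less lands in "1 km" instead of becoming NA. Distances above 50 are NA in both versions.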

Splitting a continuous variable into equal sized groups

Try this:

split(das, cut(das$anim, 3))

If you want to split based on the value of wt, with about the same number of rows in each group, then

library(Hmisc) # cut2
split(das, cut2(das$wt, g=3))

Either way, you can do this by combining cut(), cut2() and split().

UPDATED

If you want a group index as an additional column, then

das$group <- cut(das$anim, 3)

If the column should be an index like 1, 2, ..., then

das$group <- as.numeric(cut(das$anim, 3))

UPDATED AGAIN

Try this:

> das$wt2 <- as.numeric(cut2(das$wt, g=3))
> das
   anim    wt wt2
1     1 181.0   1
2     2 179.0   1
3     3 180.5   1
4     4 201.0   2
5     5 201.5   2
6     6 245.0   2
7     7 246.4   3
8     8 189.3   1
9     9 301.0   3
10   10 354.0   3
11   11 369.0   3
12   12 205.0   2
13   13 199.0   1
14   14 394.0   3
15   15 231.3   2
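
For reference, cut(x, 3) creates three intervals of equal width, whereas cut2(x, g = 3) aims for three groups of roughly equal size. A sketch of a base R alternative that needs no Hmisc, using quantile() for the breaks, is shown below; the das data frame is rebuilt from the output above, and wt_group is a hypothetical column name.

```r
# Data rebuilt from the printed output above
das <- data.frame(
  anim = 1:15,
  wt = c(181, 179, 180.5, 201, 201.5, 245, 246.4, 189.3,
         301, 354, 369, 205, 199, 394, 231.3)
)

# Equal-width bins: group sizes can be very uneven
table(cut(das$wt, 3))

# Quantile-based breaks: three groups of (roughly) equal size
breaks <- quantile(das$wt, probs = seq(0, 1, length.out = 4))
das$wt_group <- cut(das$wt, breaks = breaks,
                    include.lowest = TRUE, labels = FALSE)
table(das$wt_group)
```

If the data contain many tied values, the quantile breaks may repeat, in which case cut() stops with an error about non-unique breaks.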

Create 4 categories variables

I may be misunderstanding something, but you appear to have overlapping categories: Total >= 2 is Basic, but Total < 3 is Good. You may want to confirm the bounds for your groupings.

Once that's sorted, you were actually pretty close to a working solution. You can nest ifelse() statements, keeping in mind that they are evaluated in order: if a condition evaluates to TRUE early in the chain, its value is returned; otherwise, evaluation moves on to the next ifelse(). Note that I've used 1, 2, and 3 as the breaks for the categories, so the logic reads: "If it's less than 1, it's Limited. Otherwise, if it's less than 2, it's Basic. Otherwise, if it's less than 3, it's Good. Otherwise, it's Full."

set.seed(123)
df <- data.frame(total = runif(n = 15, min = 0, max = 4))
df

df$level = ifelse(df$total < 1, "Limited", 
                  ifelse(df$total < 2, "Basic", 
                         ifelse(df$total < 3, "Good", "Full")))
> df
       total   level
1  0.5691772 Limited
2  2.1971386    Good
3  3.8163650    Full
4  2.3419334    Good
5  1.6180411   Basic
6  2.5915739    Good
7  1.2792825   Basic
8  1.2308800   Basic
9  0.8790705 Limited
10 1.4779555   Basic
11 3.9368768    Full
12 0.6168092 Limited
13 0.3641760 Limited
14 0.5676276 Limited
15 2.7600284    Good

With just four categories, an ifelse block is probably fine. If I were using many more bounds, I'd likely use a different approach. Edit: like thelatemail's, which is far cleaner.
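
One table-driven alternative to the nested ifelse, sketched here with cut(), keeps the bounds and labels in two vectors so that adding a category means adding one entry to each. This is an illustration only and not necessarily the approach referred to above; the sample totals are made up.

```r
# Made-up totals, including values that sit exactly on a bound
total <- c(0.5, 1, 1.6, 2.2, 3, 3.9)

# right = FALSE gives [-Inf,1), [1,2), [2,3), [3,Inf), matching the "<" tests
level <- cut(total,
             breaks = c(-Inf, 1, 2, 3, Inf),
             labels = c("Limited", "Basic", "Good", "Full"),
             right = FALSE)

# The nested ifelse from the answer, for comparison
nested <- ifelse(total < 1, "Limited",
                 ifelse(total < 2, "Basic",
                        ifelse(total < 3, "Good", "Full")))

# TRUE when both approaches agree on every value
identical(as.character(level), nested)
```

The result of cut() is a factor whose levels follow the order of the labels, which is convenient for sorted tables and plots.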
