How Does Cut with Breaks Work in R

In your example, cut splits the vector into the following bins:
0-1 (1); 1-2 (2); 2-3 (3); 3-5 (4); 5-7 (5); 7-8 (6); 8-10 (7)

The numbers in brackets are the integer codes cut assigns to the bins, counting from the leftmost interval defined by the breaks you provided.

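By default, cut's intervals are open on the left and closed on the right, (a,b], so a value equal to the lowest break falls into no bin and becomes NA. include.lowest = TRUE closes the first interval so that value is kept; right = FALSE makes every interval closed on the left instead, [a,b).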
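A minimal sketch of the three behaviours, using a toy vector (x and brk are illustrative names, not from the question):

```r
x <- c(0, 1, 2)     # toy data, assumed for illustration
brk <- c(0, 1, 2)

# Default: intervals (0,1] and (1,2], so 0 matches no bin and becomes NA
default_cut <- cut(x, breaks = brk)

# include.lowest = TRUE closes only the first interval: [0,1] and (1,2]
lowest_cut <- cut(x, breaks = brk, include.lowest = TRUE)

# right = FALSE makes every interval left-closed; together with
# include.lowest = TRUE the top interval is closed too: [0,1) and [1,2]
left_cut <- cut(x, breaks = brk, right = FALSE, include.lowest = TRUE)

print(default_cut)
print(lowest_cut)
print(left_cut)
```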

  1. You did not supply labels. Note that the default is labels = NULL, which builds "(a,b]" interval labels; it is labels = FALSE that makes cut return an integer vector of level codes (the numbers in brackets) instead of a factor.

  2. summary(data1) summarises the raw numeric data (minimum, quartiles, mean and maximum), whereas summary(data1cut) counts how many values fall into each of your bins.
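The question's data isn't shown, so this sketch assumes data1 <- 1:10 and the breaks c(0, 1, 2, 3, 5, 7, 8, 10) implied by the bins listed above. It contrasts labels = FALSE with the default labels, and the two summaries:

```r
data1 <- 1:10                        # assumed example data
brk <- c(0, 1, 2, 3, 5, 7, 8, 10)    # breaks implied by the bins above

codes <- cut(data1, breaks = brk, labels = FALSE)  # plain integer bin codes
data1cut <- cut(data1, breaks = brk)               # factor with "(a,b]" labels

print(codes)
print(summary(data1))            # numeric summary of the raw values
counts <- summary(data1cut)      # number of values in each bin
print(counts)
```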

You can get the split you need using:

data2cut <- cut(data1, breaks = c(1, 3.25, 5.50, 7.75, 10),
                labels = c("1-3.25", "3.25-5.50", "5.50-7.75", "7.75-10"),
                include.lowest = TRUE)

The result is the following:

> data2cut

[1] 1-3.25 1-3.25 1-3.25 3.25-5.50 3.25-5.50 5.50-7.75 5.50-7.75 7.75-10 7.75-10
[10] 7.75-10
Levels: 1-3.25 3.25-5.50 5.50-7.75 7.75-10
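If typing the labels by hand gets tedious, they can be generated from the breaks vector. A sketch, assuming data1 <- 1:10 (which is consistent with the output above); note the generated labels use two decimals throughout, e.g. "7.75-10.00":

```r
data1 <- 1:10                          # assumed example data
brk <- c(1, 3.25, 5.50, 7.75, 10)

# Format every break with two decimals, then pair consecutive breaks
b_txt <- sprintf("%.2f", brk)
labs <- paste0(head(b_txt, -1), "-", tail(b_txt, -1))

data2cut <- cut(data1, breaks = brk, labels = labs, include.lowest = TRUE)
print(table(data2cut))
```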

I hope it's clear now.

Multiple conditions (breaks) for cut function

Maybe subtracting and adding 0.5 around 0 would work for you: shifting the breaks by half a unit gives 0 its own interval, (-0.5,0.5].

cut(15:-15, c(seq(-15,0,5) - 0.5, 0.5 + seq(0,15,5)))
# [1] (10.5,15.5] (10.5,15.5] (10.5,15.5] (10.5,15.5] (10.5,15.5]
# [6] (5.5,10.5] (5.5,10.5] (5.5,10.5] (5.5,10.5] (5.5,10.5]
#[11] (0.5,5.5] (0.5,5.5] (0.5,5.5] (0.5,5.5] (0.5,5.5]
#[16] (-0.5,0.5] (-5.5,-0.5] (-5.5,-0.5] (-5.5,-0.5] (-5.5,-0.5]
#[21] (-5.5,-0.5] (-10.5,-5.5] (-10.5,-5.5] (-10.5,-5.5] (-10.5,-5.5]
#[26] (-10.5,-5.5] (-15.5,-10.5] (-15.5,-10.5] (-15.5,-10.5] (-15.5,-10.5]
#[31] (-15.5,-10.5]
#7 Levels: (-15.5,-10.5] (-10.5,-5.5] (-5.5,-0.5] (-0.5,0.5] ... (10.5,15.5]
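Because 15:-15 contains only integers, the half-unit offsets mean no value ever sits exactly on a break, so the open/closed ends of the intervals never come into play. A quick check that 0 ends up alone in its bin (with non-integer data, the band around 0 would capture everything in (-0.5,0.5], so you would pick a narrower offset):

```r
brk <- c(seq(-15, 0, 5) - 0.5, seq(0, 15, 5) + 0.5)
print(brk)  # breaks: -15.5, -10.5, -5.5, -0.5, 0.5, 5.5, 10.5, 15.5

binned <- cut(15:-15, brk)
print(table(binned))  # (-0.5,0.5] holds only the value 0
```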

cut function produces uneven first break

tl;dr: to get what you probably want, specify the breaks explicitly and set include.lowest = TRUE:

cut(x,breaks=0:10,include.lowest=TRUE)
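With x <- 0:10 (the same vector used in the code further down), the explicit breaks give clean labels and both end points land in a bin:

```r
x <- 0:10
f <- cut(x, breaks = 0:10, include.lowest = TRUE)
print(levels(f))  # first level "[0,1]", last level "(9,10]"
print(table(f))   # [0,1] holds both 0 and 1; every other bin holds one value
```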

The issue is probably this, from the "Details" of ?cut:

When ‘breaks’ is specified as a single number, the range of the
data is divided into ‘breaks’ pieces of equal length, and then the
outer limits are moved away by 0.1% of the range to ensure that
the extreme values both fall within the break intervals.

Since x runs from 0 to 10, 0.1% of the range is 0.01, so the outer limits become -0.01 and 10.01. As @Onyambu suggests, without this adjustment the results would be asymmetric: a value at 0 would lie on the left-hand boundary (not included) whereas the value at 10 would lie on the right-hand boundary (included).

The remaining (apparent) asymmetry is due to formatting. If you follow the code below (the core of base:::cut.default()), you'll see that the top break is actually at 10.01, but it is formatted as "10" because the default number of significant digits for the labels (dig.lab) is 3:

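x <- 0:10
breaks <- 10   # a single number: cut the range into 10 equal-width pieces
dig <- 3       # default dig.lab, the significant digits used for labels
nb <- as.integer(breaks + 1)              # 10 intervals need 11 break points
dx <- diff(rx <- range(x, na.rm = TRUE))  # rx = c(0, 10), dx = 10
breaks <- seq.int(rx[1L], rx[2L], length.out = nb)          # 0, 1, ..., 10
breaks[c(1L, nb)] <- c(rx[1L] - dx/1000, rx[2L] + dx/1000)  # -0.01 and 10.01
ch.br <- formatC(0 + breaks, digits = dig, width = 1L)
ch.br  # "-0.01" "1" ... "9" "10": 10.01 is printed as "10" at 3 digits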
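You can confirm this against cut() itself: at the default dig.lab = 3 the top label prints as "10", but allowing one more significant digit reveals the real break:

```r
x <- 0:10
lv3 <- levels(cut(x, breaks = 10))               # default dig.lab = 3
lv4 <- levels(cut(x, breaks = 10, dig.lab = 4))  # one more significant digit

print(c(first = lv3[1], last = lv3[10]))  # "(-0.01,1]" and "(9,10]"
print(c(first = lv4[1], last = lv4[10]))  # "(-0.01,1]" and "(9,10.01]"
```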

cut method in r with a single number for the breaks argument

I recommend reading the help page for cut (run ?cut in the R console or in RStudio).
It explains that cut divides the range of x into intervals and codes the values in x according to which interval they fall in. The leftmost interval corresponds to level one, the next leftmost to level two, and so on.

x <- c(2, 4, 6)

> cut(x, 3)
[1] (2,3.33] (3.33,4.67] (4.67,6]
Levels: (2,3.33] (3.33,4.67] (4.67,6]

> cut(x, 2)
[1] (2,4] (2,4] (4,6]
Levels: (2,4] (4,6]

> levels(cut(x, 2))
[1] "(2,4]" "(4,6]"
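To see why the first label reads "(2,3.33]" even though 2 is included, you can recompute the breaks the way the Details section of ?cut describes for a single number (equal-width pieces, outer limits moved out by 0.1% of the range), and look at the integer codes. A sketch:

```r
x <- c(2, 4, 6)
nb <- 3 + 1                                # 3 intervals need 4 breaks
rx <- range(x)
dx <- diff(rx)
brk <- seq(rx[1], rx[2], length.out = nb)  # 2, 3.33, 4.67, 6
brk[c(1, nb)] <- c(rx[1] - dx / 1000, rx[2] + dx / 1000)  # widen both ends
print(brk)  # roughly 1.996, 3.333, 4.667, 6.004

# Integer codes: the leftmost interval is level 1, the next is level 2, ...
print(as.integer(cut(x, 3)))       # 1 2 3
print(cut(x, 2, labels = FALSE))   # 1 1 2
```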

Using cut to create breaks that start at 0

The base function pretty() returns "pretty" break points. From its documentation:

Compute a sequence of about n+1 equally spaced ‘round’ values which cover the range of the values in x. The values are chosen so that they are 1, 2 or 5 times a power of 10.

x <- seq(0, 102, length.out = 15)
cut(x, breaks = pretty(x, n = 10), include.lowest = TRUE)
#> [1] [0,10] [0,10] (10,20] (20,30] (20,30] (30,40] (40,50]
#> [8] (50,60] (50,60] (60,70] (70,80] (80,90] (80,90] (90,100]
#> [15] (100,110]
#> 11 Levels: [0,10] (10,20] (20,30] (30,40] (40,50] (50,60] (60,70] ... (100,110]

Created on 2022-06-13 by the reprex package (v2.0.1)
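For comparison, a sketch showing the breaks pretty() picks here and what a single-number breaks = 10 gives instead (its first label starts just below 0 because of the 0.1% widening described in ?cut):

```r
x <- seq(0, 102, length.out = 15)
b <- pretty(x, n = 10)
print(b)  # 0 10 20 ... 110

print(levels(cut(x, breaks = 10))[1])  # "(-0.102,10.2]"
```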

how to set distinct breaks and cut data in R

What you are looking for is the parameter right, which ?cut describes as:

logical, indicating if the intervals should be closed on the 
right (and open on the left) or vice versa.

So what you want is to set right = FALSE:

bmirange <- cut(bmidata,
                breaks = c(0, 18.5, 25, 30, 100),
                # with right = FALSE, a BMI of exactly 30 lands in the last bin
                labels = c("<18.5", "18.5-24.99", "25-29.99", ">=30"),
                right = FALSE,
                include.lowest = TRUE)
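A small sketch with made-up BMI values, some sitting exactly on the breaks, shows the difference. It uses cut()'s default interval labels so the open and closed ends are visible (bmidata here is hypothetical):

```r
bmidata <- c(17, 18.5, 24.99, 25, 30, 35)  # hypothetical values
brk <- c(0, 18.5, 25, 30, 100)

# Default right = TRUE: (a,b], so 18.5 and 25 fall into the lower bin
right_closed <- cut(bmidata, breaks = brk, include.lowest = TRUE)

# right = FALSE: [a,b), so 18.5 and 25 start the next bin;
# include.lowest = TRUE then closes the top interval, [30,100]
left_closed <- cut(bmidata, breaks = brk, right = FALSE, include.lowest = TRUE)

print(data.frame(bmi = bmidata, right_closed, left_closed))
```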
