How does cut with breaks work in R
cut in your example splits the vector into the following parts: 0-1 (1); 1-2 (2); 2-3 (3); 3-5 (4); 5-7 (5); 7-8 (6); 8-10 (7).
The numbers in brackets are the integer codes that cut assigns to each bin, in order, based on the breaks values provided.
By default, each interval produced by cut is open on the left and closed on the right, so a value equal to the lowest break is excluded. To include that value in the first interval, set include.lowest = TRUE.
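You did not assign labels. With labels = FALSE, cut returns an integer vector of level codes (the numbers in brackets above) instead of a factor. Note that the documented default, labels = NULL, instead builds interval labels of the form "(a,b]".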
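To make the difference concrete, here is a minimal sketch. It assumes data1 is 1:10, which is consistent with the output shown further down but is not confirmed by the question. The breaks are the ones implied by the parts listed above.

```r
# Assumption: data1 is 1:10 (not confirmed by the original question).
data1 <- 1:10
brks  <- c(0, 1, 2, 3, 5, 7, 8, 10)

# labels = FALSE returns integer bin codes instead of a factor.
codes <- cut(data1, breaks = brks, labels = FALSE)
print(codes)

# The default (labels = NULL) returns a factor with interval labels.
bins <- cut(data1, breaks = brks)
print(levels(bins))

# Intervals are (a, b] by default, so a value equal to the lowest break
# becomes NA unless include.lowest = TRUE is set.
print(cut(0, breaks = brks))
print(cut(0, breaks = brks, include.lowest = TRUE))
```

Under that assumption, codes matches the bin numbers listed above, and the value 0 only lands in the first bin once include.lowest = TRUE is set.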
summary(data1) is a summary of the raw data, and summary(data1cut) is a summary of your splits.
You can get the split you need using:
data2cut <- cut(data1, breaks = c(1, 3.25, 5.50, 7.75, 10),
                labels = c("1-3.25", "3.25-5.50", "5.50-7.75", "7.75-10"),
                include.lowest = TRUE)
The result is the following:
> data2cut
[1] 1-3.25 1-3.25 1-3.25 3.25-5.50 3.25-5.50 5.50-7.75 5.50-7.75 7.75-10 7.75-10
[10] 7.75-10
Levels: 1-3.25 3.25-5.50 5.50-7.75 7.75-10
I hope it's clear now.
Multiple conditions (breaks) for cut function
Maybe subtracting and adding 0.5 around 0 would work for you:
cut(15:-15, c(seq(-15,0,5) - 0.5, 0.5 + seq(0,15,5)))
# [1] (10.5,15.5] (10.5,15.5] (10.5,15.5] (10.5,15.5] (10.5,15.5]
# [6] (5.5,10.5] (5.5,10.5] (5.5,10.5] (5.5,10.5] (5.5,10.5]
#[11] (0.5,5.5] (0.5,5.5] (0.5,5.5] (0.5,5.5] (0.5,5.5]
#[16] (-0.5,0.5] (-5.5,-0.5] (-5.5,-0.5] (-5.5,-0.5] (-5.5,-0.5]
#[21] (-5.5,-0.5] (-10.5,-5.5] (-10.5,-5.5] (-10.5,-5.5] (-10.5,-5.5]
#[26] (-10.5,-5.5] (-15.5,-10.5] (-15.5,-10.5] (-15.5,-10.5] (-15.5,-10.5]
#[31] (-15.5,-10.5]
#7 Levels: (-15.5,-10.5] (-10.5,-5.5] (-5.5,-0.5] (-0.5,0.5] ... (10.5,15.5]
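As a quick check of what the call above does, this sketch prints the break points it builds and counts how many values land in each bin. The names x, brks and counts are only illustrative.

```r
x <- 15:-15

# Shift the non-positive breaks down by 0.5 and the non-negative ones up by 0.5,
# which leaves a narrow bin (-0.5, 0.5] that contains only 0.
brks <- c(seq(-15, 0, 5) - 0.5, 0.5 + seq(0, 15, 5))
print(brks)

# Number of values per bin.
counts <- table(cut(x, brks))
print(counts)
```

Every bin holds five integers except the middle one, which holds just the value 0.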
cut function produces uneven first break
tl;dr to get what you might want, you'll probably need to specify breaks explicitly, and include.lowest=TRUE:
cut(x,breaks=0:10,include.lowest=TRUE)
The issue is probably this, from the "Details" of ?cut:
When ‘breaks’ is specified as a single number, the range of the
data is divided into ‘breaks’ pieces of equal length, and then the
outer limits are moved away by 0.1% of the range to ensure that
the extreme values both fall within the break intervals.
Since the range is (0,10), the outer limits are (-0.01, 10.01); as @Onyambu suggests, the results are asymmetric because the value at 0 lies on the left-hand boundary (not included) whereas the value at 10 lies on the right-hand boundary (included).
The (apparent) asymmetry is due to formatting; if you follow the code below (the core of base:::cut.default()), you'll see that the top break is actually at 10.01, but gets formatted as "10" because the default number of digits is 3 ...
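x <- 0:10
breaks <- 10
dig <- 3                                   # default value of dig.lab
nb <- as.integer(breaks+1)                 # number of break points
dx <- diff(rx <- range(x, na.rm = TRUE))   # width of the data range
breaks <- seq.int(rx[1L], rx[2L], length.out = nb)           # equally spaced breaks
breaks[c(1L, nb)] <- c(rx[1L] - dx/1000, rx[2L] + dx/1000)   # move outer limits out by 0.1%
ch.br <- formatC(0 + breaks, digits = dig, width = 1L)       # text used in the labels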
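To see the formatting effect directly, the following sketch recomputes the outer breaks and compares the raw values with their 3-digit formatted form. It also passes dig.lab = 4 to cut, a documented argument, so that the label shows the real upper limit. The variable names raw, lev3 and lev4 are only illustrative.

```r
x  <- 0:10
rx <- range(x)
dx <- diff(rx)

# Same construction as in the snippet above, for 10 intervals (11 break points).
raw <- seq.int(rx[1L], rx[2L], length.out = 11L)
raw[c(1L, 11L)] <- c(rx[1L] - dx / 1000, rx[2L] + dx / 1000)

print(raw[c(1, 11)])                                   # unformatted outer breaks
print(formatC(raw[c(1, 11)], digits = 3, width = 1L))  # what the labels show

# Labels with the default dig.lab = 3 versus dig.lab = 4.
lev3 <- levels(cut(x, 10))
lev4 <- levels(cut(x, 10, dig.lab = 4))
print(lev3[c(1, 10)])
print(lev4[c(1, 10)])
```

With four significant digits, the last label shows the real upper limit instead of the rounded "10".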
cut method in r with a single number for the breaks argument
I recommend reading the help page for the cut function. In RStudio, run ?cut.
You can read that the cut function divides the range of x into intervals and codes the values in x according to which interval they fall. The leftmost interval corresponds to level one, the next leftmost to level two and so on.
> x <- c(2, 4, 6)
> cut(x, 3)
[1] (2,3.33] (3.33,4.67] (4.67,6]
Levels: (2,3.33] (3.33,4.67] (4.67,6]
> cut(x, 2)
[1] (2,4] (2,4] (4,6]
Levels: (2,4] (4,6]
> levels(cut(x, 2))
[1] "(2,4]" "(4,6]"
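To see the level numbering mentioned above, you can convert the factor to its underlying integer codes. This is a small sketch on the same x; the names codes3 and codes2 are only illustrative.

```r
x <- c(2, 4, 6)

# as.integer() on a factor returns the level index of each element,
# so the leftmost interval is 1, the next one 2, and so on.
codes3 <- as.integer(cut(x, 3))
codes2 <- as.integer(cut(x, 2))

print(codes3)
print(codes2)
```

With two intervals, 2 and 4 share the first level because 4 sits on the closed right end of the first interval.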
Using cut to create breaks that start at 0
Base function pretty outputs pretty numbers. From the documentation, my emphasis:
Compute a sequence of about n+1 equally spaced ‘round’ values which cover the range of the values in x. The values are chosen so that they are 1, 2 or 5 times a power of 10.
x <- seq(0, 102, length.out = 15)
cut(x, breaks = pretty(x, n = 10), include.lowest = TRUE)
#> [1] [0,10] [0,10] (10,20] (20,30] (20,30] (30,40] (40,50]
#> [8] (50,60] (50,60] (60,70] (70,80] (80,90] (80,90] (90,100]
#> [15] (100,110]
#> 11 Levels: [0,10] (10,20] (20,30] (30,40] (40,50] (50,60] (60,70] ... (100,110]
Created on 2022-06-13 by the reprex package (v2.0.1)
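If it helps to see what pretty hands to cut in the example above, this short sketch prints the break points on their own and checks that they cover the data range. The name brks is only illustrative.

```r
x <- seq(0, 102, length.out = 15)

# 'Round' break points covering the range of x; n is only a target,
# so the number of values returned can differ slightly from n + 1.
brks <- pretty(x, n = 10)
print(brks)

# The breaks start at or below min(x) and end at or above max(x).
print(range(x))
print(range(brks))
```

Because the breaks start at 0, include.lowest = TRUE is what keeps the value 0 inside the first interval.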
how to set distinct breaks and cut data in R
What you are looking for is the parameter right, which has the description:
logical, indicating if the intervals should be closed on the
right (and open on the left) or vice versa.
So what you want is to set right = FALSE:
bmirange <- cut(bmidata,
                breaks = c(0, 18.5, 25, 30, 100),
                labels = c("<18.5", "18.5-24.99", "25-29.99", ">30"),
                right = FALSE,
                include.lowest = TRUE)
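To see what right changes, here is a small sketch using hypothetical BMI values that sit exactly on the break points. The vector bmi is made up for illustration and is not the bmidata from the question. Default labels are kept so the interval brackets stay visible.

```r
# Hypothetical values lying exactly on the breaks.
bmi  <- c(18.5, 25, 30)
brks <- c(0, 18.5, 25, 30, 100)

# right = FALSE: intervals are [a, b), so a boundary value goes to the upper bin.
left_closed  <- cut(bmi, breaks = brks, right = FALSE, include.lowest = TRUE)

# right = TRUE (the default): intervals are (a, b], so it goes to the lower bin.
right_closed <- cut(bmi, breaks = brks, right = TRUE, include.lowest = TRUE)

print(as.character(left_closed))
print(as.character(right_closed))
```

With right = FALSE, a BMI of exactly 25 is classed in the 25-29.99 group, which is what the labels in the answer imply.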