Cut() error - 'breaks' are not unique
You get this error because the quantile values in your data for columns b.1, a.2 and b.2 are the same for some levels, so they can't be used directly as breaks values in the function cut().
apply(a,2,quantile,na.rm=T)
       ID      a.1    b.1   a.2      b.2
0%   1.00  37.5000  59.38  75.0  59.3800
25%  2.25  42.5000 100.00  87.5  91.6675
50%  3.50  58.3350 100.00 100.0 100.0000
75%  4.75  91.6675 100.00 100.0 100.0000
100% 6.00 100.0000 100.00 100.0 100.0000
One way to solve this problem is to wrap quantile() in unique(), which removes the duplicated quantile values. This of course leaves fewer break points whenever the quantiles are not unique.
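res <- lapply(dup.temp[, 1], function(i) {
  # Breaks come from column <i>.1; duplicated quantiles are dropped
  breaks <- c(-Inf, unique(quantile(a[, paste(i, 1, sep = ".")], na.rm = TRUE)), Inf)
  # Bin column <i>.2 using those breaks
  cut(a[, paste(i, 2, sep = ".")], breaks)
})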
[[1]]
[1] <NA> (91.7,100] (58.3,91.7] <NA> <NA> (91.7,100]
Levels: (-Inf,37.5] (37.5,42.5] (42.5,58.3] (58.3,91.7] (91.7,100] (100, Inf]
[[2]]
[1] (59.4,100] (59.4,100] (59.4,100] (-Inf,59.4] (59.4,100] (59.4,100]
Levels: (-Inf,59.4] (59.4,100] (100, Inf]
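The behaviour described above can be reproduced on a small vector. This is only a minimal sketch: the vector x is hypothetical data made up for illustration (it is not the data from the question), chosen so that most quantiles coincide.

```r
# Hypothetical data: most values are 100, so several quantiles are identical.
x <- c(59.38, 100, 100, 100, 100, 100)
q <- quantile(x, na.rm = TRUE)

# cut() refuses duplicated breaks; capture the message instead of stopping.
msg <- tryCatch(
  {
    cut(x, breaks = q)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Dropping the duplicated quantiles leaves fewer, but valid, breaks.
breaks <- unique(q)
buckets <- cut(x, breaks = breaks, include.lowest = TRUE)
print(table(buckets))
```

Note that include.lowest = TRUE is used here so that the minimum of x, which equals the lowest break, still lands in a bucket. The answer above gets the same effect by padding the breaks with -Inf and Inf.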
Why do I get a 'breaks not unique' error in my R code?
While defining the breaks, use unique() if you are using max(Calls_per_Hour).
This worked for me:
library(dplyr)  # provides %>% and mutate()

m3 <- m2 %>%
  mutate(Calls_bucket = cut(
    Calls_per_Hour,
    breaks = unique(c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, max(Calls_per_Hour, na.rm = TRUE))),
    labels = c("0-2", "2-4", "4-6", "6-8", "8-10", "10-12", "12-14", "14-16", "16-18", "18-20", ">20"),
    include.lowest = TRUE
  ))
- unique() ensures a unique vector of cuts, i.e. if max(Calls_per_Hour) is equal to a value from your given vector, the cuts remain unique.
- Since you are using 0 to start your labels, you should also include 0 in your cuts.
- Setting include.lowest=TRUE ensures that values equal to the lowest break (0 here) are assigned a label instead of NA.
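One thing to watch with a fixed label vector: if unique() drops a duplicated break, there is one interval fewer, and cut() stops with an error because the number of labels no longer matches the number of intervals. A possible variation, sketched below, derives the labels from the breaks so the two always agree. The vector calls is hypothetical data, and the generated labels (for example "18-20" rather than ">20") differ from the hand-written ones in the answer.

```r
# Hypothetical call counts; the maximum coincides with the break at 20.
calls <- c(0, 1, 5, 13, 20, 20)

# Sorted, de-duplicated breaks: the fixed grid plus the observed maximum.
breaks <- sort(unique(c(seq(0, 20, by = 2), max(calls, na.rm = TRUE))))

# One label per interval, built from neighbouring breaks ("0-2", "2-4", ...).
labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")

bucket <- cut(calls, breaks = breaks, labels = labels, include.lowest = TRUE)
print(data.frame(calls, bucket))
```

Because the labels are computed after unique(), the same code also works when the maximum is above 20; the last label then runs from 20 to that maximum.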
Assigning values in a column to deciles when breaks are not unique
Try this:
cut(rank(v, ties = "first"), 10, lab = FALSE)
## [1] 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 10 9 10
Alternatives include using ties = "last" or ties = "random", or using order(order(v)) in place of rank(...).
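The reason the rank-based approach avoids the error is that rank() with a tie-breaking method returns distinct positions, so the ten equal-width bins over those positions never collide. A minimal sketch, assuming a hypothetical vector v with heavy ties (the original v is not shown in the answer), and spelling out the abbreviated arguments ties and lab as ties.method and labels:

```r
# Hypothetical data: 20 values, 12 of them tied at 0.
v <- c(rep(0, 12), 1:8)

# Ties are broken by position, so the ranks are a permutation of 1..20.
r <- rank(v, ties.method = "first")

# Ten equal-width bins over the ranks; labels = FALSE returns bin numbers.
decile <- cut(r, breaks = 10, labels = FALSE)
print(table(decile))

# order(order(v)) gives the same ranks when v has no missing values.
same_as_rank <- all(order(order(v)) == r)
print(same_as_rank)
```

With 20 values each decile receives exactly two observations, even though a quantile-based split of v would produce duplicated breaks.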
Using cut to create breaks that start at 0
The base function pretty() outputs pretty numbers. From the documentation:
Compute a sequence of about n+1 equally spaced ‘round’ values which cover the range of the values in x. The values are chosen so that they are 1, 2 or 5 times a power of 10.
x <- seq(0, 102, length.out = 15)
cut(x, breaks = pretty(x, n = 10), include.lowest = TRUE)
#> [1] [0,10] [0,10] (10,20] (20,30] (20,30] (30,40] (40,50]
#> [8] (50,60] (50,60] (60,70] (70,80] (80,90] (80,90] (90,100]
#> [15] (100,110]
#> 11 Levels: [0,10] (10,20] (20,30] (30,40] (40,50] (50,60] (60,70] ... (100,110]
Created on 2022-06-13 by the reprex package (v2.0.1)
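In the example above the data already start at 0. If the smallest value is larger than 0, one option is to add 0 to the input of pretty() so that the breaks still begin there. This is only a sketch on hypothetical data, not part of the original answer:

```r
# Hypothetical data whose smallest value is above 0.
x <- seq(3, 102, length.out = 15)

# Adding 0 to the input makes pretty() cover 0 as well as the data.
breaks <- pretty(c(0, x), n = 10)

bucket <- cut(x, breaks = breaks, include.lowest = TRUE)
print(breaks)
print(table(bucket))
```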