Cut() Error - 'Breaks' Are Not Unique

You get this error because several quantiles of columns b.1, a.2 and b.2 have the same value. The quantile vector therefore contains duplicates and cannot be passed directly as the breaks argument of cut().

apply(a, 2, quantile, na.rm = TRUE)
       ID      a.1    b.1   a.2      b.2
0%   1.00  37.5000  59.38  75.0  59.3800
25%  2.25  42.5000 100.00  87.5  91.6675
50%  3.50  58.3350 100.00 100.0 100.0000
75%  4.75  91.6675 100.00 100.0 100.0000
100% 6.00 100.0000 100.00 100.0 100.0000
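
The failure is easy to reproduce in isolation. A minimal sketch, assuming a made-up vector x (not the original data) whose values are mostly tied at 100, like column b.1 above:

```r
# Hypothetical data: most values are tied at 100.
x <- c(59.38, 100, 100, 100, 100, 100)

q <- quantile(x, na.rm = TRUE)
anyDuplicated(q) > 0   # TRUE: several quantiles collapse onto 100

# cut() rejects the duplicated breaks; capture the message instead of stopping.
msg <- tryCatch(
  { cut(x, breaks = q); "no error" },
  error = function(e) conditionMessage(e)
)
msg
```

Checking anyDuplicated() on the breaks before calling cut() is a cheap way to detect the problem up front.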

One way to solve this is to wrap quantile() in unique(), which drops the duplicated quantile values. The trade-off is fewer break points, and therefore fewer intervals, whenever the quantiles are not unique.

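res <- lapply(dup.temp[,1], function(i) {
  # Breaks come from the ".1" column; unique() drops duplicated quantiles
  breaks <- c(-Inf, unique(quantile(a[, paste(i, 1, sep = ".")], na.rm = TRUE)), Inf)
  # Bin the matching ".2" column with those breaks
  cut(a[, paste(i, 2, sep = ".")], breaks)
})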

[[1]]
[1] <NA> (91.7,100] (58.3,91.7] <NA> <NA> (91.7,100]
Levels: (-Inf,37.5] (37.5,42.5] (42.5,58.3] (58.3,91.7] (91.7,100] (100, Inf]

[[2]]
[1] (59.4,100] (59.4,100] (59.4,100] (-Inf,59.4] (59.4,100] (59.4,100]
Levels: (-Inf,59.4] (59.4,100] (100, Inf]
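
The same idea can be sketched as a self-contained helper. The function name cut_by_quantiles and the vectors ref and x are made up for illustration and are not part of the original answer:

```r
# Hypothetical helper: bin `x` by the distinct quantiles of `ref`.
cut_by_quantiles <- function(x, ref) {
  breaks <- c(-Inf, unique(quantile(ref, na.rm = TRUE)), Inf)
  cut(x, breaks)
}

ref <- c(59.38, 100, 100, 100, 100, 100)  # made-up values with heavy ties
x   <- c(75, 100, 100, 50, 87.5, 100)

bins <- cut_by_quantiles(x, ref)
nlevels(bins)   # 3 intervals, not the 6 that five distinct quantiles would give
table(bins)
```

Because the number of levels now depends on the data, any downstream code that assumes a fixed number of bins should check nlevels() first.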

Why do I get a 'breaks not unique' error in my R code?

When defining the breaks, wrap them in unique() if one of them is computed from the data, such as max(Calls_per_Hour).
This worked for me:

m3 <- m2 %>%
  mutate(Calls_bucket = cut(Calls_per_Hour,
    breaks = unique(c(0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, max(Calls_per_Hour, na.rm = TRUE))),
    labels = c("0-2", "2-4", "4-6", "6-8", "8-10", "10-12", "12-14", "14-16", "16-18", "18-20", ">20"),
    include.lowest = TRUE))
  • unique() keeps the vector of breaks free of duplicates: if max(Calls_per_Hour) equals one of the fixed values, that value is kept only once.
  • Since your labels start at 0, you should also include 0 in your breaks.
  • Setting include.lowest = TRUE ensures that a value equal to the lowest break is assigned a label.
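
One caveat, sketched here in base R with made-up data: when the maximum coincides with one of the fixed breaks, unique() drops the duplicate, only 10 intervals remain for 11 labels, and cut() fails on the labels instead. An open-ended upper break of Inf is a different technique from the answer above. It is shown as one possible workaround, not as the answer's method, and it always yields 11 intervals:

```r
calls <- c(1, 5, 20)   # made-up values; the maximum coincides with a fixed break
labs  <- c("0-2", "2-4", "4-6", "6-8", "8-10", "10-12", "12-14",
           "14-16", "16-18", "18-20", ">20")

# unique() approach: the duplicate 20 is dropped, leaving 10 intervals for 11 labels.
brks <- unique(c(seq(0, 20, by = 2), max(calls, na.rm = TRUE)))
failed <- inherits(
  try(cut(calls, brks, labels = labs, include.lowest = TRUE), silent = TRUE),
  "try-error"
)
failed   # TRUE

# Alternative: an open-ended top break keeps the interval count fixed.
bucket <- cut(calls, breaks = c(seq(0, 20, by = 2), Inf),
              labels = labs, include.lowest = TRUE)
as.character(bucket)
```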

Assigning values in a column to deciles when breaks are not unique

Try this:

cut(rank(v, ties.method = "first"), 10, labels = FALSE)
## [1] 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 10 9 10

Alternatives include ties.method = "last" or ties.method = "random", or using order(order(v)) in place of rank(...).
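
A small worked sketch of why ranking sidesteps the error, using a made-up vector v with heavy ties (the original v is not shown in the answer):

```r
# Made-up vector: 12 tied zeros, so decile breaks on v itself would repeat.
v <- c(rep(0, 12), 1:8)

# Ranks with ties broken by position are all distinct, so 10 equal-width
# bins over the ranks never produce duplicated breaks.
decile <- cut(rank(v, ties.method = "first"), 10, labels = FALSE)
table(decile)   # each decile holds exactly 2 of the 20 values

# order(order(v)) gives the same ranks as ties.method = "first".
all(order(order(v)) == rank(v, ties.method = "first"))
```

Note that tied values can land in different deciles with this approach, which is the price of forcing equal-sized groups.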

Using cut to create breaks that start at 0

The base function pretty() returns round break values. From the documentation:

Compute a sequence of about n+1 equally spaced ‘round’ values which cover the range of the values in x. The values are chosen so that they are 1, 2 or 5 times a power of 10.

x <- seq(0, 102, length.out = 15)
cut(x, breaks = pretty(x, n = 10), include.lowest = TRUE)
#> [1] [0,10] [0,10] (10,20] (20,30] (20,30] (30,40] (40,50]
#> [8] (50,60] (50,60] (60,70] (70,80] (80,90] (80,90] (90,100]
#> [15] (100,110]
#> 11 Levels: [0,10] (10,20] (20,30] (30,40] (40,50] (50,60] (60,70] ... (100,110]

Created on 2022-06-13 by the reprex package (v2.0.1)
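
To see what pretty() contributes on its own, here is a short sketch with made-up data that does not start at 0. The vector x below is an assumption for illustration:

```r
x <- seq(3, 102, length.out = 15)   # made-up data starting at 3, not 0

brks <- pretty(x, n = 10)
brks                       # round values covering range(x)
anyDuplicated(brks) == 0   # no repeated breaks, so cut() accepts them

bins <- cut(x, breaks = brks, include.lowest = TRUE)
levels(bins)[1]            # the first interval starts at 0
```

Here the breaks extend down to the nearest round value below the minimum, which is why the first interval begins at 0 even though the data do not.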


