Collapse Consecutive Runs of Numbers to a String of Ranges

Collapse consecutive runs of numbers to a string of ranges

As another alternative, you could use a deparsing approach. For example:

deparse(c(1L, 2L, 3L))
#[1] "1:3"

Because as.character() deparses each element when it is given a list, we could use:

vec <- c(1, 2, 3, 5, 7:12)  # example input, inferred from the output below
as.character(split(as.integer(vec), cumsum(c(TRUE, diff(vec) != 1))))
#[1] "1:3" "5" "7:12"
toString(gsub(":", "-", .Last.value))
#[1] "1-3, 5, 7-12"
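For reuse, the two steps above could be wrapped into a single function. This is only a minimal sketch: the name collapse_ranges is made up for illustration, and it assumes the input is sorted, whole-valued, and free of duplicates.

```r
# Hypothetical helper (the name is not from the original answer).
# Assumes `vec` is sorted, whole-valued, and has no duplicates.
collapse_ranges <- function(vec) {
  vec <- as.integer(vec)
  # Start a new group wherever the step from the previous value is not 1.
  groups <- split(vec, cumsum(c(TRUE, diff(vec) != 1L)))
  # as.character() on a list deparses each element, e.g. c(1L, 2L, 3L) becomes "1:3".
  toString(gsub(":", "-", as.character(groups), fixed = TRUE))
}

collapse_ranges(c(1, 2, 3, 5, 7:12))
#[1] "1-3, 5, 7-12"
```

Note that a run of exactly two values, such as 8 and 9, comes out as "8-9" with this approach rather than as two separate numbers.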

Collapse runs of consecutive numbers to ranges

I took some heavy inspiration from the answers in this question.

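findIntRuns <- function(run){
  # A difference other than 1 marks the start of a new run.
  rundiff <- c(1, diff(run))
  difflist <- split(run, cumsum(rundiff != 1))
  unlist(lapply(difflist, function(x){
    # Runs of one or two values stay as individual numbers;
    # longer runs are written as "first-last".
    if(length(x) %in% 1:2) as.character(x) else paste0(x[1], "-", x[length(x)])
  }), use.names=FALSE)
}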

s <- "1,2,3,4,8,9,14,15,16,19"
s2 <- as.numeric(unlist(strsplit(s, ",")))

paste0(findIntRuns(s2), collapse=",")
[1] "1-4,8,9,14-16,19"

EDIT: Multiple solutions: benchmarking time!

Unit: microseconds
expr       min      lq   median       uq      max neval
spee() 277.708 295.517 301.5540 311.5150 1612.207  1000
seb()  294.611 313.025 321.1750 332.6450 1709.103  1000
marc() 672.835 707.549 722.0375 744.5255 2154.942  1000

@speendo's solution is the fastest at the moment, but none of these have been optimised yet.

Collapse continuous integer runs to strings of ranges

I think diff is the solution. You might need some additional fiddling to deal with the singletons, but:

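lapply(z, function(x) {
  diffs <- c(1, diff(x))
  # A gap larger than 1 marks the first element of a new run.
  start_indexes <- c(1, which(diffs > 1))
  # Each run ends just before the next one starts; the last run ends at length(x).
  end_indexes <- c(start_indexes[-1] - 1, length(x))
  coloned <- paste(x[start_indexes], x[end_indexes], sep=":")
  paste0(coloned, collapse=", ")
})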

$greg
[1] "7:11, 20:24, 30:33, 49:49"

$researcher
[1] "42:48"

$sally
[1] "25:29, 37:41"

$sam
[1] "1:6, 16:19, 34:36"

$teacher
[1] "12:15"
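The singleton fiddling mentioned above might look roughly like the following. It is a sketch rather than a finished solution: the function name collapse_with_singletons is hypothetical, the example vector is reconstructed from the $greg output, and the input is assumed to be sorted with no duplicates.

```r
# Hypothetical variant of the snippet above; assumes x is sorted with no duplicates.
collapse_with_singletons <- function(x) {
  diffs <- c(1, diff(x))
  start_indexes <- c(1, which(diffs > 1))
  end_indexes <- c(start_indexes[-1] - 1, length(x))
  starts <- x[start_indexes]
  ends <- x[end_indexes]
  # Print a run of length one as a single number instead of "49:49".
  pieces <- ifelse(starts == ends,
                   as.character(starts),
                   paste(starts, ends, sep = ":"))
  paste0(pieces, collapse = ", ")
}

# Example input reconstructed from the $greg output above.
collapse_with_singletons(c(7:11, 20:24, 30:33, 49))
#[1] "7:11, 20:24, 30:33, 49"
```

It can be dropped into the lapply() call in place of the anonymous function, e.g. lapply(z, collapse_with_singletons).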

Dash between consecutive numbers

In base R, using tapply:

data <- c(18, 20:25, 28:30)
result <- unlist(
  tapply(data, cumsum(c(FALSE, diff(data) > 1)), function(x) c('-', x)),
  use.names = FALSE
)[-1]

#[1] "18" "-" "20" "21" "22" "23" "24" "25" "-" "28" "29" "30"

Each group of consecutive numbers is prefixed with "-", and the trailing [-1] drops the dash in front of the first group.
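The grouping step is easier to follow when the intermediate vectors are printed. A small sketch on the same data; the names breaks and run_id are introduced here purely for illustration and do not appear in the answer.

```r
data <- c(18, 20:25, 28:30)

# TRUE marks the first element of each new run (gap larger than 1).
# The leading FALSE stands in for the first element, which has no predecessor.
breaks <- c(FALSE, diff(data) > 1)

# The running total of the breaks gives every run its own id.
run_id <- cumsum(breaks)
run_id
#[1] 0 1 1 1 1 1 1 2 2 2
```

tapply() then applies the function once per distinct run_id, which is why each run receives exactly one "-".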

Collapse sequences of numbers into ranges

Doable. Let's see if this can be done with pandas.

import pandas as pd

data = ['10215', '10216', '10277', ...]
# Load data as series.
s = pd.Series(data)
# Find all consecutive rows with a difference of one
# and bin them into groups using `cumsum`.
v = s.astype(int).diff().bfill().ne(1).cumsum()
# Use `groupby` and `apply` to condense the consecutive numbers into ranges.
# This is only done if the group size is >1.
ranges = (
    s.groupby(v)
     .apply(lambda x: '-'.join(x.values[[0, -1]]) if len(x) > 1 else x.item())
     .tolist()
)

print(ranges)
['10215-10216',
'10277-10282',
'10292-10293',
'10295-10326',
'10344',
'10399-10406',
'10415-10418',
'10430',
'10448',
'10492-10495',
'10574-10659',
'10707-10710',
'10792-10795',
'10908',
'10936-10939',
'11108-11155',
'11194-11235',
'10101-10102',
'10800',
'11236']

Your data must be sorted for this to work.


