Collapse Continuous Integer Runs to Strings of Ranges

Collapse continuous integer runs to strings of ranges

I think diff is the solution. You might need some additional fiddling to deal with the singletons, but:

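lapply(z, function(x) {
  # Gap to the previous element; the leading 1 keeps the vector aligned with x
  diffs <- c(1, diff(x))
  # A run starts at position 1 and wherever the gap exceeds 1
  start_indexes <- c(1, which(diffs > 1))
  # A run ends just before the next start; the leading 0 is dropped when indexing
  end_indexes <- c(start_indexes - 1, length(x))
  coloned <- paste(x[start_indexes], x[end_indexes], sep=":")
  paste0(coloned, collapse=", ")
})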

$greg
[1] "7:11, 20:24, 30:33, 49:49"

$researcher
[1] "42:48"

$sally
[1] "25:29, 37:41"

$sam
[1] "1:6, 16:19, 34:36"

$teacher
[1] "12:15"
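The singleton fiddling mentioned above is mostly a formatting decision. As a rough sketch of the same start/end index idea in Python (the function name `collapse_runs` and the sample values are made up for illustration and are not taken from the answer), a run whose start equals its end can be printed as a single number:

```python
def collapse_runs(values):
    """Collapse sorted integers into "start:end" strings; singletons stay bare."""
    if not values:
        return ""
    # A run starts at index 0 and wherever the gap to the previous value is not 1.
    starts = [0] + [i for i in range(1, len(values)) if values[i] - values[i - 1] != 1]
    # Each run ends just before the next run starts; the last run ends at the last index.
    ends = [i - 1 for i in starts[1:]] + [len(values) - 1]
    parts = []
    for s, e in zip(starts, ends):
        lo, hi = values[s], values[e]
        parts.append(str(lo) if lo == hi else f"{lo}:{hi}")
    return ", ".join(parts)


# Hypothetical input, assumed to be sorted in ascending order.
print(collapse_runs([7, 8, 9, 10, 11, 20, 21, 22, 49]))  # 7:11, 20:22, 49
```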

Collapse consecutive runs of numbers to a string of ranges

Adding another alternative, you could use a deparsing approach. For example:

deparse(c(1L, 2L, 3L))
#[1] "1:3"

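Since as.character deparses each element of a list it is given, we could use the following. The as.integer() conversion matters, because only an integer sequence deparses to the compact start:end form:

# vec is not shown in the answer; the output below implies c(1, 2, 3, 5, 7:12)
as.character(split(as.integer(vec), cumsum(c(TRUE, diff(vec) != 1))))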
#[1] "1:3" "5" "7:12"
toString(gsub(":", "-", .Last.value))
#[1] "1-3, 5, 7-12"

Collapse runs of consecutive numbers to ranges

I took some heavy inspiration from the answers in this question.

findIntRuns <- function(run){
  rundiff <- c(1, diff(run))
  difflist <- split(run, cumsum(rundiff!=1))
  unlist(lapply(difflist, function(x){
    if(length(x) %in% 1:2) as.character(x) else paste0(x[1], "-", x[length(x)])
  }), use.names=FALSE)
}

s <- "1,2,3,4,8,9,14,15,16,19"
s2 <- as.numeric(unlist(strsplit(s, ",")))

paste0(findIntRuns(s2), collapse=",")
[1] "1-4,8,9,14-16,19"

EDIT: Multiple solutions: benchmarking time!

Unit: microseconds
   expr     min      lq   median       uq      max neval
 spee() 277.708 295.517 301.5540 311.5150 1612.207  1000
  seb() 294.611 313.025 321.1750 332.6450 1709.103  1000
 marc() 672.835 707.549 722.0375 744.5255 2154.942  1000

@speendo's solution is the fastest at the moment, but none of these have been optimised yet.

R reducing a numeric vector for reporting

One option is to split the vector on a grouping variable: take the difference of adjacent elements, convert it to a logical vector (TRUE wherever the difference is not 1), and take the cumsum of that vector. Then loop over the resulting list with sapply, paste each group's range (its min and max values), and finally paste the result after the prefix string.

paste0("Missing data: rows ",
       toString(sapply(split(x, cumsum(c(TRUE, diff(x) != 1))),
                       function(x) paste0(min(x), ":", max(x)))))
#[1] "Missing data: rows 1:10, 20:30, 55:57"
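To make the grouping step concrete, here is a small Python sketch of the same idea: flag each position where a new run starts, take the running sum of those flags to get a group id, then report the minimum and maximum of each group. The variable names and the sample row numbers are illustrative assumptions, not the data from the question.

```python
from itertools import accumulate, groupby

# Hypothetical row numbers with missing data (assumed sorted).
rows = [1, 2, 3, 10, 11, 20]

# 1 wherever a value does not follow its predecessor by exactly 1, else 0.
# The first element always opens a run.
breaks = [1] + [int(b - a != 1) for a, b in zip(rows, rows[1:])]
# The running sum of the flags gives every run its own group id.
group_ids = list(accumulate(breaks))
print(group_ids)  # [1, 1, 1, 2, 2, 3]

# Group values by id and keep each group's min and max.
ranges = []
for _, grp in groupby(zip(group_ids, rows), key=lambda pair: pair[0]):
    vals = [v for _, v in grp]
    ranges.append(f"{min(vals)}:{max(vals)}")

report = "Missing data: rows " + ", ".join(ranges)
print(report)  # Missing data: rows 1:3, 10:11, 20:20
```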

Collapse sequences of numbers into ranges

Doable. Let's see if this can be done with pandas.

import pandas as pd

data = ['10215', '10216', '10277', ...]
# Load data as series.
s = pd.Series(data)
# Find all consecutive rows with a difference of one
# and bin them into groups using `cumsum`.
v = s.astype(int).diff().bfill().ne(1).cumsum()
# Use `groupby` and `apply` to condense the consecutive numbers into ranges.
# This is only done if the group size is >1.
ranges = (
    s.groupby(v).apply(
        lambda x: '-'.join(x.values[[0, -1]]) if len(x) > 1 else x.item()
    ).tolist()
)

print(ranges)
['10215-10216',
'10277-10282',
'10292-10293',
'10295-10326',
'10344',
'10399-10406',
'10415-10418',
'10430',
'10448',
'10492-10495',
'10574-10659',
'10707-10710',
'10792-10795',
'10908',
'10936-10939',
'11108-11155',
'11194-11235',
'10101-10102',
'10800',
'11236']

Your data must be sorted for this to work.
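A short, pandas-free sketch of why ordering matters: the grouping only compares each value with its immediate predecessor, so consecutive numbers that are not adjacent in the input end up in separate ranges. The helper `condense` and the sample values are hypothetical and only stand in for the pandas pipeline above.

```python
def condense(numbers):
    """Join each run of adjacent consecutive integers into a 'first-last' string."""
    out = []
    start = prev = None
    for n in numbers:
        if prev is not None and n == prev + 1:
            prev = n  # still inside the current run
            continue
        if start is not None:
            # close the previous run before opening a new one
            out.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = n
    if start is not None:
        out.append(str(start) if start == prev else f"{start}-{prev}")
    return out


unsorted = [10216, 10277, 10215, 10278]
print(condense(unsorted))          # ['10216', '10277', '10215', '10278']
print(condense(sorted(unsorted)))  # ['10215-10216', '10277-10278']
```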

Split a numeric vector into continuous chunks in R

Here's another alternative:

vec <- c( 1, 2, 3, 4, 7, 8, 9, 10, 15, 16, 17 )
split(vec, cumsum(seq_along(vec) %in% (which(diff(vec)>1)+1)))
# $`0`
# [1] 1 2 3 4
#
# $`1`
# [1] 7 8 9 10
#
# $`2`
# [1] 15 16 17

Converting a list of integers into ranges in Python

Using itertools.groupby() produces a concise but tricky implementation:

import itertools

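def ranges(i):
    # value - index is constant within a run of consecutive integers
    for a, b in itertools.groupby(enumerate(i), lambda pair: pair[1] - pair[0]):
        b = list(b)
        # yield the first and last value of the run
        yield b[0][1], b[-1][1]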

print(list(ranges([0, 1, 2, 3, 4, 7, 8, 9, 11])))

Output:

[(0, 4), (7, 9), (11, 11)]
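The tricky part is the grouping key. In a run of consecutive integers, value and position grow in lockstep, so their difference stays constant within a run and changes whenever there is a gap. The sketch below prints that key for each element of the sample list used above. The name `keys` is introduced only for this illustration, and the input is assumed to be strictly increasing.

```python
data = [0, 1, 2, 3, 4, 7, 8, 9, 11]

# value - index is constant inside a run and jumps at every gap.
keys = [value - index for index, value in enumerate(data)]
print(keys)  # [0, 0, 0, 0, 0, 2, 2, 2, 3]
```

`groupby` then simply collects neighbouring elements that share the same key.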

Detecting consecutive integers in a list

From the docs (a Python 2 recipe, shown here in Python 3 syntax):

>>> from itertools import groupby
>>> from operator import itemgetter
>>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
>>> for k, g in groupby(enumerate(data), lambda ix: ix[0] - ix[1]):
...     print(list(map(itemgetter(1), g)))
...
[1]
[4, 5, 6]
[10]
[15, 16, 17, 18]
[22]
[25, 26, 27, 28]

You can adapt this fairly easily to get a printed set of ranges.
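One possible adaptation, written for Python 3 as a sketch rather than as part of the documented recipe. The helper name `format_ranges` and the hyphenated output format are choices made here for illustration.

```python
from itertools import groupby


def format_ranges(data):
    """Return consecutive runs in sorted `data` as a string such as '1, 4-6'."""
    parts = []
    # index - value is constant within a run of consecutive integers.
    for _, group in groupby(enumerate(data), lambda pair: pair[0] - pair[1]):
        run = [value for _, value in group]
        parts.append(str(run[0]) if len(run) == 1 else f"{run[0]}-{run[-1]}")
    return ", ".join(parts)


data = [1, 4, 5, 6, 10, 15, 16, 17, 18, 22, 25, 26, 27, 28]
print(format_ranges(data))  # 1, 4-6, 10, 15-18, 22, 25-28
```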


