Collapse continuous integer runs to strings of ranges
I think diff is the solution. You might need some additional fiddling to deal with the singletons, but:
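lapply(z, function(x) {
  diffs <- c(1, diff(x))
  # a run starts at the first element and after every gap larger than 1
  start_indexes <- c(1, which(diffs > 1))
  # each run ends just before the next one starts; the last run ends at length(x)
  end_indexes <- c(start_indexes[-1] - 1, length(x))
  coloned <- paste(x[start_indexes], x[end_indexes], sep=":")
  paste0(coloned, collapse=", ")
})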
$greg
[1] "7:11, 20:24, 30:33, 49:49"
$researcher
[1] "42:48"
$sally
[1] "25:29, 37:41"
$sam
[1] "1:6, 16:19, 34:36"
$teacher
[1] "12:15"
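For the singleton fiddling mentioned above, here is a rough Python sketch of the same start/end index idea. The name collapse_runs and its sep parameter are made up for illustration, and it assumes the input is already sorted and has no duplicates.

```python
def collapse_runs(values, sep=":"):
    """Collapse sorted, distinct integers into range strings (hypothetical helper)."""
    if not values:
        return ""
    # A run starts at position 0 and wherever the gap to the previous value exceeds 1.
    starts = [0] + [i for i in range(1, len(values)) if values[i] - values[i - 1] > 1]
    # Each run ends just before the next run starts; the last run ends at the final element.
    ends = [s - 1 for s in starts[1:]] + [len(values) - 1]
    parts = []
    for s, e in zip(starts, ends):
        if s == e:
            parts.append(str(values[s]))  # singleton: print the number once
        else:
            parts.append(f"{values[s]}{sep}{values[e]}")
    return ", ".join(parts)


print(collapse_runs([7, 8, 9, 10, 11, 20, 21, 22, 23, 24, 30, 31, 32, 33, 49]))
# 7:11, 20:24, 30:33, 49
```

With this, the trailing 49 is reported as "49" rather than "49:49".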
Collapse consecutive runs of numbers to a string of ranges
Adding another alternative, you could use a "deparse"ing approach. For example:
deparse(c(1L, 2L, 3L))
#[1] "1:3"
Taking advantage of as.character "deparse"ing a given "list" as input, we could use:
vec <- c(1, 2, 3, 5, 7:12)  # example input, inferred from the output below
as.character(split(as.integer(vec), cumsum(c(TRUE, diff(vec) != 1))))
#[1] "1:3" "5" "7:12"
toString(gsub(":", "-", .Last.value))
#[1] "1-3, 5, 7-12"
Collapse runs of consecutive numbers to ranges
I took some heavy inspiration from the answers in this question.
findIntRuns <- function(run){
rundiff <- c(1, diff(run))
difflist <- split(run, cumsum(rundiff!=1))
unlist(lapply(difflist, function(x){
if(length(x) %in% 1:2) as.character(x) else paste0(x[1], "-", x[length(x)])
}), use.names=FALSE)
}
s <- "1,2,3,4,8,9,14,15,16,19"
s2 <- as.numeric(unlist(strsplit(s, ",")))
paste0(findIntRuns(s2), collapse=",")
[1] "1-4,8,9,14-16,19"
EDIT: Multiple solutions: benchmarking time!
Unit: microseconds
expr min lq median uq max neval
spee() 277.708 295.517 301.5540 311.5150 1612.207 1000
seb() 294.611 313.025 321.1750 332.6450 1709.103 1000
marc() 672.835 707.549 722.0375 744.5255 2154.942 1000
@speendo's solution is the fastest at the moment, but none of these have been optimised yet.
R reducing a numeric vector for reporting
One option is to split the vector by creating a grouping variable based on the difference (diff) of adjacent elements: compare the differences against 1 to get a logical vector, take the cumsum of that logical vector, then loop over the resulting list with sapply, paste the range (or min/max values) of each group, and finally paste the result with the prefix string.
paste0("Missing data: rows ",
       toString(sapply(split(x, cumsum(c(TRUE, diff(x) != 1))),
                       function(x) paste0(min(x), ":", max(x)))))
#[1] "Missing data: rows 1:10, 20:30, 55:57"
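The steps described above (flag the breaks, take a running sum to get group ids, then report min and max per group) can be sketched in plain Python as follows. This is only an illustration: missing_rows_message is a hypothetical helper, and the rows input is an assumption chosen to reproduce the output shown, since the original x is not given.

```python
from itertools import accumulate, groupby


def missing_rows_message(rows, prefix="Missing data: rows "):
    """Hypothetical helper: label runs with a running count of breaks, then report min:max."""
    if not rows:
        return prefix
    # True wherever a new run begins: the first element, or a step other than 1.
    breaks = [True] + [b - a != 1 for a, b in zip(rows, rows[1:])]
    # The running sum of the break flags gives every element a group id.
    group_ids = list(accumulate(int(b) for b in breaks))
    ranges = []
    for _, grp in groupby(zip(group_ids, rows), key=lambda pair: pair[0]):
        members = [value for _, value in grp]
        ranges.append(f"{min(members)}:{max(members)}")
    return prefix + ", ".join(ranges)


# Assumed input: three runs of consecutive row numbers.
rows = list(range(1, 11)) + list(range(20, 31)) + list(range(55, 58))
print(missing_rows_message(rows))
# Missing data: rows 1:10, 20:30, 55:57
```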
Collapse sequences of numbers into ranges
Doable. Let's see if this can be done with pandas.
import pandas as pd
data = ['10215', '10216', '10277', ...]
# Load data as series.
s = pd.Series(data)
# Find all consecutive rows with a difference of one
# and bin them into groups using `cumsum`.
v = s.astype(int).diff().bfill().ne(1).cumsum()
# Use `groupby` and `apply` to condense the consecutive numbers into ranges.
# This is only done if the group size is >1.
ranges = (
    s.groupby(v).apply(
        lambda x: '-'.join(x.values[[0, -1]]) if len(x) > 1 else x.item()
    ).tolist()
)
print(ranges)
['10215-10216',
'10277-10282',
'10292-10293',
'10295-10326',
'10344',
'10399-10406',
'10415-10418',
'10430',
'10448',
'10492-10495',
'10574-10659',
'10707-10710',
'10792-10795',
'10908',
'10936-10939',
'11108-11155',
'11194-11235',
'10101-10102',
'10800',
'11236']
Your data must be sorted for this to work.
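If the input is not sorted, one way around it is to sort (and drop duplicates) before grouping. Below is a rough sketch in plain Python rather than pandas; to_ranges is a made-up name and the sample codes are invented for the example. Note that converting to int discards any leading zeros in the original strings.

```python
def to_ranges(codes):
    """Hypothetical helper: sort and de-duplicate string codes, then collapse runs."""
    numbers = sorted({int(c) for c in codes})  # int() drops leading zeros
    ranges = []
    start = prev = None
    for n in numbers:
        if prev is not None and n == prev + 1:
            prev = n  # still inside the current run
            continue
        if start is not None:
            # Close the finished run: a single number, or "first-last".
            ranges.append(str(start) if start == prev else f"{start}-{prev}")
        start = prev = n
    if start is not None:
        ranges.append(str(start) if start == prev else f"{start}-{prev}")
    return ranges


print(to_ranges(["10216", "10101", "10215", "10102", "10800"]))
# ['10101-10102', '10215-10216', '10800']
```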
Split a numeric vector into continuous chunks in R
Here's another alternative:
vec <- c( 1, 2, 3, 4, 7, 8, 9, 10, 15, 16, 17 )
split(vec, cumsum(seq_along(vec) %in% (which(diff(vec)>1)+1)))
# $`0`
# [1] 1 2 3 4
#
# $`1`
# [1] 7 8 9 10
#
# $`2`
# [1] 15 16 17
Converting a list of integers into ranges in Python
Using itertools.groupby() produces a concise but tricky implementation:
import itertools
def ranges(i):
    for a, b in itertools.groupby(enumerate(i), lambda pair: pair[1] - pair[0]):
        b = list(b)
        yield b[0][1], b[-1][1]
print(list(ranges([0, 1, 2, 3, 4, 7, 8, 9, 11])))
Output:
[(0, 4), (7, 9), (11, 11)]
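The tricky part is the grouping key. A small sketch, using the same input, that prints the key for each element may make it clearer why this works (the variable names here are just for illustration):

```python
from itertools import groupby

data = [0, 1, 2, 3, 4, 7, 8, 9, 11]

# Within a run of consecutive integers, value and index grow together,
# so (value - index) stays constant; it jumps whenever a gap appears.
keys = [value - index for index, value in enumerate(data)]
print(keys)
# [0, 0, 0, 0, 0, 2, 2, 2, 3]

# groupby() then bundles neighbouring elements that share the same key.
runs = [[value for _, value in group]
        for _, group in groupby(enumerate(data), key=lambda pair: pair[1] - pair[0])]
print(runs)
# [[0, 1, 2, 3, 4], [7, 8, 9], [11]]
```

This relies on the input being sorted and free of duplicates, since groupby() only merges adjacent elements with equal keys.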
Detecting consecutive integers in a list
From the docs (the recipe was written for Python 2; the lambda and the print call are updated here for Python 3):
>>> from itertools import groupby
>>> from operator import itemgetter
>>> data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28]
>>> for k, g in groupby(enumerate(data), lambda ix: ix[0] - ix[1]):
...     print(list(map(itemgetter(1), g)))
...
...
[1]
[4, 5, 6]
[10]
[15, 16, 17, 18]
[22]
[25, 26, 27, 28]
You can adapt this fairly easily to get a printed set of ranges.
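One possible adaptation, as a sketch: format_ranges is a hypothetical helper name, the "-" separator is an arbitrary choice, and the input is assumed to be sorted integers without duplicates.

```python
from itertools import groupby


def format_ranges(data):
    """Hypothetical helper: turn sorted, distinct integers into a string such as '1, 4-6, 10'."""
    parts = []
    # Same grouping key as the recipe: index minus value is constant within a run.
    for _, group in groupby(enumerate(data), key=lambda pair: pair[0] - pair[1]):
        run = [value for _, value in group]
        parts.append(str(run[0]) if len(run) == 1 else f"{run[0]}-{run[-1]}")
    return ", ".join(parts)


print(format_ranges([1, 4, 5, 6, 10, 15, 16, 17, 18, 22, 25, 26, 27, 28]))
# 1, 4-6, 10, 15-18, 22, 25-28
```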