Collapse consecutive runs of numbers to a string of ranges
As another alternative, you could use a deparse-based approach. For example:
deparse(c(1L, 2L, 3L))
#[1] "1:3"
Taking advantage of as.character "deparsing" each element of a list given as input, we could use:
vec <- c(1:3, 5, 7:12)  # example input, inferred from the output below
as.character(split(as.integer(vec), cumsum(c(TRUE, diff(vec) != 1))))
#[1] "1:3" "5" "7:12"
toString(gsub(":", "-", .Last.value))
#[1] "1-3, 5, 7-12"
Collapse runs of consecutive numbers to ranges
I took some heavy inspiration from the answers in this question.
findIntRuns <- function(run){
  rundiff <- c(1, diff(run))
  difflist <- split(run, cumsum(rundiff!=1))
  unlist(lapply(difflist, function(x){
    if(length(x) %in% 1:2) as.character(x) else paste0(x[1], "-", x[length(x)])
  }), use.names=FALSE)
}
s <- "1,2,3,4,8,9,14,15,16,19"
s2 <- as.numeric(unlist(strsplit(s, ",")))
paste0(findIntRuns(s2), collapse=",")
[1] "1-4,8,9,14-16,19"
EDIT: Multiple solutions: benchmarking time!
Unit: microseconds
   expr     min      lq   median       uq      max neval
 spee() 277.708 295.517 301.5540 311.5150 1612.207  1000
  seb() 294.611 313.025 321.1750 332.6450 1709.103  1000
 marc() 672.835 707.549 722.0375 744.5255 2154.942  1000
@speendo's solution is the fastest at the moment, but none of these have been optimised yet.
Collapse continuous integer runs to strings of ranges
I think diff is the solution. You might need some additional fiddling to deal with the singletons, but:
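lapply(z, function(x) {
  diffs <- c(1, diff(x))
  # A new run starts wherever the gap to the previous value exceeds 1
  start_indexes <- c(1, which(diffs > 1))
  # Each run ends just before the next one starts; the last run ends at length(x)
  end_indexes <- c(start_indexes[-1] - 1, length(x))
  coloned <- paste(x[start_indexes], x[end_indexes], sep=":")
  paste0(coloned, collapse=", ")
})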
$greg
[1] "7:11, 20:24, 30:33, 49:49"
$researcher
[1] "42:48"
$sally
[1] "25:29, 37:41"
$sam
[1] "1:6, 16:19, 34:36"
$teacher
[1] "12:15"
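The singleton "fiddling" mentioned above (the "49:49" in the greg output) can be handled by comparing each run's start and end index. Here is a minimal sketch of the same start/end-index idea in Python, the language of the pandas answer in this thread. The function name collapse_runs is made up for illustration, and the greg vector is reconstructed from the printed output, so treat both as assumptions.

```python
def collapse_runs(values, sep=":"):
    """Collapse sorted integers into 'start:end' ranges; singletons stay bare."""
    if not values:
        return ""
    # A new run starts at index 0 and wherever the gap to the previous value exceeds 1.
    starts = [0] + [i for i in range(1, len(values)) if values[i] - values[i - 1] > 1]
    # Each run ends just before the next start; the last run ends at the final index.
    ends = [s - 1 for s in starts[1:]] + [len(values) - 1]
    parts = []
    for s, e in zip(starts, ends):
        if s == e:
            # Singleton run: print the value once instead of "49:49".
            parts.append(str(values[s]))
        else:
            parts.append(f"{values[s]}{sep}{values[e]}")
    return ", ".join(parts)


# Input reconstructed from the output shown for greg (an assumption).
greg = list(range(7, 12)) + list(range(20, 25)) + list(range(30, 34)) + [49]
print(collapse_runs(greg))  # 7:11, 20:24, 30:33, 49
```

Like the R version, this assumes the input is already sorted in increasing order.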
Dash between consecutive numbers
In base R using tapply:
data = c(18,20:25,28:30)
result <- unlist(tapply(data, cumsum(c(FALSE, diff(data) > 1)), function(x)
c('-', x)), use.names = FALSE)[-1]
#[1] "18" "-" "20" "21" "22" "23" "24" "25" "-" "28" "29" "30"
Each group of consecutive numbers is prefixed with "-"; the trailing [-1] then drops the leading "-".
Collapse sequences of numbers into ranges
Doable. Let's see if this can be done with pandas.
import pandas as pd
data = ['10215', '10216', '10277', ...]
# Load data as series.
s = pd.Series(data)
# Find all consecutive rows with a difference of one
# and bin them into groups using `cumsum`.
v = s.astype(int).diff().bfill().ne(1).cumsum()
# Use `groupby` and `apply` to condense the consecutive numbers into ranges.
# This is only done if the group size is >1.
ranges = (
    s.groupby(v).apply(
        lambda x: '-'.join(x.values[[0, -1]]) if len(x) > 1 else x.item()).tolist())
print(ranges)
['10215-10216',
'10277-10282',
'10292-10293',
'10295-10326',
'10344',
'10399-10406',
'10415-10418',
'10430',
'10448',
'10492-10495',
'10574-10659',
'10707-10710',
'10792-10795',
'10908',
'10936-10939',
'11108-11155',
'11194-11235',
'10101-10102',
'10800',
'11236']
Your data must be sorted for this to work.
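If the input might not be sorted, one option is to sort (and de-duplicate) it before grouping. The following is a rough pure-Python sketch rather than a drop-in replacement for the pandas code. The function name to_ranges and the sample list are made up for illustration, and it assumes every item parses as an integer. Leading zeros, if any, would be lost in the conversion.

```python
from itertools import groupby


def to_ranges(items):
    """Sort integer-like strings, then collapse consecutive values into 'a-b' ranges."""
    nums = sorted({int(x) for x in items})  # sort and drop duplicates
    ranges = []
    # Within a run of consecutive integers, (value - position) is constant,
    # so it can serve as the grouping key.
    for _, group in groupby(enumerate(nums), key=lambda pair: pair[1] - pair[0]):
        run = [value for _, value in group]
        ranges.append(str(run[0]) if len(run) == 1 else f"{run[0]}-{run[-1]}")
    return ranges


# Made-up, deliberately unsorted sample (not the data from the question).
sample = ['10277', '10215', '10216', '10344', '10278']
print(to_ranges(sample))  # ['10215-10216', '10277-10278', '10344']
```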