Rle-Like Function That Catches "Run" of Adjacent Integers

1) Calculate values and then lengths based on values

# start a new group wherever the step from the previous element is not 1
s <- split(x, cumsum(c(0, diff(x) != 1)))
run.info <- list(lengths = unname(sapply(s, length)), values = unname(s))

Running it using x from the question gives this:

> str(run.info)
List of 2
$ lengths: int [1:5] 3 6 1 2 6
$ values :List of 5
..$ : num [1:3] 3 4 5
..$ : num [1:6] 10 11 12 13 14 15
..$ : num 17
..$ : num [1:2] 22 23
..$ : num [1:6] 35 36 37 38 39 40
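For comparison, here is a minimal Python sketch of the same "start a new run wherever the step is not 1" idea. The function name runs_of_adjacent is hypothetical, and the vector x is reconstructed from the str() output above rather than copied from the question.

```python
def runs_of_adjacent(x):
    """Split x into runs where each element is exactly 1 more than the previous.

    Returns a dict shaped like run.info: run lengths and run values.
    """
    runs = []
    for value in x:
        # Start a new run at the first element or whenever the step is not 1,
        # the same test as diff(x) != 1 in the R code.
        if not runs or value - runs[-1][-1] != 1:
            runs.append([value])
        else:
            runs[-1].append(value)
    return {"lengths": [len(r) for r in runs], "values": runs}


# x reconstructed from the str() output above (an assumption).
x = [3, 4, 5, 10, 11, 12, 13, 14, 15, 17, 22, 23, 35, 36, 37, 38, 39, 40]
run_info = runs_of_adjacent(x)
print(run_info["lengths"])  # [3, 6, 1, 2, 6]
```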

2) Calculate lengths and then values based on lengths

Here is a second solution based on Gregor's length calculation:

# x - seq_along(x) is constant within a run of consecutive integers
lens <- rle(x - seq_along(x))$lengths
list(lengths = lens, values = unname(split(x, rep(seq_along(lens), lens))))
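The trick behind Gregor's length calculation is that value minus position does not change inside a run of adjacent integers. A rough Python equivalent, assuming itertools.groupby as the counterpart of rle (both only merge adjacent equal keys), might look like this; runs_by_offset is a hypothetical name, and Python's 0-based positions only shift the key by a constant, which does not affect the grouping.

```python
from itertools import groupby


def runs_by_offset(x):
    """Group consecutive integers using the x - seq_along(x) trick.

    Within a run of adjacent integers, value - position stays constant,
    so grouping adjacent elements on that key recovers the runs.
    """
    values = [
        [v for _, v in group]
        for _, group in groupby(enumerate(x), key=lambda pair: pair[1] - pair[0])
    ]
    return {"lengths": [len(v) for v in values], "values": values}
```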

3) Calculate lengths and values without using the other

This one seems inefficient, since it calculates both lengths and values from scratch, and it is also somewhat more complex than necessary, but it does get everything down to a single statement, so I thought I would add it as well. It's basically a mix of solutions 1) and 2) above; nothing really new relative to those two.

list(lengths = rle(x - seq_along(x))$lengths,
values = unname(split(x, cumsum(c(0, diff(x) != 1)))))

EDIT: Added second solution.

EDIT: Added third solution.  

R: recursive function to give groups of consecutive numbers

Your sapply call is applying fun across all values of x, when you really want it to apply across all values of i. To make sapply do what I assume you want, iterate over the indices and pass x as an extra argument:

sapply(X = 1:length(x), FUN = fun, x = x)

[1] 2 2 4 7 7 12 12 12 NA

It does, however, return NA as the last value instead of 15. I suspect this is because your function is not set up to handle the last value of the vector: there is no x[10], so it returns NA. You can probably edit your function to handle this fairly easily.

Collapse runs of consecutive numbers to ranges

I drew heavily on the answers to this question.

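findIntRuns <- function(run){
  # step from the previous element (the first element gets 1)
  rundiff <- c(1, diff(run))
  # start a new group wherever the step is not 1
  difflist <- split(run, cumsum(rundiff != 1))
  unlist(lapply(difflist, function(x){
    # runs of 1 or 2 numbers stay as-is; longer runs become "first-last"
    if(length(x) %in% 1:2) as.character(x) else paste0(x[1], "-", x[length(x)])
  }), use.names=FALSE)
}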

s <- "1,2,3,4,8,9,14,15,16,19"
s2 <- as.numeric(unlist(strsplit(s, ",")))

paste0(findIntRuns(s2), collapse=",")
[1] "1-4,8,9,14-16,19"
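The same collapsing rule (runs of one or two numbers kept individually, longer runs written as first-last) can be sketched in Python as below. find_int_runs is a hypothetical port of findIntRuns, and it parses the string with int() instead of R's as.numeric, which is an assumption that the input holds whole numbers.

```python
def find_int_runs(run):
    """Collapse runs of consecutive integers into "first-last" ranges.

    Runs of one or two numbers are kept as individual numbers,
    following the same rule as findIntRuns() above.
    """
    groups = []
    for value in run:
        # Extend the current group when the step is exactly 1, else start a new one.
        if groups and value - groups[-1][-1] == 1:
            groups[-1].append(value)
        else:
            groups.append([value])
    out = []
    for g in groups:
        if len(g) <= 2:
            out.extend(str(v) for v in g)
        else:
            out.append(f"{g[0]}-{g[-1]}")
    return out


s = "1,2,3,4,8,9,14,15,16,19"
s2 = [int(tok) for tok in s.split(",")]
print(",".join(find_int_runs(s2)))  # 1-4,8,9,14-16,19
```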

EDIT: Multiple solutions: benchmarking time!

Unit: microseconds
   expr     min      lq   median       uq      max neval
 spee() 277.708 295.517 301.5540 311.5150 1612.207  1000
  seb() 294.611 313.025 321.1750 332.6450 1709.103  1000
 marc() 672.835 707.549 722.0375 744.5255 2154.942  1000

@speendo's solution is the fastest at the moment, but none of these have been optimised yet.

Group integer vector into consecutive runs

Here's a brief answer using aggregate:

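# start a new run wherever V2 jumps by more than 1
runs <- cumsum( c(0, diff(my.data$V2) > 1) )
# group by run and V1, take the range (min, max) of V2, then drop the runs column
aggregate(V2 ~ runs + V1, my.data, range)[,-1]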


  V1 V2.1 V2.2
1  1    2    5
2  1    7   11
3  1   13   13
4  2    4    9
5  2   11   13
6  3    1    6
7  3  101  105
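A loose Python analogue of the aggregate call, assuming the data arrive as a list of (V1, V2) pairs in row order, is sketched below. run_ranges and the sample rows are hypothetical (the question's my.data is not reproduced here), and the output keeps data order rather than aggregate's sorted order.

```python
def run_ranges(rows):
    """Return (v1, start, end) for each run of V2 values within each V1 group.

    rows is a list of (v1, v2) pairs in data order, a stand-in for my.data.
    As with cumsum(c(0, diff(V2) > 1)), a new run starts whenever V2 jumps
    by more than 1; keying on V1 as well separates the V1 groups.
    """
    ranges = {}
    run = 0
    prev = None
    for v1, v2 in rows:
        if prev is not None and v2 - prev > 1:
            run += 1
        prev = v2
        key = (run, v1)
        low, high = ranges.get(key, (v2, v2))
        ranges[key] = (min(low, v2), max(high, v2))
    return [(v1, low, high) for (_run, v1), (low, high) in ranges.items()]


# Hypothetical data, not the question's my.data.
rows = [(1, 2), (1, 3), (1, 4), (1, 7), (1, 8), (2, 4), (2, 5), (2, 9)]
print(run_ranges(rows))  # [(1, 2, 4), (1, 7, 8), (2, 4, 5), (2, 9, 9)]
```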

Find longest consecutive number in R

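Here's one possible solution:

v <- c(1,2,10,41,42,43,50) # Your data
temp <- cumsum(c(1, diff(v) - 1))  # constant within each run of consecutive numbers
temp2 <- rle(temp)
# keep the elements whose key belongs to the longest run
v[which(temp == with(temp2, values[which.max(lengths)]))]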
# [1] 41 42 43
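The same result can be sketched in Python with a single pass that tracks where each run starts, rather than building the cumsum key; longest_consecutive_run is a hypothetical name. Like which.max, it keeps the first run when several runs tie for the longest.

```python
def longest_consecutive_run(v):
    """Return the longest run of adjacent integers in v (the first one on ties)."""
    if not v:
        return []
    best_start, best_len = 0, 1
    start = 0
    for i in range(1, len(v) + 1):
        # A run ends at the end of the data or where the step is not 1.
        if i == len(v) or v[i] - v[i - 1] != 1:
            if i - start > best_len:  # strict > keeps the first run on ties
                best_start, best_len = start, i - start
            start = i
    return v[best_start:best_start + best_len]


print(longest_consecutive_run([1, 2, 10, 41, 42, 43, 50]))  # [41, 42, 43]
```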

count the length of Number Sequences

dplyr. Set the default argument of lag() and it will work:

library(dplyr)
df %>%
  mutate(check = x - lag(x, default = x[1L]) != 1) %>%  # TRUE where a new run starts
  group_by(g = cumsum(check)) %>%
  mutate(cnt = row_number()) %>%
  ungroup() %>%
  select(-g, -check)

x cnt
<dbl> <int>
1 2 1
2 4 1
3 5 2
4 6 3
5 8 1
6 10 1
7 11 2

data.table. Along the same lines and more concisely:

library(data.table)
setDT(df)

# x != shift(x, fill=x[1L]) + 1L parses as x != (shift(...) + 1L): TRUE where a new run starts
df[, cnt := 1:.N, by=cumsum(x != shift(x, fill=x[1L]) + 1L)]

x cnt
1: 2 1
2: 4 1
3: 5 2
4: 6 3
5: 8 1
6: 10 1
7: 11 2

shift is data.table's analogue of lag.

Alternatively, from version 1.9.7 of the package on, you can use rowid instead:

df[, cnt := rowid(cumsum(x != shift(x, fill=x[1L]) + 1L))]
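For readers outside R, the per-row counter computed by both answers can be sketched in plain Python. position_in_run is a hypothetical name, and the input reproduces the x column shown in the outputs above.

```python
def position_in_run(x):
    """Return each element's 1-based position within its run of adjacent integers.

    A new run starts at the first element or wherever x[i] != x[i-1] + 1,
    the same condition used in the dplyr and data.table answers above.
    """
    counts = []
    for i, value in enumerate(x):
        if i > 0 and value == x[i - 1] + 1:
            counts.append(counts[-1] + 1)
        else:
            counts.append(1)
    return counts


x = [2, 4, 5, 6, 8, 10, 11]
print(position_in_run(x))  # [1, 1, 2, 3, 1, 1, 2]
```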

How to find Run length encoding in python

You can do this with itertools.groupby:

from itertools import groupby
ar = [2,2,2,1,1,2,2,3,3,3,3]
print([(k, sum(1 for i in g)) for k,g in groupby(ar)])
# [(2, 3), (1, 2), (2, 2), (3, 4)]

Removing Only Adjacent Duplicates in Data Frame in R

Try:

df[with(df, c(x[-1] != x[-nrow(df)], TRUE)),]
# x y
#1 A 1
#2 B 2
#3 C 3
#4 A 4
#5 B 5
#6 C 6
#7 A 7
#9 B 9
#10 C 10

Explanation

Here, we compare each element with the element preceding it. To do this, we remove the first element from the column and compare the result with the column from which the last element has been removed, so that the two vectors have equal length.

df$x[-1] #first element removed
#[1] B C A B C A B B C
df$x[-nrow(df)]
#[1] A B C A B C A B B #last element `C` removed

df$x[-1]!=df$x[-nrow(df)]
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE

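In the above, the length is 1 less than nrow(df) because we removed one element. To compensate for that, we append a TRUE and then use this logical index to subset the dataset. Because the appended TRUE always keeps the final row, the last row of each run of adjacent duplicates is the one retained.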
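The same keep-if-different-from-the-next-row rule can be sketched in Python on a list of row tuples. drop_adjacent_duplicates and its key parameter are hypothetical; the sample rows reproduce the x and y values implied by the output above.

```python
def drop_adjacent_duplicates(rows, key):
    """Keep a row only if its key differs from the next row's key.

    This follows c(x[-1] != x[-nrow(df)], TRUE): the last row is always kept,
    so within each run of adjacent duplicates the last row survives.
    """
    keep = [key(a) != key(b) for a, b in zip(rows, rows[1:])] + [True]
    return [row for row, k in zip(rows, keep) if k]


rows = list(zip("ABCABCABBC", range(1, 11)))  # (x, y) pairs
result = drop_adjacent_duplicates(rows, key=lambda r: r[0])
print([y for _, y in result])  # [1, 2, 3, 4, 5, 6, 7, 9, 10]
```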
