rle-like function that catches run of adjacent integers
1) Calculate values and then lengths based on values
s <- split(x, cumsum(c(0, diff(x) != 1)))
run.info <- list(lengths = unname(sapply(s, length)), values = unname(s))
Running it using x from the question gives this:
> str(run.info)
List of 2
$ lengths: int [1:5] 3 6 1 2 6
$ values :List of 5
..$ : num [1:3] 3 4 5
..$ : num [1:6] 10 11 12 13 14 15
..$ : num 17
..$ : num [1:2] 22 23
..$ : num [1:6] 35 36 37 38 39 40
2) Calculate lengths and then values based on lengths
Here is a second solution based on Gregor's length calculation:
lens <- rle(x - seq_along(x))$lengths
list(lengths = lens, values = unname(split(x, rep(seq_along(lens), lens))))
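To see why the x - seq_along(x) calculation works: inside a run of consecutive integers both the value and its position go up by one, so their difference stays constant, and rle can count how long each constant stretch is. A minimal sketch, assuming x is the vector implied by the str() output above (it is reconstructed here, not copied from the question):

```r
# x reconstructed from the str() output shown earlier (an assumption)
x <- c(3, 4, 5, 10, 11, 12, 13, 14, 15, 17, 22, 23, 35, 36, 37, 38, 39, 40)

# Value minus position is constant inside each run of consecutive integers
offsets <- x - seq_along(x)

# rle() counts the length of each constant stretch
lens <- rle(offsets)$lengths

# Rebuild the runs by tagging every element with the id of its run
vals <- unname(split(x, rep(seq_along(lens), lens)))
```

Printing offsets next to x is a quick way to check where one run ends and the next begins.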
3) Calculate lengths and values without using the other
This one seems inefficient, since it calculates each of lengths and values from scratch, and it also seems somewhat overly complex, but it does get everything down to a single statement, so I thought I would add it as well. It's basically just a mix of the prior two solutions marked 1) and 2) above; nothing really new relative to those two.
list(lengths = rle(x - seq_along(x))$lengths,
values = unname(split(x, cumsum(c(0, diff(x) != 1)))))
EDIT: Added second solution.
EDIT: Added third solution.
R: recursive function to give groups of consecutive numbers
Your sapply call is applying fun across all values of x, when you really want it to be applying across all values of i. To get the sapply to do what I assume you want it to do, you can do the following:
sapply(X = 1:length(x), FUN = fun, x = x)
[1] 2 2 4 7 7 12 12 12 NA
Note that it returns NA as the last value instead of 15. I think this is because your function is not set up to handle the last value of a vector (there is no x[10], so it returns NA). You can probably edit your function to handle this fairly easily.
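As a rough illustration of both points (iterating over positions and guarding the last one), here is a small sketch. The function next_value is a hypothetical stand-in, since the fun from the question is not shown here, and the guard simply returns the last element itself, which may or may not be what the original function should do:

```r
# Hypothetical stand-in for `fun`; the function from the question is not shown
next_value <- function(i, x) {
  # Guard the last position: x[length(x) + 1] does not exist and would give NA
  if (i == length(x)) return(x[i])
  x[i + 1]
}

x <- c(1, 2, 4)

# Iterate over positions, passing the whole vector as an extra argument
res <- sapply(seq_along(x), next_value, x = x)
```

Using seq_along(x) instead of 1:length(x) also avoids iterating over c(1, 0) when x is empty.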
Collapse runs of consecutive numbers to ranges
I took some heavy inspiration from the answers in this question.
findIntRuns <- function(run){
  rundiff <- c(1, diff(run))
  difflist <- split(run, cumsum(rundiff != 1))
  unlist(lapply(difflist, function(x){
    if(length(x) %in% 1:2) as.character(x) else paste0(x[1], "-", x[length(x)])
  }), use.names=FALSE)
}
s <- "1,2,3,4,8,9,14,15,16,19"
s2 <- as.numeric(unlist(strsplit(s, ",")))
paste0(findIntRuns(s2), collapse=",")
[1] "1-4,8,9,14-16,19"
EDIT: Multiple solutions: benchmarking time!
Unit: microseconds
expr min lq median uq max neval
spee() 277.708 295.517 301.5540 311.5150 1612.207 1000
seb() 294.611 313.025 321.1750 332.6450 1709.103 1000
marc() 672.835 707.549 722.0375 744.5255 2154.942 1000
@speendo's solution is the fastest at the moment, but none of these have been optimised yet.
Group integer vector into consecutive runs
Here's a brief answer using aggregate:
runs <- cumsum( c(0, diff(my.data$V2) > 1) )
aggregate(V2 ~ runs + V1, my.data, range)[,-1]
V1 V2.1 V2.2
1 1 2 5
2 1 7 11
3 1 13 13
4 2 4 9
5 2 11 13
6 3 1 6
7 3 101 105
Find longest consecutive number in R
Here's one possible solution
v <- c(1,2,10,41,42,43,50) # Your data
temp <- cumsum(c(1, diff(v) - 1))
temp2 <- rle(temp)
v[which(temp == with(temp2, values[which.max(lengths)]))]
# [1] 41 42 43
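The same idea can be wrapped in a small function. This is only a sketch: the name longest_run is made up for illustration, and it builds the run ids with diff(v) != 1 rather than diff(v) - 1, so the ids only ever increase and non-adjacent runs cannot end up sharing an id when the input is not sorted. On ties, which.max picks the first of the longest runs.

```r
# Return the longest run of consecutive integers in v (first one wins on ties)
longest_run <- function(v) {
  # Every element of a run shares an id; the id increases when the gap is not 1
  run_id <- cumsum(c(1, diff(v) != 1))
  runs <- rle(run_id)
  # Keep the elements whose id belongs to the longest run
  v[run_id == runs$values[which.max(runs$lengths)]]
}

v <- c(1, 2, 10, 41, 42, 43, 50)
longest_run(v)
# [1] 41 42 43
```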
count the length of Number Sequences
dplyr. Set the default value and it will work:
df %>% mutate(check = x - lag(x, default = x[1L]) != 1) %>%
group_by(g = cumsum(check)) %>%
mutate(cnt = row_number()) %>%
ungroup %>% select(-g,-check)
x cnt
<dbl> <int>
1 2 1
2 4 1
3 5 2
4 6 3
5 8 1
6 10 1
7 11 2
data.table. Along the same lines and more concisely:
library(data.table)
setDT(df)
df[, cnt := 1:.N, by=cumsum(x != shift(x, fill=x[1L]) + 1L)]
x cnt
1: 2 1
2: 4 1
3: 5 2
4: 6 3
5: 8 1
6: 10 1
7: 11 2
shift is data.table's analogue to lag.
Alternately, from v1.9.7 of the package on, you're able to use rowid instead:
df[, cnt := rowid(cumsum(x != shift(x, fill=x[1L]) + 1L))]
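For comparison, the same counter can be sketched in base R without either package. The data frame below is rebuilt from the printed output above, so treat it as an assumption about the original df:

```r
# Example data matching the printed output above (reconstructed, an assumption)
df <- data.frame(x = c(2, 4, 5, 6, 8, 10, 11))

# A new run starts at the first row and wherever the step from the previous value is not 1
run_id <- cumsum(c(TRUE, diff(df$x) != 1))

# rle() gives the length of each run; sequence() counts 1..n within each of them
df$cnt <- sequence(rle(run_id)$lengths)
```

This relies on the rows of each run being adjacent, which holds here because run_id never decreases.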
How to find Run length encoding in python
You can do this with groupby
from itertools import groupby
ar = [2,2,2,1,1,2,2,3,3,3,3]
print([(k, sum(1 for i in g)) for k,g in groupby(ar)])
# [(2, 3), (1, 2), (2, 2), (3, 4)]
Removing Only Adjacent Duplicates in Data Frame in R
Try
df[with(df, c(x[-1]!= x[-nrow(df)], TRUE)),]
# x y
#1 A 1
#2 B 2
#3 C 3
#4 A 4
#5 B 5
#6 C 6
#7 A 7
#9 B 9
#10 C 10
Explanation
Here, we are comparing each element with the element next to it. This can be done by removing the first element from the column and comparing the result with the same column with its last element removed (so that the lengths become equal):
df$x[-1] #first element removed
#[1] B C A B C A B B C
df$x[-nrow(df)]
#[1] A B C A B C A B B #last element `C` removed
df$x[-1]!=df$x[-nrow(df)]
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
In the above, the length is 1 less than the nrow of df, as we removed one element. In order to compensate for that, we can concatenate a TRUE and then use this index for subsetting the dataset.
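Where the TRUE is concatenated decides which row of a duplicated pair survives. The sketch below rebuilds df from the printed output above (an assumption) and shows both choices; keep_first is a variant that is not part of the answer itself:

```r
# Data reconstructed from the printed output above (an assumption)
df <- data.frame(x = c("A", "B", "C", "A", "B", "C", "A", "B", "B", "C"),
                 y = 1:10)

n <- nrow(df)

# TRUE where a row differs from the row that follows it (length n - 1)
differs_from_next <- df$x[-1] != df$x[-n]

# TRUE appended at the end: keeps the last row of each run, as in the answer
keep_last <- df[c(differs_from_next, TRUE), ]

# Variant: TRUE prepended instead, which keeps the first row of each run
keep_first <- df[c(TRUE, differs_from_next), ]
```

With this data, keep_last drops row 8 and keep_first drops row 9.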