Group Integer Vector into Consecutive Runs

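Here's a brief answer using aggregate. The cumulative sum increments each time the gap between neighbouring V2 values exceeds 1, so every run of consecutive integers gets its own id; range then returns the first and last value of each run within each V1 group.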

runs <- cumsum( c(0, diff(my.data$V2) > 1) )
aggregate(V2 ~ runs + V1, my.data, range)[,-1]


  V1 V2.1 V2.2
1  1    2    5
2  1    7   11
3  1   13   13
4  2    4    9
5  2   11   13
6  3    1    6
7  3  101  105
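The same run-id idea can be tried on a single vector without aggregate. The following is a minimal sketch on a hypothetical input vector x (not the original my.data), chosen to mirror the V1 = 1 rows above:

```r
# Hypothetical input: one group's worth of integers, already sorted
x <- c(2, 3, 4, 5, 7, 8, 9, 10, 11, 13)

# A new run starts wherever the gap to the previous value exceeds 1
run_id <- cumsum(c(0, diff(x) > 1))

# Split the values by run id and take the first and last value of each run
runs <- split(x, run_id)
run_ranges <- t(sapply(runs, range))
colnames(run_ranges) <- c("start", "end")
run_ranges
```

Each row of run_ranges holds the first and last value of one run; a run of length one has identical start and end.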

How to expand a vector of integers into consecutive integers in each group in r

This can be done with expand and full_seq from tidyr:

library(tidyverse)

df %>%
  group_by(group) %>%
  expand(x = full_seq(x, 1))

This gives:

# A tibble: 19 x 2
# Groups:   group [3]
   group     x
   <dbl> <dbl>
 1     1     1
 2     1     2
 3     1     3
 4     1     4
 5     1     5
 6     2     1
 7     2     2
 8     2     3
 9     2     4
10     2     5
11     2     6
12     3     1
13     3     2
14     3     3
15     3     4
16     3     5
17     3     6
18     3     7
19     3     8
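For comparison, full_seq(x, 1) fills in every value between the smallest and largest x in steps of 1, which can be approximated in base R with seq. This is only a sketch: the data frame df below is a hypothetical stand-in, since the original input is not shown.

```r
# Hypothetical input: only the end points of each group are present
df <- data.frame(group = c(1, 1, 2, 2, 3, 3),
                 x     = c(1, 5, 1, 6, 1, 8))

# For each group, build the full sequence from min(x) to max(x)
expanded <- do.call(rbind, lapply(split(df, df$group), function(g) {
  data.frame(group = g$group[1],
             x     = seq(min(g$x), max(g$x), by = 1))
}))
rownames(expanded) <- NULL
expanded
```

With these assumed end points the result has 5 + 6 + 8 = 19 rows, one per integer in each group's range.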

Get runs of consecutive integers of certain length and sample from first values

Here's an approach with base R:

First, we create every possible sub-vector (window) of the requested length, one starting at each position of vec. Next, we keep only the windows in which every successive difference equals 1. The is.na test also filters out the trailing windows, which run past the end of vec and therefore contain NA. Finally, we bind the remaining windows into a matrix with rbind and sample one value from its first column.

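SampleSequencialVectors <- function(vec, length){
  # One window of `length` elements per start position; late windows contain NA
  all.vecs <- lapply(seq_along(vec), function(x) vec[x:(x + (length - 1))])
  # Keep the windows whose successive differences all equal 1
  seq.vec <- all.vecs[sapply(all.vecs, function(x) all(diff(x) == 1 & !is.na(diff(x))))]
  # Bind the surviving windows row-wise and sample one of their first values
  sample(do.call(rbind, seq.vec)[, 1], 1)
}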

replicate(10, SampleSequencialVectors(v, 3))
# [1] 3 4 3 3 4 4 25 25 3 25
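To see what the intermediate steps produce, here is a small sketch on a hypothetical vector v (the original v is not shown). It also indexes with sample.int rather than calling sample on the values directly, because sample(x, 1) treats a single number x >= 1 as the range 1:x, which matters when only one qualifying window exists.

```r
# Hypothetical input vector
v <- c(3, 4, 5, 6, 10, 25, 26, 27)
len <- 3

# Only start positions that leave room for a full window, so no NA padding
starts <- seq_len(length(v) - len + 1)
windows <- lapply(starts, function(i) v[i:(i + len - 1)])

# TRUE for windows made of consecutive integers
is_run <- vapply(windows, function(w) all(diff(w) == 1), logical(1))
first_values <- v[starts[is_run]]

# Pick one first value by position to avoid sample()'s single-number behaviour
picked <- first_values[sample.int(length(first_values), 1)]
first_values
```

Here the qualifying windows start at 3, 4 and 25, so picked is always one of those three values.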

Or if you prefer a tidyverse type approach:

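library(magrittr)  # provides %>%

SampleSequencialVectorsPurrr <- function(vec, length){
  vec %>%
    seq_along %>%
    # One window of `length` elements per start position
    purrr::map(~vec[.x:(.x+(length-1))]) %>%
    # Keep the windows made of consecutive integers
    purrr::keep(~ all(diff(.x) == 1 & !is.na(diff(.x)))) %>%
    # do.call replaces purrr::invoke, which is deprecated as of purrr 1.0.0
    do.call(rbind, .) %>%
    {sample(.[,1],size = 1)}
}
replicate(10, SampleSequencialVectorsPurrr(v, 3))
# [1] 4 25 25 3 25 4 4 3 4 25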

How to check if a vector contains n consecutive numbers

Using diff and rle, something like this should work:

result <- rle(diff(numbers))
any(result$lengths>=2 & result$values==1)
# [1] TRUE

In response to the comments below: my previous answer tested only for runs of length exactly 3 and so missed longer runs. Changing the == to >= fixes this, because a run of three or more consecutive numbers produces at least two successive differences of 1. It also works for runs involving negative numbers:

> numbers4 <- c(-2, -1, 0, 5, 7, 8)
> result <- rle(diff(numbers4))
> any(result$lengths>=2 & result$values==1)
[1] TRUE
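The check generalises to any run length n, since n consecutive numbers produce n - 1 successive differences of 1. A minimal sketch, where has_consecutive_run is a hypothetical helper name and not part of the original answer:

```r
# Hypothetical helper: TRUE if x contains at least n consecutive
# increasing integers in adjacent positions
has_consecutive_run <- function(x, n) {
  if (n <= 1) return(length(x) >= 1)
  r <- rle(diff(x))
  # n consecutive values correspond to n - 1 differences equal to 1
  any(r$lengths >= n - 1 & r$values == 1)
}

numbers4 <- c(-2, -1, 0, 5, 7, 8)
has_consecutive_run(numbers4, 3)
has_consecutive_run(numbers4, 4)
```

With numbers4 from above, the first call finds the run -2, -1, 0 and the second finds no run of four.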

Group rows based on consecutive line numbers

Convert the line numbers to numeric, calculate the difference between consecutive values, and increment the group counter whenever the difference is greater than 1.

transform(df, group = cumsum(c(TRUE, diff(as.numeric(line)) > 1)))

#   line group
# 1 0001     1
# 2 0002     1
# 3 0003     1
# 4 0011     2
# 5 0012     2
# 6 0234     3
# 7 0235     3
# 8 0236     3

If you want to use dplyr:

library(dplyr)
df %>% mutate(group = cumsum(c(TRUE, diff(as.numeric(line)) > 1)))
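Broken into its individual steps, the grouping looks roughly like this. The data frame is a hypothetical reconstruction that mirrors the printed output, with line stored as zero-padded character strings:

```r
# Hypothetical input mirroring the printed output
df <- data.frame(line = c("0001", "0002", "0003", "0011",
                          "0012", "0234", "0235", "0236"),
                 stringsAsFactors = FALSE)

num <- as.numeric(df$line)        # "0011" becomes 11
gap <- diff(num) > 1              # TRUE where a new group begins
group <- cumsum(c(TRUE, gap))     # leading TRUE makes the first group 1

# Number of lines in each group
table(group)
```

Starting the cumulative sum with TRUE instead of 0 is what makes the group numbers start at 1.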

Identify groups of n consecutive numbers in a data.table field in a group

A solution using the tidyverse.

library(tidyverse)
library(data.table)

DT2 <- DT %>%
  arrange(Student, Month) %>%
  group_by(Student) %>%
  # Create sequence of 3
  mutate(Seq = map(Month, ~seq.int(.x, .x + 2L))) %>%
  # Create a flag to show if the sequence match completely with the Month column
  mutate(Flag = map_lgl(Seq, ~all(.x %in% Month))) %>%
  # Filter the Flag for TRUE
  filter(Flag) %>%
  # Remove columns
  select(-Seq, -Flag) %>%
  ungroup()

DT2
# # A tibble: 11 x 2
#    Student Month
#      <dbl> <dbl>
#  1       1     1
#  2       1     5
#  3       1     6
#  4       2     2
#  5       2     3
#  6       2     7
#  7       2     8
#  8       3     1
#  9       3     5
# 10       3     6
# 11       3     7
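Note that the flag is TRUE only for a month that begins a run of three, so the result lists the starting months rather than every month in a run. A base R sketch of the same test for a single student, using a hypothetical months vector (the original DT is not shown):

```r
# Hypothetical months for one student
months <- c(1, 2, 3, 5, 6, 7, 8, 10)
n <- 3

# For each month m, check that m, m + 1, ..., m + n - 1 are all present
starts_run <- vapply(months,
                     function(m) all(seq(m, m + n - 1) %in% months),
                     logical(1))
months[starts_run]
```

Month 7 is dropped here even though it belongs to the run 5 to 8, because 7, 8, 9 is not fully present.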

Select groups with only consecutive runs of a certain value


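library(dplyr)
data |>
  group_by(id) |>
  # Drop the NAs within each id, then keep the whole group if any remaining
  # 'yes' is immediately preceded by another 'yes'
  filter(any(x[!is.na(x)] == 'yes' & lag(x[!is.na(x)]) == 'yes'))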

#       id x
#    <dbl> <chr>
#  1     1 NA
#  2     1 yes
#  3     1 NA
#  4     1 yes
#  5     1 NA
#  6     1 NA
#  7     2 NA
#  8     2 NA
#  9     2 yes
# 10     2 yes
# 11     2 NA

Data:

data <- data.frame(id = c(1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5),
                   x = c(NA,'yes',NA,'yes',NA,NA,NA,NA,'yes','yes',NA,'no', 'no',NA,NA,'yes',
                         'no','yes','no','yes','no', 'yes',NA, 'no','yes', 'no'))
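The same selection can be sketched in base R with rle, which makes the "two 'yes' values in a row once NAs are dropped" condition explicit. The helper name has_repeat_yes is hypothetical; the data frame repeats the one given above so the block runs on its own.

```r
data <- data.frame(id = c(1,1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5),
                   x = c(NA,'yes',NA,'yes',NA,NA,NA,NA,'yes','yes',NA,'no','no',NA,NA,'yes',
                         'no','yes','no','yes','no','yes',NA,'no','yes','no'))

# Hypothetical helper: TRUE if, ignoring NAs, 'yes' occurs at least twice in a row
has_repeat_yes <- function(x) {
  r <- rle(x[!is.na(x)])
  any(r$values == "yes" & r$lengths >= 2)
}

# One TRUE/FALSE per id, then keep the rows of the qualifying ids
keep <- sapply(split(data$x, data$id), has_repeat_yes)
kept_ids <- names(keep)[keep]
result <- data[data$id %in% as.numeric(kept_ids), ]
result
```

On this data the helper keeps ids 1 and 2, which matches the 11 rows shown in the dplyr output.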

How to create a consecutive group number

Try Data$number <- as.numeric(as.factor(Data$site))

On a side note: my solution and @Chase's differ from @DWin's in the ordering of the numbers. Both as.factor and factor automatically sort the levels, whereas @DWin's solution (the match line below) does not and numbers the sites in order of first appearance:

Dat <- data.frame(site = rep(c(1,8,4), each = 3), score = runif(9))

Dat$number <- as.numeric(factor(Dat$site))
Dat$sitenum <- match(Dat$site, unique(Dat$site) )

Gives

> Dat
  site     score number sitenum
1    1 0.7377561      1       1
2    1 0.3131139      1       1
3    1 0.7862290      1       1
4    8 0.4480387      3       2
5    8 0.3873210      3       2
6    8 0.8778102      3       2
7    4 0.6916340      2       3
8    4 0.3033787      2       3
9    4 0.6552808      2       3
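If you want order-of-appearance numbering but prefer to stay with factor, passing the levels explicitly should give the same result as match. A short sketch using the same site values as above (the variable names are only for illustration):

```r
site <- rep(c(1, 8, 4), each = 3)

# Levels sorted by value: 1, 4, 8
sorted_id <- as.integer(factor(site))

# Levels in order of first appearance: 1, 8, 4
appearance_id <- as.integer(factor(site, levels = unique(site)))

# The match-based numbering, for comparison
match_id <- match(site, unique(site))

cbind(site, sorted_id, appearance_id, match_id)
```

The last two columns agree row by row, while sorted_id assigns 3 to site 8 and 2 to site 4.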

