R - Identify Consecutive Sequences

R - identify consecutive sequences

Here's an attempt using data.table and stringi

First, I'm defining some helper function that will help me detect first accurances of B per group and validate that they are followed by the correct sequence

Myfunc <- function(x) {
               which(x == "B")[1L] == 
               stri_locate_first_regex(paste(x, collapse = ""), 'B*CD')[, 1L]
              }

Then, the implementation is straight forward

library(data.table)
library(stringi)
setDT(df)[, if(Myfunc(ROI)) .SD, by = .(subject, ntrial)]
#     subject ntrial ROI
#  1:   sbj05     78   A
#  2:   sbj05     78   A
#  3:   sbj05     78   A
#  4:   sbj05     78   A
#  5:   sbj05     78   A
#  6:   sbj05     78   A
#  7:   sbj05     78   B
#  8:   sbj05     78   B
#  9:   sbj05     78   C
# 10:   sbj05     78   D
# 11:   sbj05     78   E
# 12:   sbj05     78   E
# 13:   sbj05     78   E
# 14:   sbj05    201   A
# 15:   sbj05    201   A
# 16:   sbj05    201   A
# 17:   sbj05    201   A
# 18:   sbj05    201   A
# 19:   sbj05    201   B
# 20:   sbj05    201   C
# 21:   sbj05    201   D
# 22:   sbj05    201   E
# 23:   sbj05    201   E
# 24:   sbj05    201   E
# 25:   sbj05    201   F
# 26:   sbj05    201   F

Or, if you just want an additional column you could do

setDT(df)[, output := +Myfunc(ROI), by = .(subject, ntrial)]

Find consecutive values in vector in R

Just use split in conjunction with diff:

> split(dat, cumsum(c(1, diff(dat) != 1)))
$`1`
[1] 1 2 3 4 5

$`2`
[1] 19 20 21

$`3`
[1] 56

$`4`
[1] 80 81

$`5`
[1] 92

Not exactly what you asked for, but the "R.utils" package has a couple of related fun functions:

library(R.utils)
seqToIntervals(dat)
#      from to
# [1,]    1  5
# [2,]   19 21
# [3,]   56 56
# [4,]   80 81
# [5,]   92 92
seqToHumanReadable(dat)
# [1] "1-5, 19-21, 56, 80-81, 92"

In R: How can I check that I have consecutive years of data (to later be able to calculate growth)?

This is a good question!

First group_by companyID
calculate the difference of each consecutive row in year column with lag to identify if year is consecutive.
group_by companyID, yearID)
mutate helper column sequence1 to apply 1 to each starting consecutive year in group.
ungroup and apply a sequence number eachtime 1
occurs in sequence1
remove column sequence1 and deltalag1

library(tidyverse)

df1 <- df %>% 
  group_by(companyID) %>% 
  mutate(deltaLag1 = year - lag(year, 1)) %>% 
  group_by(companyID, yearID) %>% 
  mutate(sequence1 = case_when(is.na(deltaLag1) | deltaLag1 > 1 ~ 1,
                               TRUE ~ 2)) %>% 
  ungroup() %>% 
  mutate(sequence = cumsum(sequence1==1)) %>% 
  select(-deltaLag1, -sequence1)

data

df <- tribble(
~companyID,   ~year,   ~yearID,
1, 2010, 1, 
1, 2011, 2, 
1, 2012, 3, 
1, 2013, 4, 
2, 2010, 1, 
2, 2011, 2, 
2, 2016, 3, 
2, 2017, 4, 
2, 2018, 5, 
3, 2010, 1, 
3, 2011, 2, 
3, 2014, 3, 
3, 2017, 4, 
3, 2018, 5)

Sample Image

Assigning unique identifier to consecutive sequences of binomial values in R

This is effectively a run-length encoding, with a slight-twist (of zero-izing 0s).

While data.table::rleid does this well, if you are not already using that package, then we'll use

my_rleid <- function(x) { yy <- rle(x); rep(seq_along(yy$lengths), yy$lengths); }

From here, we'll see

x$out <- my_rleid(x$x)
x$out <- ifelse(x$x == 0, 0L, x$out)
x
#    x goal out
# 1  0    0   0
# 2  0    0   0
# 3  0    0   0
# 4  0    0   0
# 5  1    1   2
# 6  1    1   2
# 7  1    1   2
# 8  1    1   2
# 9  1    1   2
# 10 0    0   0
# 11 0    0   0
# 12 1    2   4
# 13 1    2   4
# 14 1    2   4
# 15 1    2   4
# 16 0    0   0
# 17 0    0   0
# 18 0    0   0
# 19 0    0   0
# 20 0    0   0
# 21 0    0   0
# 22 0    0   0
# 23 0    0   0
# 24 0    0   0
# 25 0    0   0
# 26 1    3   6
# 27 1    3   6
# 28 1    3   6

which is pretty close. If you need consecutive numbers (no gaps like above), then

x$out <- match(x$out, sort(unique(x$out))) - (0 %in% x$out)
x
#    x goal out
# 1  0    0   0
# 2  0    0   0
# 3  0    0   0
# 4  0    0   0
# 5  1    1   1
# 6  1    1   1
# 7  1    1   1
# 8  1    1   1
# 9  1    1   1
# 10 0    0   0
# 11 0    0   0
# 12 1    2   2
# 13 1    2   2
# 14 1    2   2
# 15 1    2   2
# 16 0    0   0
# 17 0    0   0
# 18 0    0   0
# 19 0    0   0
# 20 0    0   0
# 21 0    0   0
# 22 0    0   0
# 23 0    0   0
# 24 0    0   0
# 25 0    0   0
# 26 1    3   3
# 27 1    3   3
# 28 1    3   3

The reason I chose to use - (0 %in% x$out) instead of a hard-coded 1 is that I wanted to guard against the possibility of there being no 0s in the data. Put differently, that (0 %in% x$out) resolves to FALSE or TRUE, which when subtracted from integers, is coerced to 0L or 1L, respectively. The reason I need this: if there is a 0 in $out, then match will effectively be match(0, 0:6) which will return 1. We want the x == 0 matches to be 0L, so we have to subtract one. Since the second argument (from sort(unique(.))) is always either 0-based (as here) or 1-based (no zeroes present in x$x), it's an easy adjustment.

If you are certain that this cannot be the case, and you don't like the - (.) I appended to match(.), then you can change that to match(.) - 1L.

Determine sequence of consecutive numbers in R

Is this what you're after?

split(y, cumsum(c(0, diff(y) > 4)));
#$`0`
#[1]  4  8 12 16
#
#$`1`
#[1] 24
#
#$`2`
#[1] 31 33
#
#$`3`
#[1] 39
#
#$`4`
#[1] 64 68 72 76 80

I don't see 24 in your list; is that a mistake?

If you want to exclude list entries with only one number, you can do everything in one line:

Filter(length, lapply(split(y, cumsum(c(0, diff(y) > 4))), function(x) x[length(x) > 1]));
#$`0`
#[1]  4  8 12 16
#
#$`2`
#[1] 31 33
#
#$`4`
#[1] 64 68 72 76 80

Find the length of consecutive numbers in R

Here is a base R solution.

Create a column, sequence, which indicates which rows are contiguous.

data$sequence <- c(NA, head(data$position, -1)) + 1 == data$position
data$sequence[[1]] <- data$sequence[[2]]

data
#>    position name sequence
#> 1         1    A     TRUE
#> 2         2    B     TRUE
#> 3         3    C     TRUE
#> 4         1    A    FALSE
#> 5         1    A    FALSE
#> 6         4    D    FALSE
#> 7         5    E     TRUE
#> 8         6    F     TRUE
#> 9         7    G     TRUE
#> 10        8    H     TRUE
#> 11        2    B    FALSE
#> 12        2    B    FALSE

Use rle to construct the run lengths.

run_lengths <- rle(data$sequence)

i_ends <- cumsum(run_lengths$lengths)[run_lengths$values]
i_starts <- c(1, head(i_ends, -1))

data.frame(
  position = paste0(data$position[i_starts], " - ", data$position[i_ends]),
  length = i_ends - i_starts
)
#>   position length
#> 1    1 - 3      2
#> 2    3 - 8      7

Identify groups of n consecutive numbers in a data.table field in a group

A solution using the tidyverse.

library(tidyverse)
library(data.table)

DT2 <- DT %>%
  arrange(Student, Month) %>%
  group_by(Student) %>%
  # Create sequence of 3
  mutate(Seq = map(Month, ~seq.int(.x, .x + 2L))) %>%
  # Create a flag to show if the sequence match completely with the Month column 
  mutate(Flag = map_lgl(Seq, ~all(.x %in% Month))) %>%
  # Filter the Flag for TRUE
  filter(Flag) %>%
  # Remove columns
  select(-Seq, -Flag) %>%
  ungroup()

DT2
# # A tibble: 11 x 2
#    Student Month
#      <dbl> <dbl>
#  1       1     1
#  2       1     5
#  3       1     6
#  4       2     2
#  5       2     3
#  6       2     7
#  7       2     8
#  8       3     1
#  9       3     5
# 10       3     6
# 11       3     7

Find consecutive sequence of zeros in R

Using data.table, as your question suggests you actually want to, as far I a can see, this is doing what you want

DT <- data.table(myOriginalDf)

# add the original order, so you can't lose it
DT[, orig := .I]

# rle by id, saving the length as a new variables

DT[, rleLength := {rr <- rle(value); rep(rr$length, rr$length)}, by = 'id']

# key by value and length to subset 

setkey(DT, value, rleLength)

# which rows are value = 0 and length > 2

DT[list(0, unique(rleLength[rleLength>2])),nomatch=0]

##    value rleLength id orig
## 1:     0         3  x    6
## 2:     0         3  x    7
## 3:     0         3  x    8
## 4:     0         4  y   10
## 5:     0         4  y   11
## 6:     0         4  y   12
## 7:     0         4  y   13

R - Identify Consecutive Sequences