R - Identify Consecutive Sequences

R - identify consecutive sequences

Here's an attempt using data.table and stringi

First, I'm defining some helper function that will help me detect first accurances of B per group and validate that they are followed by the correct sequence

Myfunc <- function(x) {
which(x == "B")[1L] ==
stri_locate_first_regex(paste(x, collapse = ""), 'B*CD')[, 1L]
}

Then, the implementation is straight forward

library(data.table)
library(stringi)
setDT(df)[, if(Myfunc(ROI)) .SD, by = .(subject, ntrial)]
# subject ntrial ROI
# 1: sbj05 78 A
# 2: sbj05 78 A
# 3: sbj05 78 A
# 4: sbj05 78 A
# 5: sbj05 78 A
# 6: sbj05 78 A
# 7: sbj05 78 B
# 8: sbj05 78 B
# 9: sbj05 78 C
# 10: sbj05 78 D
# 11: sbj05 78 E
# 12: sbj05 78 E
# 13: sbj05 78 E
# 14: sbj05 201 A
# 15: sbj05 201 A
# 16: sbj05 201 A
# 17: sbj05 201 A
# 18: sbj05 201 A
# 19: sbj05 201 B
# 20: sbj05 201 C
# 21: sbj05 201 D
# 22: sbj05 201 E
# 23: sbj05 201 E
# 24: sbj05 201 E
# 25: sbj05 201 F
# 26: sbj05 201 F

Or, if you just want an additional column you could do

setDT(df)[, output := +Myfunc(ROI), by = .(subject, ntrial)]

Find consecutive values in vector in R

Just use split in conjunction with diff:

> split(dat, cumsum(c(1, diff(dat) != 1)))
$`1`
[1] 1 2 3 4 5

$`2`
[1] 19 20 21

$`3`
[1] 56

$`4`
[1] 80 81

$`5`
[1] 92

Not exactly what you asked for, but the "R.utils" package has a couple of related fun functions:

library(R.utils)
seqToIntervals(dat)
# from to
# [1,] 1 5
# [2,] 19 21
# [3,] 56 56
# [4,] 80 81
# [5,] 92 92
seqToHumanReadable(dat)
# [1] "1-5, 19-21, 56, 80-81, 92"

In R: How can I check that I have consecutive years of data (to later be able to calculate growth)?

This is a good question!

  1. First group_by companyID
  2. calculate the difference of each consecutive row in year column with lag to identify if year is consecutive.
  3. group_by companyID, yearID)
  4. mutate helper column sequence1 to apply 1 to each starting consecutive year in group.
  5. ungroup and apply a sequence number eachtime 1
    occurs in sequence1
  6. remove column sequence1 and deltalag1
library(tidyverse)

df1 <- df %>%
group_by(companyID) %>%
mutate(deltaLag1 = year - lag(year, 1)) %>%
group_by(companyID, yearID) %>%
mutate(sequence1 = case_when(is.na(deltaLag1) | deltaLag1 > 1 ~ 1,
TRUE ~ 2)) %>%
ungroup() %>%
mutate(sequence = cumsum(sequence1==1)) %>%
select(-deltaLag1, -sequence1)

data

df <- tribble(
~companyID, ~year, ~yearID,
1, 2010, 1,
1, 2011, 2,
1, 2012, 3,
1, 2013, 4,
2, 2010, 1,
2, 2011, 2,
2, 2016, 3,
2, 2017, 4,
2, 2018, 5,
3, 2010, 1,
3, 2011, 2,
3, 2014, 3,
3, 2017, 4,
3, 2018, 5)

Sample Image

Assigning unique identifier to consecutive sequences of binomial values in R

This is effectively a run-length encoding, with a slight-twist (of zero-izing 0s).

While data.table::rleid does this well, if you are not already using that package, then we'll use

my_rleid <- function(x) { yy <- rle(x); rep(seq_along(yy$lengths), yy$lengths); }

From here, we'll see

x$out <- my_rleid(x$x)
x$out <- ifelse(x$x == 0, 0L, x$out)
x
# x goal out
# 1 0 0 0
# 2 0 0 0
# 3 0 0 0
# 4 0 0 0
# 5 1 1 2
# 6 1 1 2
# 7 1 1 2
# 8 1 1 2
# 9 1 1 2
# 10 0 0 0
# 11 0 0 0
# 12 1 2 4
# 13 1 2 4
# 14 1 2 4
# 15 1 2 4
# 16 0 0 0
# 17 0 0 0
# 18 0 0 0
# 19 0 0 0
# 20 0 0 0
# 21 0 0 0
# 22 0 0 0
# 23 0 0 0
# 24 0 0 0
# 25 0 0 0
# 26 1 3 6
# 27 1 3 6
# 28 1 3 6

which is pretty close. If you need consecutive numbers (no gaps like above), then

x$out <- match(x$out, sort(unique(x$out))) - (0 %in% x$out)
x
# x goal out
# 1 0 0 0
# 2 0 0 0
# 3 0 0 0
# 4 0 0 0
# 5 1 1 1
# 6 1 1 1
# 7 1 1 1
# 8 1 1 1
# 9 1 1 1
# 10 0 0 0
# 11 0 0 0
# 12 1 2 2
# 13 1 2 2
# 14 1 2 2
# 15 1 2 2
# 16 0 0 0
# 17 0 0 0
# 18 0 0 0
# 19 0 0 0
# 20 0 0 0
# 21 0 0 0
# 22 0 0 0
# 23 0 0 0
# 24 0 0 0
# 25 0 0 0
# 26 1 3 3
# 27 1 3 3
# 28 1 3 3

The reason I chose to use - (0 %in% x$out) instead of a hard-coded 1 is that I wanted to guard against the possibility of there being no 0s in the data. Put differently, that (0 %in% x$out) resolves to FALSE or TRUE, which when subtracted from integers, is coerced to 0L or 1L, respectively. The reason I need this: if there is a 0 in $out, then match will effectively be match(0, 0:6) which will return 1. We want the x == 0 matches to be 0L, so we have to subtract one. Since the second argument (from sort(unique(.))) is always either 0-based (as here) or 1-based (no zeroes present in x$x), it's an easy adjustment.

If you are certain that this cannot be the case, and you don't like the - (.) I appended to match(.), then you can change that to match(.) - 1L.

Determine sequence of consecutive numbers in R

Is this what you're after?

split(y, cumsum(c(0, diff(y) > 4)));
#$`0`
#[1] 4 8 12 16
#
#$`1`
#[1] 24
#
#$`2`
#[1] 31 33
#
#$`3`
#[1] 39
#
#$`4`
#[1] 64 68 72 76 80

I don't see 24 in your list; is that a mistake?

If you want to exclude list entries with only one number, you can do everything in one line:

Filter(length, lapply(split(y, cumsum(c(0, diff(y) > 4))), function(x) x[length(x) > 1]));
#$`0`
#[1] 4 8 12 16
#
#$`2`
#[1] 31 33
#
#$`4`
#[1] 64 68 72 76 80

Find the length of consecutive numbers in R

Here is a base R solution.

Create a column, sequence, which indicates which rows are contiguous.

data$sequence <- c(NA, head(data$position, -1)) + 1 == data$position
data$sequence[[1]] <- data$sequence[[2]]

data
#> position name sequence
#> 1 1 A TRUE
#> 2 2 B TRUE
#> 3 3 C TRUE
#> 4 1 A FALSE
#> 5 1 A FALSE
#> 6 4 D FALSE
#> 7 5 E TRUE
#> 8 6 F TRUE
#> 9 7 G TRUE
#> 10 8 H TRUE
#> 11 2 B FALSE
#> 12 2 B FALSE

Use rle to construct the run lengths.

run_lengths <- rle(data$sequence)

i_ends <- cumsum(run_lengths$lengths)[run_lengths$values]
i_starts <- c(1, head(i_ends, -1))

data.frame(
position = paste0(data$position[i_starts], " - ", data$position[i_ends]),
length = i_ends - i_starts
)
#> position length
#> 1 1 - 3 2
#> 2 3 - 8 7

Identify groups of n consecutive numbers in a data.table field in a group

A solution using the tidyverse.

library(tidyverse)
library(data.table)

DT2 <- DT %>%
arrange(Student, Month) %>%
group_by(Student) %>%
# Create sequence of 3
mutate(Seq = map(Month, ~seq.int(.x, .x + 2L))) %>%
# Create a flag to show if the sequence match completely with the Month column
mutate(Flag = map_lgl(Seq, ~all(.x %in% Month))) %>%
# Filter the Flag for TRUE
filter(Flag) %>%
# Remove columns
select(-Seq, -Flag) %>%
ungroup()

DT2
# # A tibble: 11 x 2
# Student Month
# <dbl> <dbl>
# 1 1 1
# 2 1 5
# 3 1 6
# 4 2 2
# 5 2 3
# 6 2 7
# 7 2 8
# 8 3 1
# 9 3 5
# 10 3 6
# 11 3 7

Find consecutive sequence of zeros in R

Using data.table, as your question suggests you actually want to, as far I a can see, this is doing what you want

DT <- data.table(myOriginalDf)

# add the original order, so you can't lose it
DT[, orig := .I]

# rle by id, saving the length as a new variables

DT[, rleLength := {rr <- rle(value); rep(rr$length, rr$length)}, by = 'id']

# key by value and length to subset

setkey(DT, value, rleLength)

# which rows are value = 0 and length > 2

DT[list(0, unique(rleLength[rleLength>2])),nomatch=0]

## value rleLength id orig
## 1: 0 3 x 6
## 2: 0 3 x 7
## 3: 0 3 x 8
## 4: 0 4 y 10
## 5: 0 4 y 11
## 6: 0 4 y 12
## 7: 0 4 y 13


Related Topics



Leave a reply



Submit