R - identify consecutive sequences
Here's an attempt using data.table
and stringi
First, I'm defining some helper function that will help me detect first accurances of B
per group and validate that they are followed by the correct sequence
Myfunc <- function(x) {
which(x == "B")[1L] ==
stri_locate_first_regex(paste(x, collapse = ""), 'B*CD')[, 1L]
}
Then, the implementation is straight forward
library(data.table)
library(stringi)
setDT(df)[, if(Myfunc(ROI)) .SD, by = .(subject, ntrial)]
# subject ntrial ROI
# 1: sbj05 78 A
# 2: sbj05 78 A
# 3: sbj05 78 A
# 4: sbj05 78 A
# 5: sbj05 78 A
# 6: sbj05 78 A
# 7: sbj05 78 B
# 8: sbj05 78 B
# 9: sbj05 78 C
# 10: sbj05 78 D
# 11: sbj05 78 E
# 12: sbj05 78 E
# 13: sbj05 78 E
# 14: sbj05 201 A
# 15: sbj05 201 A
# 16: sbj05 201 A
# 17: sbj05 201 A
# 18: sbj05 201 A
# 19: sbj05 201 B
# 20: sbj05 201 C
# 21: sbj05 201 D
# 22: sbj05 201 E
# 23: sbj05 201 E
# 24: sbj05 201 E
# 25: sbj05 201 F
# 26: sbj05 201 F
Or, if you just want an additional column you could do
setDT(df)[, output := +Myfunc(ROI), by = .(subject, ntrial)]
Find consecutive values in vector in R
Just use split
in conjunction with diff
:
> split(dat, cumsum(c(1, diff(dat) != 1)))
$`1`
[1] 1 2 3 4 5
$`2`
[1] 19 20 21
$`3`
[1] 56
$`4`
[1] 80 81
$`5`
[1] 92
Not exactly what you asked for, but the "R.utils" package has a couple of related fun functions:
library(R.utils)
seqToIntervals(dat)
# from to
# [1,] 1 5
# [2,] 19 21
# [3,] 56 56
# [4,] 80 81
# [5,] 92 92
seqToHumanReadable(dat)
# [1] "1-5, 19-21, 56, 80-81, 92"
In R: How can I check that I have consecutive years of data (to later be able to calculate growth)?
This is a good question!
- First
group_by
companyID
- calculate the difference of each consecutive row in
year
column withlag
to identify if year is consecutive. group_by
companyID, yearID)
mutate
helper columnsequence1
to apply 1 to each starting consecutive year in group.ungroup
and apply a sequence number eachtime 1
occurs insequence1
- remove column
sequence1
anddeltalag1
library(tidyverse)
df1 <- df %>%
group_by(companyID) %>%
mutate(deltaLag1 = year - lag(year, 1)) %>%
group_by(companyID, yearID) %>%
mutate(sequence1 = case_when(is.na(deltaLag1) | deltaLag1 > 1 ~ 1,
TRUE ~ 2)) %>%
ungroup() %>%
mutate(sequence = cumsum(sequence1==1)) %>%
select(-deltaLag1, -sequence1)
data
df <- tribble(
~companyID, ~year, ~yearID,
1, 2010, 1,
1, 2011, 2,
1, 2012, 3,
1, 2013, 4,
2, 2010, 1,
2, 2011, 2,
2, 2016, 3,
2, 2017, 4,
2, 2018, 5,
3, 2010, 1,
3, 2011, 2,
3, 2014, 3,
3, 2017, 4,
3, 2018, 5)
Assigning unique identifier to consecutive sequences of binomial values in R
This is effectively a run-length encoding, with a slight-twist (of zero-izing 0
s).
While data.table::rleid
does this well, if you are not already using that package, then we'll use
my_rleid <- function(x) { yy <- rle(x); rep(seq_along(yy$lengths), yy$lengths); }
From here, we'll see
x$out <- my_rleid(x$x)
x$out <- ifelse(x$x == 0, 0L, x$out)
x
# x goal out
# 1 0 0 0
# 2 0 0 0
# 3 0 0 0
# 4 0 0 0
# 5 1 1 2
# 6 1 1 2
# 7 1 1 2
# 8 1 1 2
# 9 1 1 2
# 10 0 0 0
# 11 0 0 0
# 12 1 2 4
# 13 1 2 4
# 14 1 2 4
# 15 1 2 4
# 16 0 0 0
# 17 0 0 0
# 18 0 0 0
# 19 0 0 0
# 20 0 0 0
# 21 0 0 0
# 22 0 0 0
# 23 0 0 0
# 24 0 0 0
# 25 0 0 0
# 26 1 3 6
# 27 1 3 6
# 28 1 3 6
which is pretty close. If you need consecutive numbers (no gaps like above), then
x$out <- match(x$out, sort(unique(x$out))) - (0 %in% x$out)
x
# x goal out
# 1 0 0 0
# 2 0 0 0
# 3 0 0 0
# 4 0 0 0
# 5 1 1 1
# 6 1 1 1
# 7 1 1 1
# 8 1 1 1
# 9 1 1 1
# 10 0 0 0
# 11 0 0 0
# 12 1 2 2
# 13 1 2 2
# 14 1 2 2
# 15 1 2 2
# 16 0 0 0
# 17 0 0 0
# 18 0 0 0
# 19 0 0 0
# 20 0 0 0
# 21 0 0 0
# 22 0 0 0
# 23 0 0 0
# 24 0 0 0
# 25 0 0 0
# 26 1 3 3
# 27 1 3 3
# 28 1 3 3
The reason I chose to use - (0 %in% x$out)
instead of a hard-coded 1
is that I wanted to guard against the possibility of there being no 0s in the data. Put differently, that (0 %in% x$out)
resolves to FALSE
or TRUE
, which when subtracted from integer
s, is coerced to 0L
or 1L
, respectively. The reason I need this: if there is a 0
in $out
, then match
will effectively be match(0, 0:6)
which will return 1
. We want the x == 0
matches to be 0L
, so we have to subtract one. Since the second argument (from sort(unique(.))
) is always either 0-based (as here) or 1-based (no zeroes present in x$x
), it's an easy adjustment.
If you are certain that this cannot be the case, and you don't like the - (.)
I appended to match(.)
, then you can change that to match(.) - 1L
.
Determine sequence of consecutive numbers in R
Is this what you're after?
split(y, cumsum(c(0, diff(y) > 4)));
#$`0`
#[1] 4 8 12 16
#
#$`1`
#[1] 24
#
#$`2`
#[1] 31 33
#
#$`3`
#[1] 39
#
#$`4`
#[1] 64 68 72 76 80
I don't see 24
in your list
; is that a mistake?
If you want to exclude list
entries with only one number, you can do everything in one line:
Filter(length, lapply(split(y, cumsum(c(0, diff(y) > 4))), function(x) x[length(x) > 1]));
#$`0`
#[1] 4 8 12 16
#
#$`2`
#[1] 31 33
#
#$`4`
#[1] 64 68 72 76 80
Find the length of consecutive numbers in R
Here is a base R solution.
Create a column, sequence
, which indicates which rows are contiguous.
data$sequence <- c(NA, head(data$position, -1)) + 1 == data$position
data$sequence[[1]] <- data$sequence[[2]]
data
#> position name sequence
#> 1 1 A TRUE
#> 2 2 B TRUE
#> 3 3 C TRUE
#> 4 1 A FALSE
#> 5 1 A FALSE
#> 6 4 D FALSE
#> 7 5 E TRUE
#> 8 6 F TRUE
#> 9 7 G TRUE
#> 10 8 H TRUE
#> 11 2 B FALSE
#> 12 2 B FALSE
Use rle
to construct the run lengths.
run_lengths <- rle(data$sequence)
i_ends <- cumsum(run_lengths$lengths)[run_lengths$values]
i_starts <- c(1, head(i_ends, -1))
data.frame(
position = paste0(data$position[i_starts], " - ", data$position[i_ends]),
length = i_ends - i_starts
)
#> position length
#> 1 1 - 3 2
#> 2 3 - 8 7
Identify groups of n consecutive numbers in a data.table field in a group
A solution using the tidyverse
.
library(tidyverse)
library(data.table)
DT2 <- DT %>%
arrange(Student, Month) %>%
group_by(Student) %>%
# Create sequence of 3
mutate(Seq = map(Month, ~seq.int(.x, .x + 2L))) %>%
# Create a flag to show if the sequence match completely with the Month column
mutate(Flag = map_lgl(Seq, ~all(.x %in% Month))) %>%
# Filter the Flag for TRUE
filter(Flag) %>%
# Remove columns
select(-Seq, -Flag) %>%
ungroup()
DT2
# # A tibble: 11 x 2
# Student Month
# <dbl> <dbl>
# 1 1 1
# 2 1 5
# 3 1 6
# 4 2 2
# 5 2 3
# 6 2 7
# 7 2 8
# 8 3 1
# 9 3 5
# 10 3 6
# 11 3 7
Find consecutive sequence of zeros in R
Using data.table
, as your question suggests you actually want to, as far I a can see, this is doing what you want
DT <- data.table(myOriginalDf)
# add the original order, so you can't lose it
DT[, orig := .I]
# rle by id, saving the length as a new variables
DT[, rleLength := {rr <- rle(value); rep(rr$length, rr$length)}, by = 'id']
# key by value and length to subset
setkey(DT, value, rleLength)
# which rows are value = 0 and length > 2
DT[list(0, unique(rleLength[rleLength>2])),nomatch=0]
## value rleLength id orig
## 1: 0 3 x 6
## 2: 0 3 x 7
## 3: 0 3 x 8
## 4: 0 4 y 10
## 5: 0 4 y 11
## 6: 0 4 y 12
## 7: 0 4 y 13
Related Topics
Grouped Bar Graph Custom Colours
How to Merge Two Data Frame Based on Partial String Match with R
R: How to Create Grid-Graphics
Reshape Data from Long to Wide Format - More Than One Variable
Do I Need to Reshape This Wide Data to Effectively Use Ggplot2
Error: Package or Namespace Load Failed for 'Car'
Using Italic() with a Variable in Ggplot2 Title Expression
Assign Color to 2 Different Geoms and Get 2 Different Legends
R: Reading a Binary File That Is Zipped
Interleave Columns of Two Data Frames
Ggplot Line Plot Different Colors for Sections
Increase Space Between Legend Keys Without Increasing Legend Keys
Bar Plot for Count Data by Group in R
How to Substitute Symbols in a Language Object