Create a list of vectors of consecutive values from a vector
One option is to split the vector by a grouping variable, created by checking whether the difference between adjacent elements is not equal to 1:
split(vec, cumsum(c(TRUE, diff(vec) != 1)))
#$`1`
#[1] 1 2
#$`2`
#[1] 5
#$`3`
#[1] 7 8 9
#$`4`
#[1] 11 12 13
#$`5`
#[1] 15
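To see why this works, it can help to look at the intermediate values. The sketch below assumes `vec` holds the values implied by the output above; the original input is not shown here, so that definition is a reconstruction.

```r
# Assumed input, reconstructed from the output above (not shown in the original)
vec <- c(1, 2, 5, 7, 8, 9, 11, 12, 13, 15)

# TRUE wherever an element does not follow its predecessor by exactly 1;
# the leading TRUE opens the first group
breaks <- c(TRUE, diff(vec) != 1)

# Each TRUE increments the running total, giving one id per run
grp <- cumsum(breaks)
grp
# [1] 1 1 2 3 3 3 4 4 4 5

# Splitting by that id yields one vector per run of consecutive values
runs <- split(vec, grp)
```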
Finding the number of consecutive days in data
You could do:
df %>%
  group_by(cumsum(c(0, diff(day) - 1))) %>%
  summarise(sequences = paste(first(day), last(day), sep = ' - '),
            length = n()) %>%
  filter(length > 1) %>%
  select(sequences, length)
#> # A tibble: 2 x 2
#>   sequences               length
#>   <chr>                    <int>
#> 1 2022-01-03 - 2022-01-05      3
#> 2 2022-01-10 - 2022-01-13      4
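If you would rather avoid dplyr, roughly the same result can be sketched in base R. The `day` values below are hypothetical, chosen only so that the two multi-day runs match the output above, and the sketch assumes the dates are sorted and unique.

```r
# Hypothetical input: sorted, unique dates (not taken from the original question)
day <- as.Date(c("2022-01-01", "2022-01-03", "2022-01-04", "2022-01-05",
                 "2022-01-08", "2022-01-10", "2022-01-11", "2022-01-12",
                 "2022-01-13"))

# New group whenever the gap to the previous day is not exactly 1 day
grp <- cumsum(c(TRUE, as.numeric(diff(day)) != 1))
runs <- split(day, grp)

# One row per run: "first - last" label and number of days
res <- data.frame(
  sequences = vapply(runs,
                     function(d) paste(format(d[1]), format(d[length(d)]),
                                       sep = " - "),
                     character(1)),
  length = lengths(runs)
)

# Keep only runs longer than a single day
res <- res[res$length > 1, ]
```

Converting the `difftime` result of `diff()` with `as.numeric()` keeps the comparison explicit about working in days.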
Group data frame row by consecutive value in R
We could use diff on the adjacent values of 'time', check whether the difference is not equal to 1, and then turn that logical vector into a numeric index by taking the cumulative sum (cumsum), so that the index increases by 1 at each TRUE value:
library(dplyr)
df1 %>%
  mutate(grp = cumsum(c(TRUE, diff(time) != 1)))
Output:
# A tibble: 12 x 2
    time   grp
   <dbl> <int>
 1     1     1
 2     2     1
 3     3     1
 4     4     1
 5     5     1
 6    10     2
 7    11     2
 8    20     3
 9    30     4
10    31     4
11    32     4
12    40     5
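Once the grp column exists, it can be used like any other grouping key. As a rough base R sketch (with `df1` reconstructed from the output above, which is an assumption about the original data), this summarises each run by its first value, last value, and length:

```r
# Assumed input, reconstructed from the output above
df1 <- data.frame(time = c(1:5, 10, 11, 20, 30:32, 40))
df1$grp <- cumsum(c(TRUE, diff(df1$time) != 1))

# One row per run: first value, last value, and number of rows in the run
run_summary <- data.frame(
  grp   = unique(df1$grp),
  start = as.vector(tapply(df1$time, df1$grp, min)),
  end   = as.vector(tapply(df1$time, df1$grp, max)),
  n     = as.vector(table(df1$grp))
)
```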
Group rows based on consecutive line numbers
Convert the line values to numeric, take the difference between consecutive values, and start a new group whenever that difference is greater than 1.
transform(df, group = cumsum(c(TRUE, diff(as.numeric(line)) > 1)))
# line group
#1 0001 1
#2 0002 1
#3 0003 1
#4 0011 2
#5 0012 2
#6 0234 3
#7 0235 3
#8 0236 3
If you want to use dplyr:
library(dplyr)
df %>% mutate(group = cumsum(c(TRUE, diff(as.numeric(line)) > 1)))
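For a reproducible version, the sketch below assumes `line` is a character column of zero-padded numbers, as the output above suggests. It also shows how the `> 1` test used here differs from the `!= 1` test on a small made-up vector `x`: with `> 1`, a repeated or decreasing value stays in the current group.

```r
# Assumed input: zero-padded line numbers stored as character
df <- data.frame(line = c("0001", "0002", "0003", "0011", "0012",
                          "0234", "0235", "0236"))

# as.numeric() drops the leading zeros, e.g. "0011" becomes 11
num <- as.numeric(df$line)
df$group <- cumsum(c(TRUE, diff(num) > 1))

# Made-up vector with a repeat (2, 2) and a decrease (3, 1)
x <- c(1, 2, 2, 3, 1)
g_gt  <- cumsum(c(TRUE, diff(x) > 1))   # 1 1 1 1 1
g_neq <- cumsum(c(TRUE, diff(x) != 1))  # 1 1 2 2 3
```

Which test is appropriate depends on whether duplicates and out-of-order values should break a run in your data.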
Split a vector by its sequences
split(x, cumsum(c(TRUE, diff(x)!=1)))
#$`1`
#[1] 7
#
#$`2`
#[1] 1 2 3 4
#
#$`3`
#[1] 6 7
#
#$`4`
#[1] 9
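If this is needed in several places, the one-liner can be wrapped in a small function. `split_runs` and its `step` argument are hypothetical names introduced for this sketch, and `x` is reconstructed from the output above.

```r
# Hypothetical helper (not part of base R): split x into runs whose
# neighbouring elements differ by exactly `step`
split_runs <- function(x, step = 1) {
  # Handle empty input explicitly rather than passing it to split()
  if (length(x) == 0) return(list())
  unname(split(x, cumsum(c(TRUE, diff(x) != step))))
}

# Assumed input, reconstructed from the output above
x <- c(7, 1, 2, 3, 4, 6, 7, 9)
runs <- split_runs(x)
lengths(runs)
# [1] 1 4 2 1

# Runs of even numbers, using a step of 2
even_runs <- split_runs(c(2, 4, 6, 9), step = 2)
```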
Create ID for specific sequence of consecutive days based on grouping variable in R
Try:
library(dplyr)
mydata %>%
  group_by(country) %>%
  distinct(seq.ID = cumsum(event_date != lag(event_date, default = first(event_date)) + 1L))
Output:
# A tibble: 5 x 2
# Groups: country [2]
seq.ID country
<int> <fct>
1 1 Angola
2 2 Angola
3 1 Benin
4 2 Benin
5 3 Benin
You can also use the .keep_all argument in distinct and preserve the first date of each sequence:
mydata %>%
  group_by(country) %>%
  distinct(seq.ID = cumsum(event_date != lag(event_date, default = first(event_date)) + 1L),
           .keep_all = TRUE)
# A tibble: 5 x 3
# Groups: country [2]
country event_date seq.ID
<fct> <date> <int>
1 Angola 2017-06-16 1
2 Angola 2017-08-22 2
3 Benin 2019-04-18 1
4 Benin 2018-03-15 2
5 Benin 2016-03-17 3
If you want non-aggregated output with a distinct sequence ID for every run, you could do:
mydata %>%
  mutate(
    seq.ID = cumsum(
      (event_date != lag(event_date, default = first(event_date)) + 1L) |
        country != lag(country, default = first(country))
    )
  )
country event_date seq.ID
1 Angola 2017-06-16 1
2 Angola 2017-06-17 1
3 Angola 2017-06-18 1
4 Angola 2017-08-22 2
5 Angola 2017-08-23 2
6 Benin 2019-04-18 3
7 Benin 2019-04-19 3
8 Benin 2019-04-20 3
9 Benin 2018-03-15 4
10 Benin 2018-03-16 4
11 Benin 2016-03-17 5
Note that there is a typo in your last event_date, which is why the outputs don't match your desired output exactly.
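For comparison, a per-country sequence ID on every row can also be sketched in base R with ave(). The `mydata` below is reconstructed from the outputs above, so treat it as an assumption about the original data; the rows are taken in the order shown, without sorting.

```r
# Assumed input, reconstructed from the outputs above
mydata <- data.frame(
  country = rep(c("Angola", "Benin"), times = c(5, 6)),
  event_date = as.Date(c(
    "2017-06-16", "2017-06-17", "2017-06-18", "2017-08-22", "2017-08-23",
    "2019-04-18", "2019-04-19", "2019-04-20", "2018-03-15", "2018-03-16",
    "2016-03-17"
  ))
)

# Within each country, start a new ID whenever the date does not follow
# the previous row's date by exactly one day
mydata$seq.ID <- ave(
  as.numeric(mydata$event_date), mydata$country,
  FUN = function(d) cumsum(c(TRUE, diff(d) != 1))
)
```

Here the IDs restart at 1 for each country, as in the grouped dplyr versions, rather than running on across countries.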