## Create a list of vectors of consecutive values from a vector

One option is to `split` the vector by a grouping variable built from the `diff`erence between adjacent elements, starting a new group wherever that difference is not 1:

    split(vec, cumsum(c(TRUE, diff(vec) != 1)))
    #$`1`
    #[1] 1 2
    #$`2`
    #[1] 5
    #$`3`
    #[1] 7 8 9
    #$`4`
    #[1] 11 12 13
    #$`5`
    #[1] 15
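To make the grouping step easier to follow, here is a minimal sketch that builds the run identifier in separate steps before splitting. The input `vec` is an assumption reconstructed from the printed output, and `starts`, `run_id` and `runs` are illustrative names only.

```r
# Assumed input, reconstructed from the printed output
vec <- c(1, 2, 5, 7, 8, 9, 11, 12, 13, 15)

# TRUE marks the first element of each run: the first element itself,
# plus every element that is not exactly 1 greater than its predecessor
starts <- c(TRUE, diff(vec) != 1)

# The cumulative sum turns those markers into run numbers
run_id <- cumsum(starts)

# Split the vector by run number and count the elements in each run
runs <- split(vec, run_id)
run_sizes <- lengths(runs)
```

Keeping `run_id` as its own vector can be handy when the same grouping is needed again, for example to add it as a column instead of splitting.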

## Finding the number of consecutive days in data

You could group the rows into runs of consecutive days, summarise each run, and keep the runs longer than one day:

    library(dplyr)

    df %>%
      group_by(cumsum(c(0, diff(day) - 1))) %>%
      summarise(sequences = paste(first(day), last(day), sep = ' - '),
                length = n()) %>%
      filter(length > 1) %>%
      select(sequences, length)
    #> # A tibble: 2 x 2
    #>   sequences               length
    #>   <chr>                    <int>
    #> 1 2022-01-03 - 2022-01-05      3
    #> 2 2022-01-10 - 2022-01-13      4
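The same idea can be sketched in base R, which may help show what each step does. The `day` vector here is hypothetical (the original data is not shown, so one isolated day is added for illustration), and the sketch assumes the dates are already sorted.

```r
# Hypothetical input: sorted dates with two runs and one isolated day
day <- as.Date(c("2022-01-03", "2022-01-04", "2022-01-05",
                 "2022-01-07",
                 "2022-01-10", "2022-01-11", "2022-01-12", "2022-01-13"))

# diff() on Dates returns a difftime; as.numeric() gives the gap in days.
# A new run starts wherever the gap is not exactly one day.
run_id <- cumsum(c(TRUE, as.numeric(diff(day)) != 1))

# First and last day of each run, and the number of days in it
first_day <- day[!duplicated(run_id)]
last_day  <- day[!duplicated(run_id, fromLast = TRUE)]
run_len   <- rle(run_id)$lengths

result <- data.frame(
  sequences = paste(first_day, last_day, sep = " - "),
  length = run_len,
  stringsAsFactors = FALSE
)

# Keep only runs longer than one day
result <- result[result$length > 1, ]
```

With this input, the isolated day `2022-01-07` forms a run of length 1 and is dropped by the final filter.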

## Group data frame row by consecutive value in R

We could apply `diff` to adjacent values of `time`, check whether each difference is not equal to 1, and then convert the resulting logical vector to a numeric index with the cumulative sum (`cumsum`), so that the index increases by 1 at each `TRUE` value:

    library(dplyr)

    df1 %>%
      mutate(grp = cumsum(c(TRUE, diff(time) != 1)))

Output:

    # A tibble: 12 x 2
        time   grp
       <dbl> <int>
     1     1     1
     2     2     1
     3     3     1
     4     4     1
     5     5     1
     6    10     2
     7    11     2
     8    20     3
     9    30     4
    10    31     4
    11    32     4
    12    40     5
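Once the `grp` index exists, it can be used to describe each run. The following base R sketch is one possible way to do that; the `time` vector is an assumption reconstructed from the output, and `summary_df` is an illustrative name.

```r
# Assumed input, reconstructed from the output
time <- c(1, 2, 3, 4, 5, 10, 11, 20, 30, 31, 32, 40)

# Same grouping index as in the answer
grp <- cumsum(c(TRUE, diff(time) != 1))

# One row per group: first value, last value, and number of rows
summary_df <- data.frame(
  grp   = unique(grp),
  start = as.vector(tapply(time, grp, min)),
  end   = as.vector(tapply(time, grp, max)),
  n     = as.vector(table(grp))
)
```

Groups with `n` equal to 1 (here `20` and `40`) are values with no consecutive neighbour.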

## Group rows based on consecutive line numbers

Convert the line numbers to numeric, calculate the difference between consecutive values, and increment the group count whenever the difference is greater than 1.

    transform(df, group = cumsum(c(TRUE, diff(as.numeric(line)) > 1)))
    #  line group
    #1 0001     1
    #2 0002     1
    #3 0003     1
    #4 0011     2
    #5 0012     2
    #6 0234     3
    #7 0235     3
    #8 0236     3

If you want to use `dplyr`:

    library(dplyr)

    df %>% mutate(group = cumsum(c(TRUE, diff(as.numeric(line)) > 1)))
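For a self-contained illustration of the conversion step, the sketch below rebuilds the data in base R. The `df` shown is an assumption reconstructed from the output, with `line` stored as character, and `line_num` is an illustrative name.

```r
# Assumed input, reconstructed from the output; `line` is stored as text
df <- data.frame(
  line = c("0001", "0002", "0003", "0011", "0012", "0234", "0235", "0236"),
  stringsAsFactors = FALSE
)

# as.numeric() drops the leading zeros, so "0011" becomes 11
line_num <- as.numeric(df$line)

# Start a new group whenever the gap to the previous line is greater than 1
df$group <- cumsum(c(TRUE, diff(line_num) > 1))
```

If `line` were a factor rather than character, `as.numeric()` would return the internal level codes, so it would need `as.numeric(as.character(line))` instead.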

## Split a vector by its sequences

    split(x, cumsum(c(TRUE, diff(x) != 1)))
    #$`1`
    #[1] 7
    #
    #$`2`
    #[1] 1 2 3 4
    #
    #$`3`
    #[1] 6 7
    #
    #$`4`
    #[1] 9

## Create ID for specific sequence of consecutive days based on grouping variable in R

Try:

    library(dplyr)

    mydata %>%
      group_by(country) %>%
      distinct(seq.ID = cumsum(event_date != lag(event_date, default = first(event_date)) + 1L))

Output:

    # A tibble: 5 x 2
    # Groups:   country [2]
      seq.ID country
       <int> <fct>
    1      1 Angola
    2      2 Angola
    3      1 Benin
    4      2 Benin
    5      3 Benin

You can also use the `.keep_all` argument in `distinct` and preserve the first date of each sequence:

    mydata %>%
      group_by(country) %>%
      distinct(seq.ID = cumsum(event_date != lag(event_date, default = first(event_date)) + 1L),
               .keep_all = TRUE)

    # A tibble: 5 x 3
    # Groups:   country [2]
      country event_date seq.ID
      <fct>   <date>      <int>
    1 Angola  2017-06-16      1
    2 Angola  2017-08-22      2
    3 Benin   2019-04-18      1
    4 Benin   2018-03-15      2
    5 Benin   2016-03-17      3

If you want non-aggregated output with a different sequence ID for each run, you could do:

    mydata %>%
      mutate(
        seq.ID = cumsum(
          (event_date != lag(event_date, default = first(event_date)) + 1L) |
            country != lag(country, default = first(country))
        )
      )

       country event_date seq.ID
    1   Angola 2017-06-16      1
    2   Angola 2017-06-17      1
    3   Angola 2017-06-18      1
    4   Angola 2017-08-22      2
    5   Angola 2017-08-23      2
    6    Benin 2019-04-18      3
    7    Benin 2019-04-19      3
    8    Benin 2019-04-20      3
    9    Benin 2018-03-15      4
    10   Benin 2018-03-16      4
    11   Benin 2016-03-17      5

Note that there is a typo in your last `event_date`, which is why the output doesn't match your desired output exactly.
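As a rough base R counterpart to the grouped version, the sketch below restarts the sequence ID within each country using `ave`. The `mydata` frame is an assumption reconstructed from the non-aggregated output, and `seq_in_group` is a hypothetical helper, not part of the original answer.

```r
# Assumed input, reconstructed from the non-aggregated output
mydata <- data.frame(
  country = rep(c("Angola", "Benin"), times = c(5, 6)),
  event_date = as.Date(c(
    "2017-06-16", "2017-06-17", "2017-06-18", "2017-08-22", "2017-08-23",
    "2019-04-18", "2019-04-19", "2019-04-20", "2018-03-15", "2018-03-16",
    "2016-03-17"
  )),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number the runs of consecutive days in one group.
# `d` holds dates as day counts, so a gap of exactly 1 continues the run.
seq_in_group <- function(d) cumsum(c(TRUE, diff(d) != 1))

# ave() applies the helper separately to each country's dates
mydata$seq.ID <- ave(as.numeric(mydata$event_date), mydata$country,
                     FUN = seq_in_group)
```

Because the numbering is computed per country, the IDs restart at 1 for Benin, matching the grouped `distinct` output rather than the running IDs of the `mutate` version.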

