Create Sequential Counter That Restarts on a Condition Within Panel Data Groups

Create sequential counter that restarts on a condition within panel data groups

With dplyr that would be:

library(dplyr)

df %>% 
  group_by(country, idx = cumsum(event == 1L)) %>% 
  mutate(counter = row_number()) %>% 
  ungroup() %>% 
  select(-idx)

#Source: local data frame [10 x 4]
#
#   country year event counter
#1        A 2000     0       1
#2        A 2001     0       2
#3        A 2002     1       1
#4        A 2003     0       2
#5        A 2004     0       3
#6        B 2000     1       1
#7        B 2001     0       2
#8        B 2002     0       3
#9        B 2003     1       1
#10       B 2004     0       2

Or using data.table:

library(data.table)
setDT(df)[, counter := seq_len(.N), by = list(country, cumsum(event == 1L))]

Edit: group_by(country, idx = cumsum(event == 1L)) groups by country and by a new grouping index, "idx". The event == 1L part creates a logical vector that is TRUE wherever the column "event" equals 1. cumsum(...) then keeps a running count of those TRUEs over the whole data frame: 0 for the first 2 rows, 1 for the next 3, 2 for the next 3 and so on, so the index steps up by one at every event. Grouping by this new column together with country gives the blocks within which the counter should run. You can inspect idx by removing the last two pipe steps (ungroup and select) from the dplyr code.
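
To look at the grouping index directly, here is a minimal sketch that keeps idx instead of dropping it. The data frame is rebuilt from the printed output above, which is an assumption because the original df is not shown in the answer.

```r
library(dplyr)

# Example data reconstructed from the printed result (assumed, not the asker's original object)
df <- data.frame(
  country = rep(c("A", "B"), each = 5),
  year    = rep(2000:2004, times = 2),
  event   = c(0L, 0L, 1L, 0L, 0L, 1L, 0L, 0L, 1L, 0L)
)

# Same pipeline as above, but without select(-idx) so the index stays visible
res <- df %>%
  group_by(country, idx = cumsum(event == 1L)) %>%
  mutate(counter = row_number()) %>%
  ungroup()

res$idx      # 0 0 1 1 1 2 2 2 3 3
res$counter  # 1 2 1 2 3 1 2 3 1 2
```

Note that idx is computed over the whole data frame before grouping, so it is the combination of country and idx that keeps country A's rows before its first event separate from anything in country B.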

Create sequential counter starting with event and zeros before event for groups in panel

You can use group_by(id) and cumsum(cummax(event)) to get close: this produces 1...N starting at the row where event == 1. I wrap it in ifelse(...) to subtract 1 from the values that are > 0, so the count starts at 0 on the event row.

library(tidyverse)
df %>%
  group_by(id) %>%
  mutate(delta = ifelse(cumsum(cummax(event)) > 0, cumsum(cummax(event)) - 1, 0)) %>%
  ungroup()

# A tibble: 18 x 4
   # id     year event delta
   # <chr> <int> <dbl> <dbl>
 # 1 1      1998    0.    0.
 # 2 1      1999    0.    0.
 # 3 1      2000    1.    0.
 # 4 1      2001    0.    1.
 # 5 1      2002    0.    2.
 # 6 1      2003    0.    3.
 # 7 2      1998    0.    0.
 # 8 2      1999    0.    0.
 # 9 2      2000    0.    0.
# 10 2      2001    0.    0.
# 11 2      2002    1.    0.
# 12 2      2003    0.    1.
# 13 3      1998    0.    0.
# 14 3      1999    1.    0.
# 15 3      2000    0.    1.
# 16 3      2001    0.    2.
# 17 3      2002    0.    3.
# 18 3      2003    0.    4.
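
The intermediate steps may be easier to follow on a single vector. This is only a sketch in base R for one id (the values match id 1 above); pmax() is swapped in here for the ifelse() wrapper and is not what the answer itself uses.

```r
# Event indicator for a single id, taken from the first group in the output above
event <- c(0, 0, 1, 0, 0, 0)

after_event <- cummax(event)        # 0 0 1 1 1 1 : switches to 1 at the event and stays there
running     <- cumsum(after_event)  # 0 0 1 2 3 4 : counts rows from the event onwards
delta       <- pmax(running - 1, 0) # 0 0 0 1 2 3 : shift so the event row itself is 0

delta
```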

Group counter that restarts (with R data.table)

You can achieve that with the rleid function on the y column, grouped by x. rleid is a run-length counter: it increases each time the value changes and stays the same otherwise.

library(data.table)
tab <- fread("
x y i d
A B 1 1
A B 1 1
A C 2 2
A D 3 3
B A 1 4
B A 1 4 
C A 1 4
C A 1 4 
C B 2 5
C C 3 6
C C 3 6
C D 4 7")

dt <- tab[, .(x, y, i)]
dt[, d:= rleid(y), by = .(x)]
dt
#>     x y i d
#>  1: A B 1 1
#>  2: A B 1 1
#>  3: A C 2 2
#>  4: A D 3 3
#>  5: B A 1 1
#>  6: B A 1 1
#>  7: C A 1 1
#>  8: C A 1 1
#>  9: C B 2 2
#> 10: C C 3 3
#> 11: C C 3 3
#> 12: C D 4 4

Created on 2018-06-03 by the reprex package (v0.2.0).
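
If data.table is not available, roughly the same counter can be sketched in base R with rle() and ave(). The helper name rleid_base is hypothetical, and the x and y vectors are copied from the table above.

```r
# Hypothetical helper: number the runs of equal consecutive values in v
rleid_base <- function(v) {
  r <- rle(as.character(v))
  rep(seq_along(r$lengths), r$lengths)
}

x <- c("A", "A", "A", "A", "B", "B", "C", "C", "C", "C", "C", "C")
y <- c("B", "B", "C", "D", "A", "A", "A", "A", "B", "C", "C", "D")

# ave() hands each group's row positions to the function, which restarts the counter per x
d <- ave(seq_along(y), x, FUN = function(i) rleid_base(y[i]))
d  # 1 1 2 3 1 1 1 1 2 3 3 4
```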

Sequence a column based on two other columns with a restarting sequence

1) dplyr Create a factor and extract its integer codes:

library(dplyr)
df %>% 
   arrange(name, year) %>% 
   group_by(name) %>%
   mutate(Year_id = as.numeric(factor(year))) %>%
   ungroup()

giving:

# A tibble: 10 x 3
    name  year Year_id
   <chr> <int>   <dbl>
1      A  2000       1
2      A  2000       1
3      A  2000       1
4      A  2001       2
5      A  2001       2
6      B  2000       1
7      B  2000       1
8      B  2001       2
9      B  2001       2
10     B  2001       2

1a) The mutate could alternately be written as mutate(Year_id = match(year, unique(year))) as per @nicola's comment.

2) no packages Without packages it could be written:

o <- with(df, order(name, year))
transform(df[o, ], Year_id = ave(year, name, FUN = function(x) as.numeric(factor(x))))

or using match.
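
For completeness, the match variant in base R might look like the sketch below. The data frame is rebuilt from the printed output above, which is an assumption since the original df is not shown.

```r
# Example data reconstructed from the printed result (assumed)
df <- data.frame(
  name = rep(c("A", "B"), each = 5),
  year = c(2000L, 2000L, 2000L, 2001L, 2001L, 2000L, 2000L, 2001L, 2001L, 2001L)
)

o <- with(df, order(name, year))
# Within each name, number the years by the order of their first appearance
res <- transform(df[o, ],
                 Year_id = ave(year, name, FUN = function(x) match(x, unique(x))))
res$Year_id  # 1 1 1 2 2 1 1 2 2 2
```

Because the rows are sorted by name and year first, order of first appearance coincides with sorted order, so this gives the same ids as the factor version.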

Numbering rows within groups in a data frame

Use ave, ddply, dplyr or data.table:

df$num <- ave(df$val, df$cat, FUN = seq_along)

or:

library(plyr)
ddply(df, .(cat), mutate, id = seq_along(val))

or:

library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())

or (the most memory efficient, as it assigns by reference within DT):

library(data.table)
DT <- data.table(df)

DT[, id := seq_len(.N), by = cat]
# or, equivalently:
DT[, id := rowid(cat)]

row id by group resetting after zero/null values

Thought I would post an answer to my rather specific question:

library(dplyr)
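a2 <- a %>%
  group_by(id) %>%
  mutate(next.valuecolumn = lag(valuecolumn),  # previous row's value (NA on the first row of each id)
         next.valuecolumn2 = coalesce(next.valuecolumn, valuecolumn),  # fill that NA with the row's own value
         diff = ifelse(valuecolumn > 0 & next.valuecolumn2 == 0, 1, 0),  # 1 where a positive value follows a zero
         target2 = cumsum(diff)+1)  # running block number within each id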

The row id doesn't 'reset', but this is not required for the problem as I can group by user_id-target to sum value by id.
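
To illustrate what target2 looks like, here is a small sketch on made-up data. The data frame a and its values are purely hypothetical, and the two helper columns are merged into one column named prev for brevity.

```r
library(dplyr)

# Hypothetical data: one id, with zeros separating runs of positive values
a <- data.frame(id = rep(1, 8),
                valuecolumn = c(5, 3, 0, 0, 2, 4, 0, 1))

a2 <- a %>%
  group_by(id) %>%
  mutate(prev    = coalesce(lag(valuecolumn), valuecolumn),   # previous value, own value on row 1
         diff    = ifelse(valuecolumn > 0 & prev == 0, 1, 0), # 1 where a positive value follows a zero
         target2 = cumsum(diff) + 1) %>%                      # block number, starting at 1
  ungroup()

a2$target2  # 1 1 1 1 2 2 2 3
```

The zero rows keep the block number of the run before them, which is why the id does not reset but can still be used as a grouping key.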

Continual summation of a column in R until condition is met

You can use:

library(dplyr)

df %>%
  group_by(x1 = cumsum(replace(x, is.na(x), 0) == 0)) %>%
  mutate(counter = (row_number() - 1) * x) %>%
  ungroup() %>%
  select(-x1)

#       x counter
#   <dbl>   <dbl>
# 1    NA      NA
# 2     1       1
# 3     0       0
# 4     0       0
# 5     0       0
# 6     0       0
# 7     1       1
# 8     1       2
# 9     1       3
#10     1       4
#11     0       0
#12     1       1

Explaining the steps -

Create a new grouping column (x1): replace NA in x with 0, then use cumsum to increment the group value by 1 whenever x is 0.
Within each group, subtract 1 from the row number and multiply the result by x. The multiplication keeps counter at 0 where x is 0 and at NA where x is NA.
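
The two steps can also be traced on the bare vector. This is a base R sketch that uses ave() in place of the dplyr grouping; the x values are taken from the output above.

```r
x <- c(NA, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1)

# Step 1: a new group starts at every 0 (NA is treated as 0 for grouping only)
x1 <- cumsum(replace(x, is.na(x), 0) == 0)
x1  # 1 1 2 3 4 5 5 5 5 5 6 6

# Step 2: zero-based position within each group, multiplied by x
pos     <- ave(seq_along(x), x1, FUN = seq_along) - 1
counter <- pos * x
counter  # NA 1 0 0 0 0 1 2 3 4 0 1
```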
