Create Sequential Counter That Restarts on a Condition Within Panel Data Groups

Create sequential counter that restarts on a condition within panel data groups

With dplyr that would be:

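library(dplyr)

df %>%
  group_by(country, idx = cumsum(event == 1L)) %>%  # idx increases by 1 at every event row
  mutate(counter = row_number()) %>%                # restart numbering within each (country, idx) group
  ungroup() %>%
  select(-idx)                                      # drop the helper column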

#Source: local data frame [10 x 4]
#
# country year event counter
#1 A 2000 0 1
#2 A 2001 0 2
#3 A 2002 1 1
#4 A 2003 0 2
#5 A 2004 0 3
#6 B 2000 1 1
#7 B 2001 0 2
#8 B 2002 0 3
#9 B 2003 1 1
#10 B 2004 0 2

Or using data.table:

library(data.table)
setDT(df)[, counter := seq_len(.N), by = list(country, cumsum(event == 1L))]

Edit: group_by(country, idx = cumsum(event == 1L)) groups by country and a new grouping index "idx". The event == 1L part creates a logical vector indicating whether each value in the column "event" equals 1 (TRUE/FALSE). Then cumsum(...) takes the running sum of that vector, giving 0 for the first 2 rows, 1 for the next 3, 2 for the next 3, and so on. We use this new column (plus country) to group the data as needed. You can inspect it by removing the last two pipe steps from the dplyr code.
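
To make the grouping index visible, here is a minimal base R sketch. The data frame is reconstructed from the printed output above (an assumption about the original input), and ave() stands in for the grouped mutate, so this illustrates the idea rather than reproducing the dplyr code itself.

```r
# Toy panel reconstructed from the output shown above (assumed input)
df <- data.frame(
  country = rep(c("A", "B"), each = 5),
  year    = rep(2000:2004, times = 2),
  event   = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0)
)

# Running count of events over the whole data frame:
# the value increases by 1 at every row where event == 1
idx <- cumsum(df$event == 1L)

# Number the rows within each (country, idx) combination
df$counter <- ave(seq_along(idx), df$country, idx, FUN = seq_along)

# Show the helper index next to the result
print(cbind(df, idx))
```

Printing idx alongside counter shows that the counter restarts exactly where idx or country changes.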

Create sequential counter starting with event and zeros before event for groups in panel

You can use group_by(id) and cumsum(cummax(event)) to get close: it produces 1...N starting at the row where event == 1. I wrap it in ifelse(...) to subtract 1 from the values that are > 0.

library(tidyverse)
df %>%
  group_by(id) %>%
  mutate(delta = ifelse(cumsum(cummax(event)) > 0, cumsum(cummax(event)) - 1, 0)) %>%
  ungroup()

# A tibble: 18 x 4
# id year event delta
# <chr> <int> <dbl> <dbl>
# 1 1 1998 0. 0.
# 2 1 1999 0. 0.
# 3 1 2000 1. 0.
# 4 1 2001 0. 1.
# 5 1 2002 0. 2.
# 6 1 2003 0. 3.
# 7 2 1998 0. 0.
# 8 2 1999 0. 0.
# 9 2 2000 0. 0.
# 10 2 2001 0. 0.
# 11 2 2002 1. 0.
# 12 2 2003 0. 1.
# 13 3 1998 0. 0.
# 14 3 1999 1. 0.
# 15 3 2000 0. 1.
# 16 3 2001 0. 2.
# 17 3 2002 0. 3.
# 18 3 2003 0. 4.
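
The intermediate steps are easier to follow on a single vector. The sketch below uses the events of id 1 from the output above; the variable names seen and steps are only illustrative, and pmax() is used as a shorter alternative to the ifelse(...) wrapper, not as the method from the answer.

```r
event <- c(0, 0, 1, 0, 0, 0)   # events for one id, as for id 1 above

seen  <- cummax(event)          # 0 until the first event, 1 from then on
steps <- cumsum(seen)           # 0 before the event, then 1, 2, 3, ...
delta <- pmax(steps - 1, 0)     # shift down by 1 but never below 0

print(rbind(event, seen, steps, delta))
```

Because cummax() stays at 1 after the first event, this counter does not restart if a second event occurs within the same id.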

Group counter that restarts (with R data.table)

You can achieve that with the rleid function applied to the y column, grouped by x. rleid is a run-length counter: it increases each time the value changes and stays the same otherwise.

library(data.table)
tab <- fread("
x y i d
A B 1 1
A B 1 1
A C 2 2
A D 3 3
B A 1 4
B A 1 4
C A 1 4
C A 1 4
C B 2 5
C C 3 6
C C 3 6
C D 4 7")

dt <- tab[, .(x, y, i)]
dt[, d:= rleid(y), by = .(x)]
dt
#> x y i d
#> 1: A B 1 1
#> 2: A B 1 1
#> 3: A C 2 2
#> 4: A D 3 3
#> 5: B A 1 1
#> 6: B A 1 1
#> 7: C A 1 1
#> 8: C A 1 1
#> 9: C B 2 2
#> 10: C C 3 3
#> 11: C C 3 3
#> 12: C D 4 4

Created on 2018-06-03 by the reprex package (v0.2.0).
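
For intuition about what rleid computes, here is a rough base R sketch built on rle(). The helper run_id is hypothetical (it is not part of data.table) and assumes an atomic vector without NAs; the x and y values are copied from the table above.

```r
# Hypothetical helper: run-length id for one atomic vector without NAs
run_id <- function(v) {
  r <- rle(v)                                    # lengths of consecutive runs
  rep(seq_along(r$lengths), times = r$lengths)   # repeat run number over each run
}

x <- c("A", "A", "A", "A", "B", "B", "C", "C", "C", "C", "C", "C")
y <- c("B", "B", "C", "D", "A", "A", "A", "A", "B", "C", "C", "D")

# Restart the run id within each x group
d <- ave(seq_along(y), x, FUN = function(i) run_id(y[i]))

print(data.frame(x, y, d))
```

In practice rleid is the simpler choice when data.table is already loaded; the sketch only shows the mechanics.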

Sequence a column based on two other columns with a restarting sequence

1) dplyr Create a factor and convert it to its integer codes:

library(dplyr)
df %>%
  arrange(name, year) %>%
  group_by(name) %>%
  mutate(Year_id = as.numeric(factor(year))) %>%
  ungroup()

giving:

# A tibble: 10 x 3
name year Year_id
<chr> <int> <dbl>
1 A 2000 1
2 A 2000 1
3 A 2000 1
4 A 2001 2
5 A 2001 2
6 B 2000 1
7 B 2000 1
8 B 2001 2
9 B 2001 2
10 B 2001 2

1a) The mutate could alternately be written as mutate(Year_id = match(year, unique(year))) as per @nicola's comment.

2) no packages Without packages it could be written:

o <- with(df, order(name, year))
transform(df[o, ], Year_id = ave(year, name, FUN = function(x) as.numeric(factor(x))))

or using match.
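
A sketch of the match variant in base R follows. The data frame is reconstructed from the output above (an assumption), and res and yrs are illustrative names. The last lines show where factor and match differ: factor codes follow the sorted levels, while match(x, unique(x)) follows the order of first appearance, so the two agree only when the data are sorted first.

```r
# Toy data reconstructed from the output shown above (assumed input)
df <- data.frame(
  name = rep(c("A", "B"), each = 5),
  year = c(2000, 2000, 2000, 2001, 2001, 2000, 2000, 2001, 2001, 2001)
)

# Sort, then number the distinct years within each name using match
o <- with(df, order(name, year))
res <- transform(df[o, ], Year_id = ave(year, name, FUN = function(x) match(x, unique(x))))
print(res)

# The two approaches differ on unsorted input
yrs <- c(2001, 2000, 2001)
by_factor <- as.numeric(factor(yrs))   # codes follow the sorted levels
by_match  <- match(yrs, unique(yrs))   # codes follow order of first appearance
```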

Numbering rows within groups in a data frame

Use ave, ddply, dplyr or data.table:

df$num <- ave(df$val, df$cat, FUN = seq_along)

or:

library(plyr)
ddply(df, .(cat), mutate, id = seq_along(val))

or:

library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())

or (the most memory efficient, as it assigns by reference within DT):

library(data.table)
DT <- data.table(df)

DT[, id := seq_len(.N), by = cat]
# or, equivalently:
DT[, id := rowid(cat)]
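
As a quick check of the ave approach, here is a small self-contained sketch with made-up data (the values of cat and val are hypothetical). It also shows that the groups do not need to be contiguous for the numbering to work.

```r
# Hypothetical toy data with interleaved groups
df <- data.frame(
  cat = c("a", "b", "a", "a", "b"),
  val = c(10, 20, 30, 40, 50)
)

# seq_along is applied to the values of each group separately,
# and the results are placed back in the original row positions
df$num <- ave(df$val, df$cat, FUN = seq_along)

print(df)
```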

row id by group resetting after zero/null values

Thought I would post an answer to my rather specific question:

library(dplyr)
a2 <- a %>%
  group_by(id) %>%
  mutate(next.valuecolumn = lag(valuecolumn),
         next.valuecolumn2 = coalesce(next.valuecolumn, valuecolumn),
         diff = ifelse(valuecolumn > 0 & next.valuecolumn2 == 0, 1, 0),
         target2 = cumsum(diff) + 1)

The row id doesn't 'reset', but this is not required for the problem as I can group by user_id-target to sum value by id.
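
To see what target2 contains, here is a base R sketch of the same logic for a single id. The input values are hypothetical and assumed to contain no NAs; prev plays the role of coalesce(lag(valuecolumn), valuecolumn), and diff_flag corresponds to diff.

```r
valuecolumn <- c(0, 3, 2, 0, 0, 5, 1)   # hypothetical values for one id

# Previous value; the first row falls back to its own value
prev <- c(valuecolumn[1], head(valuecolumn, -1))

# Flag rows where a positive value follows a zero (start of a new block)
diff_flag <- as.integer(valuecolumn > 0 & prev == 0)

# Block id: increases by 1 at every flagged row
target2 <- cumsum(diff_flag) + 1

print(data.frame(valuecolumn, prev, diff_flag, target2))
```

Each run of positive values gets its own target2 value, and the zeros that follow a run share that run's value until the next run starts.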

Continual summation of a column in R until condition is met

You can use:

library(dplyr)

df %>%
  group_by(x1 = cumsum(replace(x, is.na(x), 0) == 0)) %>%
  mutate(counter = (row_number() - 1) * x) %>%
  ungroup() %>%
  select(-x1)

# x counter
# <dbl> <dbl>
# 1 NA NA
# 2 1 1
# 3 0 0
# 4 0 0
# 5 0 0
# 6 0 0
# 7 1 1
# 8 1 2
# 9 1 3
#10 1 4
#11 0 0
#12 1 1

Explaining the steps -

  • Create a new grouping column (x1): replace NA in x with 0, then use cumsum to increment the group value by 1 at every row where x = 0.
  • Within each group, subtract 1 from the row number and multiply the result by x. The multiplication keeps counter at 0 where x = 0 and at NA where x is NA.
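
The steps above can be traced in base R as a minimal sketch. The vector x is taken from the output shown earlier, ave() stands in for the grouped row_number(), and pos is an illustrative name.

```r
x <- c(NA, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1)   # values from the output above

# Step 1: a new group starts at every row where x is 0 (NA treated as 0)
x1 <- cumsum(replace(x, is.na(x), 0) == 0)

# Step 2: zero-based position within each group, then multiply by x
pos <- ave(seq_along(x), x1, FUN = seq_along) - 1
counter <- pos * x

print(data.frame(x, x1, pos, counter))
```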

