Create sequential counter that restarts on a condition within panel data groups
With dplyr that would be:
library(dplyr)
df %>%
  group_by(country, idx = cumsum(event == 1L)) %>%
  mutate(counter = row_number()) %>%
  ungroup() %>%
  select(-idx)
#Source: local data frame [10 x 4]
#
# country year event counter
#1 A 2000 0 1
#2 A 2001 0 2
#3 A 2002 1 1
#4 A 2003 0 2
#5 A 2004 0 3
#6 B 2000 1 1
#7 B 2001 0 2
#8 B 2002 0 3
#9 B 2003 1 1
#10 B 2004 0 2
Or using data.table:
library(data.table)
setDT(df)[, counter := seq_len(.N), by = list(country, cumsum(event == 1L))]
Edit: group_by(country, idx = cumsum(event == 1L)) is used to group by country and a new grouping index "idx". The event == 1L part creates a logical index telling us whether the column "event" is an integer 1 or not (TRUE/FALSE). Then, cumsum(...) sums these up, giving 0 for the first 2 rows, 1 for the next 3, 2 for the next 3 and so on. We use this new column (+ country) to group the data as needed. You can check it out by removing the last two pipe steps in the dplyr code.
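To look at the grouping index on its own, here is a minimal base R sketch. The vectors are retyped from the example output above, and ave() stands in for the dplyr pipeline purely for illustration:

```r
# Data retyped from the example output (10 rows, two countries)
country <- rep(c("A", "B"), each = 5)
event   <- c(0, 0, 1, 0, 0, 1, 0, 0, 1, 0)

# TRUE/FALSE is summed as 1/0, so idx goes up by one at every event row
idx <- cumsum(event == 1L)
# idx: 0 0 1 1 1 2 2 2 3 3

# Number the rows within each (country, idx) combination
counter <- ave(seq_along(event), country, idx, FUN = seq_along)
# counter: 1 2 1 2 3 1 2 3 1 2
```

Grouping by country as well as idx matters whenever a country does not start with an event row: without it, the first rows of that country would continue the counter of the previous country.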
Create sequential counter starting with event and zeros before event for groups in panel
You can use group_by(id) and cumsum(cummax(event)) to get close: it produces 1...N starting where event == 1. I wrap it in ifelse(...) to subtract 1 from those values that are > 0.
library(tidyverse)
df %>%
  group_by(id) %>%
  mutate(delta = ifelse(cumsum(cummax(event)) > 0, cumsum(cummax(event)) - 1, 0)) %>%
  ungroup()
# A tibble: 18 x 4
# id year event delta
# <chr> <int> <dbl> <dbl>
# 1 1 1998 0. 0.
# 2 1 1999 0. 0.
# 3 1 2000 1. 0.
# 4 1 2001 0. 1.
# 5 1 2002 0. 2.
# 6 1 2003 0. 3.
# 7 2 1998 0. 0.
# 8 2 1999 0. 0.
# 9 2 2000 0. 0.
# 10 2 2001 0. 0.
# 11 2 2002 1. 0.
# 12 2 2003 0. 1.
# 13 3 1998 0. 0.
# 14 3 1999 1. 0.
# 15 3 2000 0. 1.
# 16 3 2001 0. 2.
# 17 3 2002 0. 3.
# 18 3 2003 0. 4.
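The intermediate vectors make the trick easier to follow. This is a small sketch for a single id, with the event values retyped from id 1 in the output above; pmax() is used here as a shorthand for the ifelse(...) wrapper and is not part of the original answer:

```r
# event for id 1, retyped from the output above
event <- c(0, 0, 1, 0, 0, 0)

step1 <- cummax(event)   # 0 0 1 1 1 1  (stays at 1 once the event has happened)
step2 <- cumsum(step1)   # 0 0 1 2 3 4  (counts from 1 at the event row)

# Shift down by one so the event row itself is 0; rows before it stay 0
delta <- pmax(step2 - 1, 0)
# delta: 0 0 0 1 2 3
```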
Group counter that restarts (with R data.table)
You can achieve that with the rleid function on the y column, grouped by x. rleid is a type of counter that increases each time there is a change and stays the same otherwise.
library(data.table)
tab <- fread("
x y i d
A B 1 1
A B 1 1
A C 2 2
A D 3 3
B A 1 4
B A 1 4
C A 1 4
C A 1 4
C B 2 5
C C 3 6
C C 3 6
C D 4 7")
dt <- tab[, .(x, y, i)]
dt[, d:= rleid(y), by = .(x)]
dt
#> x y i d
#> 1: A B 1 1
#> 2: A B 1 1
#> 3: A C 2 2
#> 4: A D 3 3
#> 5: B A 1 1
#> 6: B A 1 1
#> 7: C A 1 1
#> 8: C A 1 1
#> 9: C B 2 2
#> 10: C C 3 3
#> 11: C C 3 3
#> 12: C D 4 4
Created on 2018-06-03 by the reprex package (v0.2.0).
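For readers without data.table, the same run counter can be approximated in base R with rle(). This is only a sketch: rleid_base is a hypothetical helper name, not a data.table function, and the x and y vectors are retyped from the example table above.

```r
# Hypothetical helper: number consecutive runs of equal values 1, 2, 3, ...
rleid_base <- function(v) {
  r <- rle(v)
  rep(seq_along(r$lengths), r$lengths)
}

# Columns retyped from the example table
x <- c("A", "A", "A", "A", "B", "B", "C", "C", "C", "C", "C", "C")
y <- c("B", "B", "C", "D", "A", "A", "A", "A", "B", "C", "C", "D")

# Apply the run counter to y separately within each value of x
d <- ave(seq_along(y), x, FUN = function(i) rleid_base(y[i]))
# d: 1 1 2 3 1 1 1 1 2 3 3 4
```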
Sequence a column based on two other columns with a restarting sequence
1) dplyr Create a factor and extract its integer codes:
library(dplyr)
df %>%
  arrange(name, year) %>%
  group_by(name) %>%
  mutate(Year_id = as.numeric(factor(year))) %>%
  ungroup()
giving:
# A tibble: 10 x 3
name year Year_id
<chr> <int> <dbl>
1 A 2000 1
2 A 2000 1
3 A 2000 1
4 A 2001 2
5 A 2001 2
6 B 2000 1
7 B 2000 1
8 B 2001 2
9 B 2001 2
10 B 2001 2
1a) The mutate could alternately be written as mutate(Year_id = match(year, unique(year))) as per @nicola's comment.
2) no packages Without packages it could be written:
o <- with(df, order(name, year))
transform(df[o, ], Year_id = ave(year, name, FUN = function(x) as.numeric(factor(x))))
or using match.
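One detail worth knowing when choosing between the two variants: they agree on sorted data, but can differ when the years are not in order. A small sketch with made-up values (the year vector below is an assumption for illustration, not taken from the question):

```r
# Made-up, deliberately unsorted years
year <- c(2001, 2000, 2001, 2002)

# factor() numbers the values by their sorted order
by_factor <- as.numeric(factor(year))
# by_factor: 2 1 2 3

# match(..., unique(...)) numbers the values by order of first appearance
by_match <- match(year, unique(year))
# by_match: 1 2 1 3
```

After arrange(name, year), or after ordering with order(name, year), both give the same ids.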
Numbering rows within groups in a data frame
Use ave, ddply, dplyr or data.table:
df$num <- ave(df$val, df$cat, FUN = seq_along)
or:
library(plyr)
ddply(df, .(cat), mutate, id = seq_along(val))
or:
library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())
or (the most memory efficient, as it assigns by reference within DT):
library(data.table)
DT <- data.table(df)
DT[, id := seq_len(.N), by = cat]
# or, equivalently, with rowid():
DT[, id := rowid(cat)]
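As a quick check of what these produce, here is a toy run of the ave() variant. The data frame is invented for illustration (the question's df is not shown here), with the groups deliberately interleaved:

```r
# Invented example data with interleaved groups
df <- data.frame(cat = c("a", "b", "a", "b", "b"),
                 val = c(10, 20, 30, 40, 50))

# Row number within each cat, keeping the original row order
df$num <- ave(df$val, df$cat, FUN = seq_along)
# df$num: 1 1 2 2 3
```

The numbering follows the order in which rows appear within each group, so sort the data first if the counter should follow some other column.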
row id by group resetting after zero/null values
Thought I would post an answer to my rather specific question:
library(dplyr)
a2 <- a %>%
  group_by(id) %>%
  mutate(next.valuecolumn = lag(valuecolumn),
         next.valuecolumn2 = coalesce(next.valuecolumn, valuecolumn),
         diff = ifelse(valuecolumn > 0 & next.valuecolumn2 == 0, 1, 0),
         target2 = cumsum(diff) + 1)
The row id doesn't 'reset', but this is not required for the problem as I can group by user_id-target to sum value by id.
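To illustrate what the pipeline computes, here is a base R sketch of the same steps for a single id. The valuecolumn values are invented, and the intermediate names prev and prev2 are only stand-ins for next.valuecolumn and next.valuecolumn2:

```r
# Invented values for one id
valuecolumn <- c(3, 2, 0, 0, 5, 1, 0, 4)

# Previous value, like lag(); the first row has none, so fall back to its own value
prev  <- c(NA, head(valuecolumn, -1))
prev2 <- ifelse(is.na(prev), valuecolumn, prev)

# Flag rows where a positive value follows a zero, then count the flags
diff    <- ifelse(valuecolumn > 0 & prev2 == 0, 1, 0)
target2 <- cumsum(diff) + 1
# target2: 1 1 1 1 2 2 2 3
```

Each new block of positive values after a zero gets the next target2 value, and the zero rows stay with the block before them.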
Continual summation of a column in R until condition is met
You can use:
library(dplyr)
df %>%
  group_by(x1 = cumsum(replace(x, is.na(x), 0) == 0)) %>%
  mutate(counter = (row_number() - 1) * x) %>%
  ungroup() %>%
  select(-x1)
# x counter
# <dbl> <dbl>
# 1 NA NA
# 2 1 1
# 3 0 0
# 4 0 0
# 5 0 0
# 6 0 0
# 7 1 1
# 8 1 2
# 9 1 3
#10 1 4
#11 0 0
#12 1 1
Explaining the steps -
- Create a new column (x1): replace NA in x with 0 and increment the group value by 1 (using cumsum) whenever x = 0.
- For each group, subtract 1 from the row number and multiply it by x. This multiplication is necessary because it keeps counter as 0 where x = 0 and counter as NA where x is NA.
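The two steps can also be traced in base R. This sketch retypes x from the output above and uses ave() in place of group_by() and row_number(), which is a substitution made here for illustration:

```r
# x retyped from the output above
x <- c(NA, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1)

# Step 1: a new group starts at every 0 (NA is treated as 0)
x1 <- cumsum(replace(x, is.na(x), 0) == 0)
# x1: 1 1 2 3 4 5 5 5 5 5 6 6

# Step 2: (row number within group - 1) * x
rn      <- ave(seq_along(x), x1, FUN = seq_along)
counter <- (rn - 1) * x
# counter: NA 1 0 0 0 0 1 2 3 4 0 1
```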