Create a Sequential Number (Counter) For Rows Within Each Group of a Dataframe

Numbering rows within groups in a data frame

Use ave, ddply, dplyr or data.table:

df$num <- ave(df$val, df$cat, FUN = seq_along)

or:

library(plyr)
ddply(df, .(cat), mutate, id = seq_along(val))

or:

library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())

or (the most memory efficient, as it assigns by reference within DT):

library(data.table)
DT <- data.table(df)

DT[, id := seq_len(.N), by = cat]
DT[, id := rowid(cat)]

Create a sequential number within each group

A simple solution with Base R:

df$seq <- ave(sapply(df$gap, identical, "gap"), df$id, FUN = cumsum)
df
#> id date lc lon lat gap_days gap seq
#> 1 20162.03 2003-10-19 14:33:00 Tagging -39.370 -18.480 NA <NA> 0
#> 2 20162.03 2003-10-21 12:19:00 1 -38.517 -18.253 1.90694444 gap 1
#> 3 20162.03 2003-10-21 13:33:00 1 -38.464 -18.302 0.05138889 no 1
#> 4 20162.03 2003-10-21 16:38:00 A -38.461 -18.425 0.12847222 no 1
#> 5 20162.03 2003-10-21 18:50:00 A -38.322 -18.512 0.09166667 no 1
#> 6 20162.03 2003-10-23 10:33:00 B -38.674 -19.824 1.65486111 gap 2
#> 7 20162.03 2003-10-23 17:52:00 B -38.957 -19.511 0.30486111 no 2
#> 8 20162.03 2003-11-02 08:14:00 B -42.084 -24.071 9.59861111 gap 3
#> 9 20162.03 2003-11-02 09:36:00 A -41.999 -24.114 0.05694444 no 3
#> 10 20687.03 2003-10-27 17:02:00 Tagging -39.320 -18.460 NA <NA> 0
#> 11 20687.03 2003-10-27 19:44:00 2 -39.306 -18.454 0.11250000 no 0
#> 12 20687.03 2003-10-27 21:05:00 1 -39.301 -18.458 0.05625000 no 0

And then split it:

split(df, list(df$id, df$seq), drop = TRUE)
#> $`20162.03.0`
#> id date lc lon lat gap_days gap seq
#> 1 20162.03 2003-10-19 14:33:00 Tagging -39.37 -18.48 NA <NA> 0
#>
#> $`20687.03.0`
#> id date lc lon lat gap_days gap seq
#> 10 20687.03 2003-10-27 17:02:00 Tagging -39.320 -18.460 NA <NA> 0
#> 11 20687.03 2003-10-27 19:44:00 2 -39.306 -18.454 0.11250 no 0
#> 12 20687.03 2003-10-27 21:05:00 1 -39.301 -18.458 0.05625 no 0
#>
#> $`20162.03.1`
#> id date lc lon lat gap_days gap seq
#> 2 20162.03 2003-10-21 12:19:00 1 -38.517 -18.253 1.90694444 gap 1
#> 3 20162.03 2003-10-21 13:33:00 1 -38.464 -18.302 0.05138889 no 1
#> 4 20162.03 2003-10-21 16:38:00 A -38.461 -18.425 0.12847222 no 1
#> 5 20162.03 2003-10-21 18:50:00 A -38.322 -18.512 0.09166667 no 1
#>
#> $`20162.03.2`
#> id date lc lon lat gap_days gap seq
#> 6 20162.03 2003-10-23 10:33:00 B -38.674 -19.824 1.6548611 gap 2
#> 7 20162.03 2003-10-23 17:52:00 B -38.957 -19.511 0.3048611 no 2
#>
#> $`20162.03.3`
#> id date lc lon lat gap_days gap seq
#> 8 20162.03 2003-11-02 08:14:00 B -42.084 -24.071 9.59861111 gap 3
#> 9 20162.03 2003-11-02 09:36:00 A -41.999 -24.114 0.05694444 no 3

How to add sequential counter column on groups using Pandas groupby?

Try groupby with transform:

x = df.groupby('c1')['c2']
df['Ct_X'] = x.transform(lambda x: x.eq('X').cumsum())
df['Ct_Y'] = x.transform(lambda x: x.eq('Y').cumsum())
print(df)

Output:

   c1 c2  seq  Ct_X  Ct_Y
0 A X 1 1 0
1 A X 2 2 0
2 A Y 1 2 1
3 A Y 2 2 2
4 B X 1 1 0
5 B X 2 2 0
6 B X 3 3 0
7 B Y 1 3 1
8 C X 1 1 0
9 C Y 1 1 1
10 C Y 2 1 2
11 C Y 6 1 3

r - Create a sequence number for each row within a category (defined by 2+ fields) in a dataframe

Same idea as @Procrastinatus Maximus's rleid, here is a dplyr version of it:

library(dplyr)
df %>%
arrange(personid, date) %>%
group_by(personid) %>%
mutate(id = cumsum(date != lag(date, default = first(date))) + 1)
# +1 converts the zero based id to one based id here

# Source: local data frame [6 x 4]
# Groups: personid [3]
#
# personid date measurement id
# <int> <fctr> <int> <dbl>
# 1 1 x 23 1
# 2 1 x 32 1
# 3 2 y 21 1
# 4 3 x 23 1
# 5 3 y 23 2
# 6 3 z 23 3

In order for rleid or cumsum to work here, we have to sort the data frame by personid and then date since both methods only care about adjacent values.

Pandas row number for group results

You can use pd.factorize per student group:

df['enrollment'] = df.groupby('student')['year'] \
.transform(lambda x: pd.factorize(x)[0] + 1)
print(df)

# Output:
student year enrollment
0 A 20211 1
1 A 20222 2
2 A 20222 2
3 A 20225 3
4 B 20211 1
5 B 20211 1
6 B 20227 2
7 C 20211 1
8 C 20222 2
9 C 20229 3

How to add sequential counter column on groups using Pandas groupby

use cumcount(), see docs here

In [4]: df.groupby(['c1', 'c2']).cumcount()
Out[4]:
0 0
1 1
2 0
3 1
4 0
5 1
6 2
7 0
8 0
9 0
10 1
11 2
dtype: int64

If you want orderings starting at 1

In [5]: df.groupby(['c1', 'c2']).cumcount()+1
Out[5]:
0 1
1 2
2 1
3 2
4 1
5 2
6 3
7 1
8 1
9 1
10 2
11 3
dtype: int64


Related Topics



Leave a reply



Submit