Numbering rows within groups in a data frame
Use ave, ddply, dplyr or data.table:
df$num <- ave(df$val, df$cat, FUN = seq_along)
or:
library(plyr)
ddply(df, .(cat), mutate, id = seq_along(val))
or:
library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())
or (the most memory efficient, as it assigns by reference within DT):
library(data.table)
DT <- data.table(df)
DT[, id := seq_len(.N), by = cat]
or, equivalently:
DT[, id := rowid(cat)]
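Since this document mixes R and pandas answers, the same per-group row numbering can be sketched in pandas with groupby().cumcount(); the column names `cat`/`val` below are assumptions mirroring the R example:

```python
import pandas as pd

# Mirror of the R example: number rows 1..n within each level of `cat`
df = pd.DataFrame({"cat": ["a", "a", "b", "a", "b"],
                   "val": [10, 20, 30, 40, 50]})
# cumcount() is zero-based, so add 1 to match seq_along / seq_len(.N)
df["num"] = df.groupby("cat").cumcount() + 1
```

Like the data.table version, this numbers rows in their original order within each group.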
Create a sequential number within each group
A simple solution in base R:
df$seq <- ave(sapply(df$gap, identical, "gap"), df$id, FUN = cumsum)
df
#> id date lc lon lat gap_days gap seq
#> 1 20162.03 2003-10-19 14:33:00 Tagging -39.370 -18.480 NA <NA> 0
#> 2 20162.03 2003-10-21 12:19:00 1 -38.517 -18.253 1.90694444 gap 1
#> 3 20162.03 2003-10-21 13:33:00 1 -38.464 -18.302 0.05138889 no 1
#> 4 20162.03 2003-10-21 16:38:00 A -38.461 -18.425 0.12847222 no 1
#> 5 20162.03 2003-10-21 18:50:00 A -38.322 -18.512 0.09166667 no 1
#> 6 20162.03 2003-10-23 10:33:00 B -38.674 -19.824 1.65486111 gap 2
#> 7 20162.03 2003-10-23 17:52:00 B -38.957 -19.511 0.30486111 no 2
#> 8 20162.03 2003-11-02 08:14:00 B -42.084 -24.071 9.59861111 gap 3
#> 9 20162.03 2003-11-02 09:36:00 A -41.999 -24.114 0.05694444 no 3
#> 10 20687.03 2003-10-27 17:02:00 Tagging -39.320 -18.460 NA <NA> 0
#> 11 20687.03 2003-10-27 19:44:00 2 -39.306 -18.454 0.11250000 no 0
#> 12 20687.03 2003-10-27 21:05:00 1 -39.301 -18.458 0.05625000 no 0
And then split it:
split(df, list(df$id, df$seq), drop = TRUE)
#> $`20162.03.0`
#> id date lc lon lat gap_days gap seq
#> 1 20162.03 2003-10-19 14:33:00 Tagging -39.37 -18.48 NA <NA> 0
#>
#> $`20687.03.0`
#> id date lc lon lat gap_days gap seq
#> 10 20687.03 2003-10-27 17:02:00 Tagging -39.320 -18.460 NA <NA> 0
#> 11 20687.03 2003-10-27 19:44:00 2 -39.306 -18.454 0.11250 no 0
#> 12 20687.03 2003-10-27 21:05:00 1 -39.301 -18.458 0.05625 no 0
#>
#> $`20162.03.1`
#> id date lc lon lat gap_days gap seq
#> 2 20162.03 2003-10-21 12:19:00 1 -38.517 -18.253 1.90694444 gap 1
#> 3 20162.03 2003-10-21 13:33:00 1 -38.464 -18.302 0.05138889 no 1
#> 4 20162.03 2003-10-21 16:38:00 A -38.461 -18.425 0.12847222 no 1
#> 5 20162.03 2003-10-21 18:50:00 A -38.322 -18.512 0.09166667 no 1
#>
#> $`20162.03.2`
#> id date lc lon lat gap_days gap seq
#> 6 20162.03 2003-10-23 10:33:00 B -38.674 -19.824 1.6548611 gap 2
#> 7 20162.03 2003-10-23 17:52:00 B -38.957 -19.511 0.3048611 no 2
#>
#> $`20162.03.3`
#> id date lc lon lat gap_days gap seq
#> 8 20162.03 2003-11-02 08:14:00 B -42.084 -24.071 9.59861111 gap 3
#> 9 20162.03 2003-11-02 09:36:00 A -41.999 -24.114 0.05694444 no 3
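The same cumulative-flag trick (count "gap" markers per id, then split on the resulting counter) translates directly to pandas; a minimal sketch, with a toy frame standing in for the tracking data above:

```python
import pandas as pd

# Per-id cumulative count of "gap" markers, mirroring ave(..., FUN = cumsum)
df = pd.DataFrame({
    "id":  [1, 1, 1, 1, 2, 2],
    "gap": [None, "gap", "no", "gap", None, "no"],
})
# None compares unequal to "gap", so it contributes 0, like identical() in R
df["seq"] = (df["gap"] == "gap").groupby(df["id"]).cumsum()
# Splitting by (id, seq) is then a plain groupby, analogous to split() in R
groups = dict(tuple(df.groupby(["id", "seq"])))
```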
How to add sequential counter column on groups using Pandas groupby?
Try groupby with transform:
g = df.groupby('c1')['c2']
df['Ct_X'] = g.transform(lambda s: s.eq('X').cumsum())
df['Ct_Y'] = g.transform(lambda s: s.eq('Y').cumsum())
print(df)
Output:
c1 c2 seq Ct_X Ct_Y
0 A X 1 1 0
1 A X 2 2 0
2 A Y 1 2 1
3 A Y 2 2 2
4 B X 1 1 0
5 B X 2 2 0
6 B X 3 3 0
7 B Y 1 3 1
8 C X 1 1 0
9 C Y 1 1 1
10 C Y 2 1 2
11 C Y 6 1 3
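The same counts can also be computed without transform, by grouping the boolean mask directly; a sketch (the input frame is reconstructed from the printed output above):

```python
import pandas as pd

df = pd.DataFrame({"c1": list("AAAABBBBCCCC"),
                   "c2": list("XXYYXXXYXYYY")})
# Cumulative count of each marker within its c1 group, no per-group lambda
df["Ct_X"] = df["c2"].eq("X").groupby(df["c1"]).cumsum()
df["Ct_Y"] = df["c2"].eq("Y").groupby(df["c1"]).cumsum()
```

Avoiding the Python-level lambda keeps the whole computation vectorized, which matters on large frames.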
r - Create a sequence number for each row within a category (defined by 2+ fields) in a dataframe
Same idea as @Procrastinatus Maximus's rleid; here is a dplyr version of it:
library(dplyr)
df %>%
arrange(personid, date) %>%
group_by(personid) %>%
mutate(id = cumsum(date != lag(date, default = first(date))) + 1)
# +1 converts the zero based id to one based id here
# Source: local data frame [6 x 4]
# Groups: personid [3]
#
# personid date measurement id
# <int> <fctr> <int> <dbl>
# 1 1 x 23 1
# 2 1 x 32 1
# 3 2 y 21 1
# 4 3 x 23 1
# 5 3 y 23 2
# 6 3 z 23 3
In order for rleid or cumsum to work here, we have to sort the data frame by personid and then date, since both methods only care about adjacent values.
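The same sort-then-count-changes technique can be sketched in pandas: cumulative-sum a "value changed" flag within each group. Column names mirror the R example:

```python
import pandas as pd

df = pd.DataFrame({
    "personid": [1, 1, 2, 3, 3, 3],
    "date": ["x", "x", "y", "x", "y", "z"],
    "measurement": [23, 32, 21, 23, 23, 23],
})
# Sort first: like rleid/cumsum in R, this only looks at adjacent values
df = df.sort_values(["personid", "date"])
# The first row of each group compares against NaN and counts as a change,
# so the cumulative sum is already one-based (no +1 needed)
df["id"] = df.groupby("personid")["date"].transform(
    lambda s: s.ne(s.shift()).cumsum())
```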
Pandas row number for group results
You can use pd.factorize per student group:
df['enrollment'] = df.groupby('student')['year'] \
.transform(lambda x: pd.factorize(x)[0] + 1)
print(df)
# Output:
student year enrollment
0 A 20211 1
1 A 20222 2
2 A 20222 2
3 A 20225 3
4 B 20211 1
5 B 20211 1
6 B 20227 2
7 C 20211 1
8 C 20222 2
9 C 20229 3
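Note that factorize numbers distinct values within each group (a dense rank by first appearance), while cumcount numbers rows; a sketch contrasting the two on the student data above:

```python
import pandas as pd

df = pd.DataFrame({
    "student": list("AAAABBBCCC"),
    "year": [20211, 20222, 20222, 20225, 20211, 20211, 20227, 20211, 20222, 20229],
})
# Dense numbering: repeated years within a student share a number
df["enrollment"] = df.groupby("student")["year"].transform(
    lambda s: pd.factorize(s)[0] + 1)
# Row numbering: every row within a student gets a new number
df["rownum"] = df.groupby("student").cumcount() + 1
```

Pick factorize when duplicates should share an id, cumcount when each row needs its own.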
How to add sequential counter column on groups using Pandas groupby
Use cumcount(); see the pandas GroupBy documentation.
In [4]: df.groupby(['c1', 'c2']).cumcount()
Out[4]:
0 0
1 1
2 0
3 1
4 0
5 1
6 2
7 0
8 0
9 0
10 1
11 2
dtype: int64
If you want the numbering to start at 1:
In [5]: df.groupby(['c1', 'c2']).cumcount()+1
Out[5]:
0 1
1 2
2 1
3 2
4 1
5 2
6 3
7 1
8 1
9 1
10 2
11 3
dtype: int64