Numbering rows within groups in a data frame
Use ave, ddply, dplyr or data.table:
df$num <- ave(df$val, df$cat, FUN = seq_along)
or:
library(plyr)
ddply(df, .(cat), mutate, id = seq_along(val))
or:
library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())
or (the most memory efficient, as it assigns by reference within DT):
library(data.table)
DT <- data.table(df)
DT[, id := seq_len(.N), by = cat]
or, equivalently:
DT[, id := rowid(cat)]
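Since this document mixes R and pandas answers, the same per-group row numbering can be sketched in pandas with groupby().cumcount(); the column names `cat`/`val` below are assumptions mirroring the R example:

```python
import pandas as pd

# Mirror of the R example: number rows 1..n within each level of `cat`
df = pd.DataFrame({"cat": ["a", "a", "b", "a", "b"],
                   "val": [10, 20, 30, 40, 50]})
# cumcount() is zero-based, so add 1 to match seq_along / seq_len(.N)
df["num"] = df.groupby("cat").cumcount() + 1
```

Like the data.table version, this numbers rows in their original order within each group.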
Create a sequential number within each group
A simple solution in base R:
df$seq <- ave(sapply(df$gap, identical, "gap"), df$id, FUN = cumsum)
df
#> id date lc lon lat gap_days gap seq
#> 1 20162.03 2003-10-19 14:33:00 Tagging -39.370 -18.480 NA <NA> 0
#> 2 20162.03 2003-10-21 12:19:00 1 -38.517 -18.253 1.90694444 gap 1
#> 3 20162.03 2003-10-21 13:33:00 1 -38.464 -18.302 0.05138889 no 1
#> 4 20162.03 2003-10-21 16:38:00 A -38.461 -18.425 0.12847222 no 1
#> 5 20162.03 2003-10-21 18:50:00 A -38.322 -18.512 0.09166667 no 1
#> 6 20162.03 2003-10-23 10:33:00 B -38.674 -19.824 1.65486111 gap 2
#> 7 20162.03 2003-10-23 17:52:00 B -38.957 -19.511 0.30486111 no 2
#> 8 20162.03 2003-11-02 08:14:00 B -42.084 -24.071 9.59861111 gap 3
#> 9 20162.03 2003-11-02 09:36:00 A -41.999 -24.114 0.05694444 no 3
#> 10 20687.03 2003-10-27 17:02:00 Tagging -39.320 -18.460 NA <NA> 0
#> 11 20687.03 2003-10-27 19:44:00 2 -39.306 -18.454 0.11250000 no 0
#> 12 20687.03 2003-10-27 21:05:00 1 -39.301 -18.458 0.05625000 no 0
And then split it:
split(df, list(df$id, df$seq), drop = TRUE)
#> $`20162.03.0`
#> id date lc lon lat gap_days gap seq
#> 1 20162.03 2003-10-19 14:33:00 Tagging -39.37 -18.48 NA <NA> 0
#>
#> $`20687.03.0`
#> id date lc lon lat gap_days gap seq
#> 10 20687.03 2003-10-27 17:02:00 Tagging -39.320 -18.460 NA <NA> 0
#> 11 20687.03 2003-10-27 19:44:00 2 -39.306 -18.454 0.11250 no 0
#> 12 20687.03 2003-10-27 21:05:00 1 -39.301 -18.458 0.05625 no 0
#>
#> $`20162.03.1`
#> id date lc lon lat gap_days gap seq
#> 2 20162.03 2003-10-21 12:19:00 1 -38.517 -18.253 1.90694444 gap 1
#> 3 20162.03 2003-10-21 13:33:00 1 -38.464 -18.302 0.05138889 no 1
#> 4 20162.03 2003-10-21 16:38:00 A -38.461 -18.425 0.12847222 no 1
#> 5 20162.03 2003-10-21 18:50:00 A -38.322 -18.512 0.09166667 no 1
#>
#> $`20162.03.2`
#> id date lc lon lat gap_days gap seq
#> 6 20162.03 2003-10-23 10:33:00 B -38.674 -19.824 1.6548611 gap 2
#> 7 20162.03 2003-10-23 17:52:00 B -38.957 -19.511 0.3048611 no 2
#>
#> $`20162.03.3`
#> id date lc lon lat gap_days gap seq
#> 8 20162.03 2003-11-02 08:14:00 B -42.084 -24.071 9.59861111 gap 3
#> 9 20162.03 2003-11-02 09:36:00 A -41.999 -24.114 0.05694444 no 3
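The same cumulative-flag trick (count "gap" markers per id, then split on the resulting counter) translates directly to pandas; a minimal sketch, with a toy frame standing in for the tracking data above:

```python
import pandas as pd

# Per-id cumulative count of "gap" markers, mirroring ave(..., FUN = cumsum)
df = pd.DataFrame({
    "id":  [1, 1, 1, 1, 2, 2],
    "gap": [None, "gap", "no", "gap", None, "no"],
})
# None compares unequal to "gap", so it contributes 0, like identical() in R
df["seq"] = (df["gap"] == "gap").groupby(df["id"]).cumsum()
# Splitting by (id, seq) is then a plain groupby, analogous to split() in R
groups = dict(tuple(df.groupby(["id", "seq"])))
```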
How to add sequential counter column on groups using Pandas groupby?
Try groupby with transform:
g = df.groupby('c1')['c2']
df['Ct_X'] = g.transform(lambda s: s.eq('X').cumsum())
df['Ct_Y'] = g.transform(lambda s: s.eq('Y').cumsum())
print(df)
Output:
c1 c2 seq Ct_X Ct_Y
0 A X 1 1 0
1 A X 2 2 0
2 A Y 1 2 1
3 A Y 2 2 2
4 B X 1 1 0
5 B X 2 2 0
6 B X 3 3 0
7 B Y 1 3 1
8 C X 1 1 0
9 C Y 1 1 1
10 C Y 2 1 2
11 C Y 6 1 3
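The same counts can also be computed without transform, by grouping the boolean mask directly; a sketch (the input frame is reconstructed from the printed output above):

```python
import pandas as pd

df = pd.DataFrame({"c1": list("AAAABBBBCCCC"),
                   "c2": list("XXYYXXXYXYYY")})
# Cumulative count of each marker within its c1 group, no per-group lambda
df["Ct_X"] = df["c2"].eq("X").groupby(df["c1"]).cumsum()
df["Ct_Y"] = df["c2"].eq("Y").groupby(df["c1"]).cumsum()
```

Avoiding the Python-level lambda keeps the whole computation vectorized, which matters on large frames.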
r - Create a sequence number for each row within a category (defined by 2+ fields) in a dataframe
Same idea as @Procrastinatus Maximus's rleid; here is a dplyr version of it:
library(dplyr)
df %>%
arrange(personid, date) %>%
group_by(personid) %>%
mutate(id = cumsum(date != lag(date, default = first(date))) + 1)
# +1 converts the zero based id to one based id here
# Source: local data frame [6 x 4]
# Groups: personid [3]
#
# personid date measurement id
# <int> <fctr> <int> <dbl>
# 1 1 x 23 1
# 2 1 x 32 1
# 3 2 y 21 1
# 4 3 x 23 1
# 5 3 y 23 2
# 6 3 z 23 3
In order for rleid or cumsum to work here, we have to sort the data frame by personid and then date, since both methods only care about adjacent values.
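The same sort-then-count-changes technique can be sketched in pandas: cumulative-sum a "value changed" flag within each group. Column names mirror the R example:

```python
import pandas as pd

df = pd.DataFrame({
    "personid": [1, 1, 2, 3, 3, 3],
    "date": ["x", "x", "y", "x", "y", "z"],
    "measurement": [23, 32, 21, 23, 23, 23],
})
# Sort first: like rleid/cumsum in R, this only looks at adjacent values
df = df.sort_values(["personid", "date"])
# The first row of each group compares against NaN and counts as a change,
# so the cumulative sum is already one-based (no +1 needed)
df["id"] = df.groupby("personid")["date"].transform(
    lambda s: s.ne(s.shift()).cumsum())
```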
Pandas row number for group results
You can use pd.factorize per student group:
df['enrollment'] = df.groupby('student')['year'] \
.transform(lambda x: pd.factorize(x)[0] + 1)
print(df)
# Output:
student year enrollment
0 A 20211 1
1 A 20222 2
2 A 20222 2
3 A 20225 3
4 B 20211 1
5 B 20211 1
6 B 20227 2
7 C 20211 1
8 C 20222 2
9 C 20229 3
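Note that factorize numbers distinct values within each group (a dense rank by first appearance), while cumcount numbers rows; a sketch contrasting the two on the student data above:

```python
import pandas as pd

df = pd.DataFrame({
    "student": list("AAAABBBCCC"),
    "year": [20211, 20222, 20222, 20225, 20211, 20211, 20227, 20211, 20222, 20229],
})
# Dense numbering: repeated years within a student share a number
df["enrollment"] = df.groupby("student")["year"].transform(
    lambda s: pd.factorize(s)[0] + 1)
# Row numbering: every row within a student gets a new number
df["rownum"] = df.groupby("student").cumcount() + 1
```

Pick factorize when duplicates should share an id, cumcount when each row needs its own.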
How to add sequential counter column on groups using Pandas groupby
Use cumcount(); see the pandas GroupBy documentation.
In [4]: df.groupby(['c1', 'c2']).cumcount()
Out[4]:
0 0
1 1
2 0
3 1
4 0
5 1
6 2
7 0
8 0
9 0
10 1
11 2
dtype: int64
If you want the numbering to start at 1:
In [5]: df.groupby(['c1', 'c2']).cumcount()+1
Out[5]:
0 1
1 2
2 1
3 2
4 1
5 2
6 3
7 1
8 1
9 1
10 2
11 3
dtype: int64