How to create group indices for nested groups in R
Using cumsum and !duplicated with dplyr
df %>%
group_by(id) %>%
mutate(daynum = cumsum(!duplicated(dayweek)))
# A tibble: 13 x 3
# Groups: id [2]
id dayweek daynum
<dbl> <dbl> <int>
1 1 1 1
2 1 1 1
3 1 4 2
4 1 4 2
5 1 5 3
6 1 5 3
7 2 1 1
8 2 1 1
9 2 2 2
10 2 2 2
11 2 3 3
12 2 3 3
13 2 3 3
tapply from base R
unlist(tapply(df$dayweek, df$id, function(x) cumsum(!duplicated(x))))
1 1 2 2 3 3 1 1 2 2 3 3 3
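Both answers rely on the same idiom: !duplicated(x) is TRUE only at the first appearance of each value, so its cumulative sum increments exactly once per new value. A minimal sketch on a plain vector (data made up for illustration):

```r
x <- c(1, 1, 4, 4, 5, 5)

# !duplicated() flags the first appearance of each value
first_seen <- !duplicated(x)   # TRUE FALSE TRUE FALSE TRUE FALSE

# the cumulative sum of those flags is a running group index
idx <- cumsum(first_seen)      # 1 1 2 2 3 3

# match() against unique() is an equivalent base R spelling
identical(idx, match(x, unique(x)))
```

Note that this numbers groups by order of first appearance, not by sorted value.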
How to use dplyr group_by and get indices for each distinct grouping?
You can do:
df %>%
group_by(user) %>%
mutate(indices = cumsum(!duplicated(purchase)))
user purchase indices
<chr> <chr> <int>
1 Peter Snickers 1
2 Peter Snickers 1
3 Peter Coke 2
4 Paul Pepsi 1
5 Paul Pepsi 1
6 Mary Snickers 1
7 Mary Pepsi 2
8 Mary Coke 3
R: how to group by nested intervals?
I tweaked your example a bit, but I think this will still largely suffice. (Also see the notes below the code.)
df <- read.table(text = " d b c a
1 3400 100 3 1
2 3400 100 3 1
3 3400 100 1 1
4 3408 100 1 1
5 3412 100 3 1
6 3434 100 3 1
7 3436 100 1 1
8 3438 100 3 1
9 3445 100 1 1
10 3443 100 3 1
11 3444 100 1 1
12 3463 100 3 1
13 3463 100 1 1
14 3463 100 3 1
15 3465 100 3 1", header = T)
# added one row to the question's df
Now follow this strategy:
library(tidyverse)
library(data.table) # for rleid()
df %>%
  mutate(r = row_number()) %>%
  group_by(b, a) %>%
  mutate(grp_no = rleid(accumulate(d, ~ ifelse(.y - .x > 30, .y, .x)))) %>%
  group_by(b, a, grp_no) %>%
  summarise(row_count = n(), r = first(r), d = first(d)) %>%
  arrange(r) %>%
  mutate(additional = paste("group starts at d =", d)) %>%
  select(-r, -d)
# A tibble: 3 x 5
# Groups: b, a [1]
b a grp_no row_count additional
<int> <int> <int> <int> <chr>
1 100 1 1 5 group starts at d = 3400
2 100 1 2 9 group starts at d = 3434
3 100 1 3 1 group starts at d = 3465
With the first example from the question, the output is:
# A tibble: 5 x 5
# Groups: b, a [3]
b a grp_no row_count additional
<int> <int> <int> <int> <chr>
1 100 -1 1 2 group starts at d = 3400
2 50 1 1 3 group starts at d = 3400
3 100 1 1 3 group starts at d = 3412
4 50 1 2 2 group starts at d = 3438
5 100 1 2 2 group starts at d = 3454
Note: you may also use dplyr::dense_rank instead of rleid in the above syntax, like this:
df %>%
  mutate(r = row_number()) %>%
  group_by(b, a) %>%
  mutate(grp_no = dense_rank(accumulate(d, ~ ifelse(.y - .x > 30, .y, .x)))) %>%
  group_by(b, a, grp_no) %>%
  summarise(row_count = n(), r = first(r), d = first(d)) %>%
  arrange(r) %>%
  mutate(additional = paste("group starts at d =", d)) %>%
  select(-r, -d)
Endnote: I am not sure how your logic of c == 3 fits into this. If you clarify, I may try again.
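The core of the grouping trick is purrr::accumulate() carrying each group's starting d forward until the gap exceeds 30. A stripped-down sketch with made-up values:

```r
library(purrr)

d <- c(3400, 3400, 3408, 3412, 3434, 3436, 3465)

# carry the current group's starting value (.x) forward; once the next
# value (.y) is more than 30 above it, that value starts a new group
starts <- accumulate(d, ~ ifelse(.y - .x > 30, .y, .x))
starts
# 3400 3400 3400 3400 3434 3434 3465

# turn the repeated start values into consecutive group numbers
grp <- match(starts, unique(starts))
grp
# 1 1 1 1 2 2 3
```

Because each value is compared to the group's start rather than its immediate predecessor, a slowly drifting sequence stays in one group until the cumulative gap passes 30.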
Counting rows in nested groups
You may use ave and create a unique number for GROUP within each SITE and SAMPLE:
df$SEQ_SAMPLE = with(df, as.integer(ave(GROUP, SITE, SAMPLE,
FUN = function(x) with(rle(x), rep(seq_along(values), lengths)))))
identical(df$SEQ_SAMPLE, result$SEQ_SAMPLE)
#[1] TRUE
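The question's df and result are not shown here, so here is a self-contained sketch with made-up SITE/SAMPLE/GROUP data. rle() numbers consecutive runs, so a GROUP value that reappears later in the same slice gets a fresh sequence number (unlike the duplicated()-based indexing above):

```r
df <- data.frame(
  SITE   = c("A", "A", "A", "A", "B", "B"),
  SAMPLE = c(1, 1, 1, 1, 1, 1),
  GROUP  = c("x", "x", "y", "x", "x", "y")
)

# rle() splits each SITE/SAMPLE slice into runs of equal GROUP values;
# rep() stretches the run numbers back out to the original length
df$SEQ_SAMPLE <- with(df, as.integer(ave(GROUP, SITE, SAMPLE,
  FUN = function(x) with(rle(x), rep(seq_along(values), lengths)))))

df$SEQ_SAMPLE
# 1 1 2 3 1 2  -- the second run of "x" in site A gets a new number
```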
Enumerate groups within groups in a data.table
Try
library(data.table)
dt[, id := rleid(cl), by=gr]
dt
# gr cl id
# 1: a a 1
# 2: a a 1
# 3: a a 1
# 4: a b 2
# 5: a b 2
# 6: a b 2
# 7: b c 1
# 8: b c 1
# 9: b c 1
#10: b d 2
#11: b d 2
#12: b d 2
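rleid() numbers consecutive runs, which fits this data because each class appears as one contiguous block. It differs from cumsum(!duplicated(...)) when a value reappears after a gap; a small sketch:

```r
library(data.table)

cl <- c("a", "a", "b", "a")

rleid(cl)                # 1 1 2 3 -- the second run of "a" gets a new id
cumsum(!duplicated(cl))  # 1 1 2 2 -- "a" keeps its first-appearance index
```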
Nested groupings with data.table
Here's an easy way:
setkey(dt, id, membr)
ans <- dt[, .SD[CJ(unique(id), unique(membr))], by=list(event)]
Then, you can just replace the NAs with 0s as follows:
ans[is.na(freqrel), freqrel := 0.0]
Some explanation: your problem boils down to this. For every event, you want all possible combinations of id, membr so that you can then perform a join on this all-combination within that grouping using .SD.

So, first we group by event, and within each group we get all combinations of id, membr with the help of CJ (which has a key set on all its columns by default). However, to perform the join, the key must also be set on .SD. Therefore, we set the key of dt to id, membr upfront. Thus, we perform a join within each group, and that gives you the intended result. Hope this helps a bit.
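As a self-contained illustration of the completing-join idea (data made up; this sketch uses an explicit on= join, a keyless variant that does not depend on .SD inheriting dt's key, but should produce the same completed result as the keyed approach above):

```r
library(data.table)

# made-up data: event 2 is missing the (id = 1, membr = "b") combination
dt <- data.table(event   = c(1, 1, 1, 1, 2, 2, 2),
                 id      = c(1, 1, 2, 2, 1, 2, 2),
                 membr   = c("a", "b", "a", "b", "a", "a", "b"),
                 freqrel = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7))

# within each event, right-join .SD onto every id x membr combination;
# absent combinations come back with freqrel = NA
ans <- dt[, .SD[CJ(id = unique(id), membr = unique(membr)),
                on = c("id", "membr")],
          by = event]
ans[is.na(freqrel), freqrel := 0.0]

nrow(ans)  # 8: both events now cover all four combinations
```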
Is there a way to index numbers for grouped items according to their order in that group?
We can do a group by 'GameNo' and create 'Placement' as the rank of 'PlayerScore':
library(dplyr)
tib <- tib %>%
group_by(GameNo) %>%
mutate(Placement = rank(-PlayerScore)) %>%
ungroup
-output
tib
# A tibble: 6 x 4
PlayerName GameNo PlayerScore Placement
<chr> <dbl> <dbl> <dbl>
1 P1 1 10 2
2 P2 1 15 1
3 P3 1 9 3
4 P1 2 8 3
5 P2 2 12 2
6 P3 2 18 1
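One thing to watch: rank() averages ties by default, so two players sharing the top score would both get a Placement of 1.5. If tied players should share a whole-number place, ties.method = "min" or dplyr::dense_rank() are common alternatives. A sketch with made-up scores:

```r
scores <- c(10, 15, 9, 15)   # two players tie for first

rank(-scores)                        # 3.0 1.5 4.0 1.5 -- ties averaged
rank(-scores, ties.method = "min")   # 3 1 4 1 -- ties share the best place
dplyr::dense_rank(-scores)           # 2 1 3 1 -- no gap after the tie
```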
Summation by group based on unique values for nested data in r
Maybe we need to wrap with unique:
a[, sum_num := sum(unique(number_v)), .(group_v)]
But that may fail to sum the same value across different type_v1 values. So, instead, we may create a logical index with duplicated on 'type_v1':
a[, sum_num := sum(number_v[!duplicated(type_v1)]), .(group_v)]
-output
# group_v type_v1 type_v2 number_v sum_num
#1: A 1 1 12 73
#2: A 2 2a 26 73
#3: A 2 2b 26 73
#4: A 3 3 35 73
#5: B 4 4a 24 31
#6: B 4 4b 24 31
#7: B 4 4c 24 31
#8: B 5 5 7 31
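The two approaches differ only when two distinct type_v1 values happen to share the same number_v. A minimal sketch with made-up vectors:

```r
number_v <- c(24, 24, 7, 24)  # types 4 and 6 coincidentally share the value 24
type_v1  <- c(4, 4, 5, 6)

# unique() on the numbers collapses the coincidental repeat across types
sum(unique(number_v))                 # 31

# deduplicating on type_v1 keeps exactly one number per distinct type
sum(number_v[!duplicated(type_v1)])   # 55
```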