How to Create Group Indices for Nested Groups in R

dplyr

Using cumsum and !duplicated with dplyr

library(dplyr)

df %>%
  group_by(id) %>%
  mutate(daynum = cumsum(!duplicated(dayweek)))

# A tibble: 13 x 3
# Groups:   id [2]
      id dayweek daynum
   <dbl>   <dbl>  <int>
 1     1       1      1
 2     1       1      1
 3     1       4      2
 4     1       4      2
 5     1       5      3
 6     1       5      3
 7     2       1      1
 8     2       1      1
 9     2       2      2
10     2       2      2
11     2       3      3
12     2       3      3
13     2       3      3

tapply from base R

unlist(tapply(df$dayweek, df$id, function(x) cumsum(!duplicated(x))))

1 1 2 2 3 3 1 1 2 2 3 3 3
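
One caveat: tapply returns its results grouped by the levels of id, so the unlisted vector only lines up with the rows when df is already sorted by id. Here is a minimal sketch using ave, which writes each group's result back in the original row order. The df below is rebuilt from the output shown above, so treat it as an assumption about the original data.

```r
# Assumed reconstruction of the example data (taken from the printed output)
df <- data.frame(
  id      = c(rep(1, 6), rep(2, 7)),
  dayweek = c(1, 1, 4, 4, 5, 5, 1, 1, 2, 2, 3, 3, 3)
)

# tapply: one vector per id, concatenated in the order of the id levels
daynum_tapply <- unlist(
  tapply(df$dayweek, df$id, function(x) cumsum(!duplicated(x))),
  use.names = FALSE
)

# ave: same per-group function, but results stay aligned with the rows of df
daynum_ave <- ave(df$dayweek, df$id, FUN = function(x) cumsum(!duplicated(x)))

print(daynum_tapply)
print(daynum_ave)
```

Because this df is sorted by id, both vectors hold the same values. With unsorted ids, only the ave version can be assigned straight back as a column.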

How to use dplyr group_by and get indices for each distinct grouping?

You can do:

df %>%
  group_by(user) %>%
  mutate(indices = cumsum(!duplicated(purchase)))

  user  purchase indices
  <chr> <chr>      <int>
1 Peter Snickers       1
2 Peter Snickers       1
3 Peter Coke           2
4 Paul  Pepsi          1
5 Paul  Pepsi          1
6 Mary  Snickers       1
7 Mary  Pepsi          2
8 Mary  Coke           3
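
Note that cumsum(!duplicated(x)) assumes repeated values sit next to each other within a group. If a value can come back later, match(x, unique(x)) is a possible alternative that numbers values by first appearance. A small sketch on a made-up purchase vector (hypothetical data, not from the question):

```r
# Hypothetical purchases for one user: "Snickers" reappears after "Coke"
purchase <- c("Snickers", "Coke", "Snickers")

# Running count of distinct values seen so far: the repeat inherits the count
idx_cumsum <- cumsum(!duplicated(purchase))

# Index of each value among the distinct values, in order of first appearance
idx_match <- match(purchase, unique(purchase))

print(idx_cumsum)  # 1 2 2
print(idx_match)   # 1 2 1
```

Inside the dplyr pipeline the same idea would read mutate(indices = match(purchase, unique(purchase))).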

R: how to group by nested intervals?

I tweaked your example a bit, but I think this still largely covers your case. (See also the notes below the code.)

df <- read.table(text = "     d   b c a
1 3400 100 3 1
2 3400 100 3 1
3 3400 100 1 1
4 3408 100 1 1
5 3412 100 3 1
6 3434 100 3 1
7 3436 100 1 1
8 3438 100 3 1
9 3445 100 1 1
10 3443 100 3 1
11 3444 100 1 1
12 3463 100 3 1
13 3463 100 1 1
14 3463 100 3 1
15 3465 100 3 1", header = TRUE)

# one row was added to df
> df
      d   b c a
1  3400 100 3 1
2  3400 100 3 1
3  3400 100 1 1
4  3408 100 1 1
5  3412 100 3 1
6  3434 100 3 1
7  3436 100 1 1
8  3438 100 3 1
9  3445 100 1 1
10 3443 100 3 1
11 3444 100 1 1
12 3463 100 3 1
13 3463 100 1 1
14 3463 100 3 1
15 3465 100 3 1

Now follow this strategy

library(tidyverse)
library(data.table) # for rleid()

df %>%
  mutate(r = row_number()) %>%
  group_by(b, a) %>%
  mutate(grp_no = rleid(accumulate(d, ~ifelse(.y - .x > 30, .y, .x)))) %>%
  group_by(b, a, grp_no) %>%
  summarise(row_count = n(), r = first(r), d = first(d)) %>%
  arrange(r) %>%
  mutate(additional = paste("group starts at d =", d)) %>%
  select(-r, -d)

# A tibble: 3 x 5
# Groups:   b, a [1]
      b     a grp_no row_count additional
  <int> <int>  <int>     <int> <chr>
1   100     1      1         5 group starts at d = 3400
2   100     1      2         9 group starts at d = 3434
3   100     1      3         1 group starts at d = 3465
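
To see what the accumulate step is doing, here is a rough base R sketch of the same idea on the d column alone, using Reduce with accumulate = TRUE in place of purrr::accumulate. The names anchor and grp_no are just illustrative. Each value is compared with the current group's starting value, and a new group starts once the gap exceeds 30.

```r
# The d column from the example above (all rows share b = 100, a = 1)
d <- c(3400, 3400, 3400, 3408, 3412, 3434, 3436, 3438,
       3445, 3443, 3444, 3463, 3463, 3463, 3465)

# Carry the group's starting value forward; switch to the current value
# only when it lies more than 30 above that starting value
anchor <- Reduce(function(prev, cur) if (cur - prev > 30) cur else prev,
                 d, accumulate = TRUE)

# Number the groups: a new number each time a new starting value appears
grp_no <- cumsum(!duplicated(anchor))

print(unique(anchor))  # starting value of each group
print(table(grp_no))   # rows per group
```

The group sizes and starting values agree with the row_count and additional columns in the tibble above.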

With your first example, the output is

# A tibble: 5 x 5
# Groups:   b, a [3]
      b     a grp_no row_count additional
  <int> <int>  <int>     <int> <chr>
1   100    -1      1         2 group starts at d = 3400
2    50     1      1         3 group starts at d = 3400
3   100     1      1         3 group starts at d = 3412
4    50     1      2         2 group starts at d = 3438
5   100     1      2         2 group starts at d = 3454

Note: you may also use dplyr::dense_rank instead of rleid in the code above, like this:

df %>%
  mutate(r = row_number()) %>%
  group_by(b, a) %>%
  mutate(grp_no = dense_rank(accumulate(d, ~ifelse(.y - .x > 30, .y, .x)))) %>%
  group_by(b, a, grp_no) %>%
  summarise(row_count = n(), r = first(r), d = first(d)) %>%
  arrange(r) %>%
  mutate(additional = paste("group starts at d =", d)) %>%
  select(-r, -d)
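
The two versions give the same numbering here because the accumulated starting value never decreases within a group. In general they are not interchangeable: rleid numbers consecutive runs, while dense_rank numbers values by their sorted order. A small sketch using base R stand-ins (the rle-based line is assumed to mirror rleid, and match against the sorted unique values is assumed to mirror dense_rank, for a vector without NAs):

```r
# Made-up vector where a value (5) comes back after a different run
x <- c(5, 5, 3, 3, 5)

# Run-based numbering, like data.table::rleid: each new run gets a new id
run_id <- with(rle(x), rep(seq_along(values), lengths))

# Value-based numbering, like dplyr::dense_rank: ids follow sorted order
dense_id <- match(x, sort(unique(x)))

print(run_id)    # 1 1 2 2 3
print(dense_id)  # 2 2 1 1 2
```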

End note: I am not sure how your c == 3 logic fits into this. If you clarify, I may try again.

Counting rows in nested groups

You may use ave to create a sequence number for each run of GROUP within each combination of SITE and SAMPLE.

df$SEQ_SAMPLE <- with(df, as.integer(ave(GROUP, SITE, SAMPLE,
  FUN = function(x) with(rle(x), rep(seq_along(values), lengths)))))

identical(df$SEQ_SAMPLE, result$SEQ_SAMPLE)
#[1] TRUE
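
Since the original df is not shown here, the following is a small self-contained sketch on made-up data. The SITE, SAMPLE and GROUP values are hypothetical, and run_index is just a named version of the anonymous function above.

```r
# Hypothetical data: GROUP labels nested inside SITE and SAMPLE
df <- data.frame(
  SITE   = c("A", "A", "A", "A", "B", "B", "B"),
  SAMPLE = c(1, 1, 1, 2, 1, 1, 1),
  GROUP  = c("x", "x", "y", "z", "x", "y", "y"),
  stringsAsFactors = FALSE
)

# Number consecutive runs of identical values: 1 for the first run, 2 for the next, ...
run_index <- function(x) with(rle(x), rep(seq_along(values), lengths))

# ave applies run_index within each SITE/SAMPLE combination. GROUP is character,
# so ave returns character and as.integer converts the result back
df$SEQ_SAMPLE <- with(df, as.integer(ave(GROUP, SITE, SAMPLE, FUN = run_index)))

print(df)
```

The numbering restarts at 1 whenever SITE or SAMPLE changes.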

Enumerate groups within groups in a data.table

Try

library(data.table)
dt[, id := rleid(cl), by=gr]
dt
# gr cl id
# 1: a a 1
# 2: a a 1
# 3: a a 1
# 4: a b 2
# 5: a b 2
# 6: a b 2
# 7: b c 1
# 8: b c 1
# 9: b c 1
#10: b d 2
#11: b d 2
#12: b d 2
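
For completeness, a reproducible version might look like the sketch below. The construction of dt is an assumption, rebuilt from the printed result above.

```r
library(data.table)

# Assumed input, reconstructed from the printed output
dt <- data.table(
  gr = rep(c("a", "b"), each = 6),
  cl = rep(c("a", "b", "c", "d"), each = 3)
)

# rleid numbers consecutive runs of cl; by = gr restarts the count per group
dt[, id := rleid(cl), by = gr]

print(dt)
```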

Nested groupings with data.table

Here's an easy way:

setkey(dt, id, membr)
ans <- dt[, .SD[CJ(unique(id), unique(membr))], by=list(event)]

Then, you can just replace the NA with 0's as follows:

ans[is.na(freqrel), freqrel := 0.0]

Some explanation: your problem boils down to this. For every event, you want all possible combinations of id and membr, so that you can join .SD against that full set of combinations within each group.

So, first we group by event, and within each group we build all combinations of id and membr with the help of CJ (which by default sets a key on all of its columns). However, to perform a join, .SD needs a key as well. Therefore, we set the key of dt to id, membr upfront. The join is then performed within each group, and that gives you the intended result. Hope this helps a bit.
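
If it helps to see the idea without data.table, here is a rough base R sketch of the same logic. It swaps in expand.grid and merge for CJ and the keyed join, and the data frame dat and the helper fill_event are made up for illustration.

```r
# Hypothetical data: freqrel is recorded only for some (id, membr) pairs per event
dat <- data.frame(
  event   = c("e1", "e1", "e1", "e2", "e2"),
  id      = c(1, 1, 2, 1, 2),
  membr   = c("m1", "m2", "m1", "m1", "m2"),
  freqrel = c(0.5, 0.2, 0.3, 0.6, 0.4),
  stringsAsFactors = FALSE
)

# For one event: build every id x membr combination, left-join the observed
# rows onto it, and replace the resulting NA with 0
fill_event <- function(sub) {
  grid <- expand.grid(id = unique(sub$id), membr = unique(sub$membr),
                      stringsAsFactors = FALSE)
  out <- merge(grid, sub[c("id", "membr", "freqrel")],
               by = c("id", "membr"), all.x = TRUE)
  out$freqrel[is.na(out$freqrel)] <- 0
  data.frame(event = sub$event[1], out, stringsAsFactors = FALSE)
}

# Apply per event and stack the pieces
ans <- do.call(rbind, lapply(split(dat, dat$event), fill_event))
rownames(ans) <- NULL

print(ans)
```

Each event ends up with one row per combination, and the combinations that were missing get freqrel = 0.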

Is there a way to index numbers for grouped items according to their order in that group?

We can group by 'GameNo' and create 'Placement' as the rank of 'PlayerScore' in descending order:

library(dplyr)
tib <- tib %>%
  group_by(GameNo) %>%
  mutate(Placement = rank(-PlayerScore)) %>%
  ungroup()

-output

tib
# A tibble: 6 x 4
  PlayerName GameNo PlayerScore Placement
  <chr>       <dbl>       <dbl>     <dbl>
1 P1              1          10         2
2 P2              1          15         1
3 P3              1           9         3
4 P1              2           8         3
5 P2              2          12         2
6 P3              2          18         1
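
One thing to watch: rank averages tied positions by default, which is why Placement comes out as a double. If two players can have the same score, ties.method = "min" may be closer to what you want (dplyr::min_rank follows the same convention). A quick sketch on made-up scores:

```r
# Hypothetical scores for one game, with a tie for second place
scores <- c(10, 15, 10)

# Default ties.method = "average": the tied players share rank 2.5
placement_avg <- rank(-scores)

# ties.method = "min": the tied players both get rank 2
placement_min <- rank(-scores, ties.method = "min")

print(placement_avg)  # 2.5 1.0 2.5
print(placement_min)  # 2 1 2
```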

Summation by group based on unique values for nested data in r

Maybe we need to wrap number_v with unique:

a[, sum_num := sum(unique(number_v)), .(group_v)]

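But that will undercount when two different type_v1 values happen to share the same number_v, because unique keeps only one copy of the value. Instead, we can build a logical index with duplicated on 'type_v1', so that each type contributes exactly once: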

a[, sum_num := sum(number_v[!duplicated(type_v1)]), .(group_v)]

-output

#    group_v type_v1 type_v2 number_v sum_num
#1:       A       1       1       12      73
#2:       A       2      2a       26      73
#3:       A       2      2b       26      73
#4:       A       3       3       35      73
#5:       B       4      4a       24      31
#6:       B       4      4b       24      31
#7:       B       4      4c       24      31
#8:       B       5       5        7      31
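
The difference between the two versions only shows up when two types share a value. A small base R sketch for a single made-up group (the vectors below are hypothetical, not taken from the data above):

```r
# Hypothetical group: type 1 and type 2 both carry the value 12
type_v1  <- c(1, 2, 2, 3)
number_v <- c(12, 12, 12, 35)

# unique() collapses the shared value, so type 2's 12 is lost
sum_unique <- sum(unique(number_v))

# Keeping the first row of each type counts every type exactly once
sum_by_type <- sum(number_v[!duplicated(type_v1)])

print(sum_unique)   # 47
print(sum_by_type)  # 59
```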

