R Create Id Within a Group

R create ID within a group

There are several ways to number the rows within each group.

In base R, use ave:

with(df, ave(rep(1, nrow(df)), IDFAM, FUN = seq_along))
# [1] 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 2 3 4 5 1
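As a self-contained illustration of the ave() approach, here is a minimal sketch on a small made-up data frame. Only the column names IDFAM and AGED come from the example above; the values are hypothetical. It passes seq_along(IDFAM) rather than rep(1, nrow(df)) as the first argument, which should give the same numbering.

```r
# Toy data: hypothetical values, only the column names follow the answer above.
df <- data.frame(
  IDFAM = c(2996, 3071, 3071, 3660, 16280, 16280, 16280),
  AGED  = c(45, 47, 24, 46, 43, 16, 20)
)

# ave() splits its first argument by IDFAM, applies seq_along() to each piece,
# and returns the results in the original row order.
df$ID <- with(df, ave(seq_along(IDFAM), IDFAM, FUN = seq_along))

print(df$ID)
```

Because ave() preserves row order, the data does not need to be sorted by IDFAM first.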

With the "data.table" package, use sequence(.N):

library(data.table)
DT <- as.data.table(df)
DT[, ID := sequence(.N), by = IDFAM]

With the "dplyr" package, try:

library(dplyr)
df %>% group_by(IDFAM) %>% mutate(count = sequence(n()))

or (as recommended by Hadley in the comments):

df %>% group_by(IDFAM) %>% mutate(count = row_number(IDFAM))

Update

Since this seems to be something that is asked for relatively frequently, this feature has been added as a function (getanID) in my "splitstackshape" package. It is based on the "data.table" approach above.

library(splitstackshape)
getanID(df, id.vars = "IDFAM")
# IDFAM AGED .id
# 1: 2010 7599 2996 1 45 1
# 2: 2010 7599 3071 1 47 1
# 3: 2010 7599 3071 1 24 2
# 4: 2010 7599 3660 1 46 1
# 5: 2010 7599 4736 1 46 1
# 6: 2010 7599 6235 1 44 1
# 7: 2010 7599 6299 1 43 1
# 8: 2010 7599 9903 1 43 1
# 9: 2010 7599 11013 1 43 1
# 10: 2010 7599 11778 1 16 1
# 11: 2010 7599 11778 1 43 2
# 12: 2010 7599 12248 1 46 1
# 13: 2010 7599 13127 1 44 1
# 14: 2010 7599 14261 1 47 1
# 15: 2010 7599 16280 1 43 1
# 16: 2010 7599 16280 1 16 2
# 17: 2010 7599 16280 1 20 3
# 18: 2010 7599 16280 1 18 4
# 19: 2010 7599 16280 1 18 5
# 20: 2010 7599 17382 1 43 1

generate id within group

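You can use data.table::rleid(), which starts a new ID each time VarB changes within a VarA group (so it assumes that equal values of VarB sit in adjacent rows), i.e.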

library(dplyr)

df %>%
  group_by(VarA) %>%
  mutate(id = data.table::rleid(VarB))

# A tibble: 6 x 3
# Groups: VarA [2]
# VarA VarB id
# <chr> <chr> <int>
#1 A aaaa 1
#2 A aaaa 1
#3 B bbbb 1
#4 B bbbb 1
#5 B bbbb 1
#6 B cccc 2
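To make the difference between a run-based ID and a "first appearance" ID concrete, here is a small base R sketch. The helper run_id() is a hypothetical stand-in written for this illustration (it is not part of data.table), and the input vector is made up.

```r
# Hypothetical helper: a base R stand-in for a run-length ID.
# Assumes x has at least one element.
run_id <- function(x) cumsum(c(TRUE, x[-1] != x[-length(x)]))

# Made-up values in which "aaaa" reappears after "bbbb".
VarB <- c("aaaa", "bbbb", "aaaa", "aaaa")

run_based  <- run_id(VarB)               # new ID at every change of value
first_seen <- match(VarB, unique(VarB))  # same value always gets the same ID

print(run_based)
print(first_seen)
```

If the values within a group are not contiguous and you want one ID per distinct value, the match() form is probably the safer choice.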

R - Group by variable and then assign a unique ID

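dplyr has a group_indices() function for creating unique group IDs. Note that passing the grouping columns directly to group_indices(), as done below, is deprecated as of dplyr 1.0.0; in newer versions, group with group_by() first.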

library(dplyr)
data <- data.frame(personal_id = c("111-111-111", "999-999-999", "222-222-222", "111-111-111"),
                   gender = c("M", "F", "M", "M"),
                   temperature = c(99.6, 98.2, 97.8, 95.5))

data$group_id <- data %>% group_indices(personal_id)
data <- data %>% select(-personal_id)

data
gender temperature group_id
1 M 99.6 1
2 F 98.2 3
3 M 97.8 2
4 M 95.5 1

Or within the same pipeline (https://github.com/tidyverse/dplyr/issues/2160):

data %>%
  mutate(group_id = group_indices(., personal_id))
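For recent dplyr versions (1.0.0 and later), a grouped mutate() with cur_group_id() should give the same IDs without the deprecated call. This is a sketch under that version assumption, reusing the example data from above; the name result is just for illustration.

```r
library(dplyr)

data <- data.frame(
  personal_id = c("111-111-111", "999-999-999", "222-222-222", "111-111-111"),
  gender      = c("M", "F", "M", "M"),
  temperature = c(99.6, 98.2, 97.8, 95.5)
)

result <- data %>%
  group_by(personal_id) %>%
  mutate(group_id = cur_group_id()) %>%  # one integer per personal_id group
  ungroup() %>%
  select(-personal_id)                   # drop the original identifier

print(result)
```

The IDs follow the sorted order of personal_id, not the order of first appearance.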

Create ID within a group by two factors

We can use match() on 'players' after grouping by 'game', which numbers each player by order of first appearance within the game:

library(dplyr)
gamematrix %>%
  group_by(game) %>%
  arrange(game) %>%
  mutate(player_id = match(players, unique(players)))

Output:

# A tibble: 12 x 3
# Groups: game [2]
# players game player_id
# <chr> <chr> <int>
# 1 bc 1 1
# 2 bc 1 1
# 3 bc 1 1
# 4 ab 1 2
# 5 ab 1 2
# 6 ab 1 2
# 7 cd 2 1
# 8 cd 2 1
# 9 cd 2 1
#10 bd 2 2
#11 bd 2 2
#12 bd 2 2

Or, after grouping by 'game', convert 'players' to a factor whose levels are its unique values, then coerce the factor to integer with as.integer():

gamematrix %>%
  group_by(game) %>%
  mutate(player_id = as.integer(factor(players, levels = unique(players))))
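For comparison, the same "first appearance within group" numbering can be sketched in base R with split() and unsplit(). The data below is made up (only the column names follow the example), and the players are deliberately interleaved to show that the rows do not need to be sorted.

```r
# Hypothetical toy data; column names follow the example above.
gamematrix <- data.frame(
  players = c("bc", "bc", "ab", "ab", "cd", "bd", "cd"),
  game    = c("1", "1", "1", "1", "2", "2", "2")
)

# Split players by game, number them by first appearance within each game,
# then put the IDs back in the original row order.
by_game <- split(gamematrix$players, gamematrix$game)
ids     <- lapply(by_game, function(p) match(p, unique(p)))
gamematrix$player_id <- unsplit(ids, gamematrix$game)

print(gamematrix)
```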

R: Create numbering within each group

A) The dplyr package offers group_indices() for adding unique group identifiers:

library(dplyr)

df$number <- df %>%
  group_indices(ID)
df

# A tibble: 10 × 3
study ID number
<chr> <dbl> <int>
1 A 1 1
2 B 1 1
3 C 1 1
4 A 5 2
5 B 5 2
...

B) You can keep only the groups that contain all three observations (studies "A", "B" and "C"), dropping any group with fewer than 3 rows, with filter():

df %>%
  group_by(ID) %>%
  filter(n() == 3)

# A tibble: 6 × 3
# Groups: ID [2]
study ID number
<chr> <dbl> <int>
1 A 1 1
2 B 1 1
3 C 1 1
4 A 7 3
5 B 7 3
6 C 7 3
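A base R version of the same filter can be sketched with ave(), which returns the size of each row's group. The data frame here is hypothetical and only mimics the shape of the example (ID 5 is missing study "C"); the names group_size and complete are for illustration.

```r
# Hypothetical data shaped like the example: ID 5 has only two studies.
df <- data.frame(
  study = c("A", "B", "C", "A", "B", "A", "B", "C"),
  ID    = c(1, 1, 1, 5, 5, 7, 7, 7)
)

# For every row, the number of rows sharing its ID.
group_size <- ave(seq_along(df$ID), df$ID, FUN = length)

# Keep only rows that belong to groups of exactly 3.
complete <- df[group_size == 3, ]

print(complete)
```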

Assign unique id to consecutive rows within a grouping variable in dplyr

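We can use gl() to generate a factor whose levels each repeat twice, so that every pair of consecutive rows within a group shares an id: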

library(dplyr)
df <- df %>%
  group_by(group) %>%
  mutate(id = as.integer(gl(n(), 2, n()))) %>%
  ungroup()
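To see what the gl() call produces on its own, here is a small base R sketch for a hypothetical group of 5 rows, alongside an arithmetic alternative that should give the same pairing.

```r
n <- 5  # hypothetical group size

# gl(n, 2, n): levels 1..n, each repeated twice, truncated to length n.
ids_gl <- as.integer(gl(n, 2, n))

# Equivalent integer arithmetic: rows 1-2 -> 1, rows 3-4 -> 2, row 5 -> 3.
ids_arith <- (seq_len(n) + 1) %/% 2

print(ids_gl)
print(ids_arith)
```

With an odd group size, the last row ends up alone in its own id.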

Unique ID within group

We can use the match() function (not to be confused with the 'match' column, which is used here for grouping):

library(data.table)
data[, ID := match(player, unique(player)), by = match]

Or using factor

data[, ID := as.integer(factor(player, levels = unique(player))), by = match]
data
# match player team ID
#1: 1 Dave Australia 1
#2: 1 Dave Australia 1
#3: 1 Dennis Australia 2
#4: 1 Dave Australia 1
#5: 2 Jake England 1
#6: 2 Jake England 1
#7: 2 Josh England 2
#8: 2 Jake England 1

A similar option in dplyr would be:

library(dplyr)
data %>%
  group_by(match) %>%
  mutate(ID = match(player, unique(player)))

Group dataframe rows by creating a unique ID column based on the amount of time passed between entries and variable values

Here's a dplyr approach that calculates the gap and rolling avg gap within each Name/Item group, then flags large gaps, and assigns a new group for each large gap or change in Name or Item.

df1 %>%
  group_by(Name, Item) %>%
  mutate(purch_num = row_number(),
         time_since_first = Date - first(Date),
         # the first row of each group has no previous date, so its gap is infinite
         gap = Date - lag(Date, default = as.Date(-Inf, origin = "1970-01-01")),
         avg_gap = time_since_first / (purch_num - 1),
         new_grp_flag = gap > 180 | gap > 3 * avg_gap) %>%
  ungroup() %>%
  mutate(group = cumsum(new_grp_flag))
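The core of this approach is "flag the rows that start a new group, then take the cumulative sum of the flags". Here is a simplified base R sketch of that idea for a single Name/Item combination, using made-up dates and only the fixed 180-day rule (the rolling-average rule is left out for brevity).

```r
# Hypothetical purchase dates for one Name/Item combination.
dates <- as.Date(c("2020-01-01", "2020-02-01", "2020-03-01",
                   "2021-01-01", "2021-02-01"))

# Days since the previous purchase; the first row is treated as infinitely far
# from any earlier purchase so that it always starts a group.
gap <- c(Inf, as.numeric(diff(dates)))

# A gap of more than 180 days starts a new group.
new_grp_flag <- gap > 180

# Each TRUE bumps the running group counter by one.
group <- cumsum(new_grp_flag)

print(data.frame(dates, gap, new_grp_flag, group))
```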

