R Create Id Within a Group

R create ID within a group

There are several ways to number the rows within each group.

In base R, use ave:

with(df, ave(rep(1, nrow(df)), IDFAM, FUN = seq_along))
#  [1] 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 2 3 4 5 1
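To make the ave() call easier to follow, here is a minimal sketch on a small made-up data frame. The IDFAM and AGED values below are illustrative only and are not the data from the question.

```r
# Hypothetical toy data: three families with 1, 2 and 3 members
df <- data.frame(
  IDFAM = c("f1", "f2", "f2", "f3", "f3", "f3"),
  AGED  = c(45, 47, 24, 43, 16, 20)
)

# ave() splits its first argument by IDFAM, applies seq_along() to each
# piece, and returns the results in the original row order
df$ID <- with(df, ave(rep(1, nrow(df)), IDFAM, FUN = seq_along))
df$ID
# [1] 1 1 2 1 2 3
```

The first argument only needs to be a numeric vector with one element per row; its values are discarded because seq_along() looks at lengths, not contents.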

With the "data.table" package, use sequence(.N):

library(data.table)
DT <- as.data.table(df)
DT[, ID := sequence(.N), by = IDFAM]

With the "dplyr" package, try:

library(dplyr)

df %>% group_by(IDFAM) %>% mutate(count = sequence(n()))

or (as recommended by Hadley in the comments):

df %>% group_by(IDFAM) %>% mutate(count = row_number(IDFAM))

Update

Since this seems to be something that is asked for relatively frequently, this feature has been added as a function (getanID) in my "splitstackshape" package. It is based on the "data.table" approach above.

library(splitstackshape)
getanID(df, id.vars = "IDFAM")
#                 IDFAM AGED .id
#  1:  2010 7599 2996 1   45   1
#  2:  2010 7599 3071 1   47   1
#  3:  2010 7599 3071 1   24   2
#  4:  2010 7599 3660 1   46   1
#  5:  2010 7599 4736 1   46   1
#  6:  2010 7599 6235 1   44   1
#  7:  2010 7599 6299 1   43   1
#  8:  2010 7599 9903 1   43   1
#  9: 2010 7599 11013 1   43   1
# 10: 2010 7599 11778 1   16   1
# 11: 2010 7599 11778 1   43   2
# 12: 2010 7599 12248 1   46   1
# 13: 2010 7599 13127 1   44   1
# 14: 2010 7599 14261 1   47   1
# 15: 2010 7599 16280 1   43   1
# 16: 2010 7599 16280 1   16   2
# 17: 2010 7599 16280 1   20   3
# 18: 2010 7599 16280 1   18   4
# 19: 2010 7599 16280 1   18   5
# 20: 2010 7599 17382 1   43   1

generate id within group

You can use data.table::rleid(), which numbers consecutive runs of identical values:

library(dplyr)

df %>% 
 group_by(VarA) %>% 
 mutate(id = data.table::rleid(VarB))

# A tibble: 6 x 3
# Groups:   VarA [2]
#  VarA  VarB     id
#  <chr> <chr> <int>
#1 A     aaaa      1
#2 A     aaaa      1
#3 B     bbbb      1
#4 B     bbbb      1
#5 B     bbbb      1
#6 B     cccc      2
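For readers without data.table, the same run-based numbering can be sketched in base R. The helper name run_id below is hypothetical, and the data frame is reconstructed from the output shown above, so treat this as an illustration rather than the original poster's data.

```r
# Hypothetical helper: the id increases by one whenever the value differs
# from the previous element, so each run of identical values gets one id
run_id <- function(x) cumsum(c(TRUE, x[-1] != x[-length(x)]))

df <- data.frame(
  VarA = c("A", "A", "B", "B", "B", "B"),
  VarB = c("aaaa", "aaaa", "bbbb", "bbbb", "bbbb", "cccc")
)

# Apply the helper to VarB separately within each VarA group
df$id <- unsplit(lapply(split(df$VarB, df$VarA), run_id), df$VarA)
df$id
# [1] 1 1 1 1 1 2

# Runs, not distinct values: a value that reappears later starts a new id
run_id(c("x", "y", "x"))
# [1] 1 2 3
```

If repeated values should share an id even when they are not adjacent, match(x, unique(x)) is the better fit.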

R - Group by variable and then assign a unique ID

dplyr has a group_indices() function for creating unique group IDs:

library(dplyr)
data <- data.frame(personal_id = c("111-111-111", "999-999-999", "222-222-222", "111-111-111"),
                       gender = c("M", "F", "M", "M"),
                       temperature = c(99.6, 98.2, 97.8, 95.5))

data$group_id <- data %>% group_indices(personal_id) 
data <- data %>% select(-personal_id)

data
  gender temperature group_id
1      M        99.6        1
2      F        98.2        3
3      M        97.8        2
4      M        95.5        1

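Or within the same pipeline (https://github.com/tidyverse/dplyr/issues/2160). In dplyr 1.0.0 and later, passing columns to group_indices() like this is deprecated in favour of group_by() followed by mutate(group_id = cur_group_id()); on older versions: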

data %>% 
    mutate(group_id = group_indices(., personal_id))
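The group numbers in the output above follow the sorted order of personal_id, which is why the second row gets 3. A base R sketch, using only the personal_id values from the example, shows that numbering next to numbering by order of first appearance. The variable names by_sorted and by_first_seen are made up for illustration.

```r
personal_id <- c("111-111-111", "999-999-999", "222-222-222", "111-111-111")

# Numbered by sorted value (matches the group_id column shown above)
by_sorted <- as.integer(factor(personal_id))
by_sorted
# [1] 1 3 2 1

# Numbered by order of first appearance instead
by_first_seen <- match(personal_id, unique(personal_id))
by_first_seen
# [1] 1 2 3 1
```

Either version is a valid anonymised key; they differ only in which group receives which number.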

Create ID within a group by two factors

We can use match() on 'players' after grouping by 'game':

library(dplyr)
gamematrix %>% 
       group_by(game) %>%
       arrange(game) %>% 
       mutate(player_id = match(players, unique(players)))

Output:

# A tibble: 12 x 3
# Groups:   game [2]
#   players game  player_id
#   <chr>   <chr>     <int>
# 1 bc      1             1
# 2 bc      1             1
# 3 bc      1             1
# 4 ab      1             2
# 5 ab      1             2
# 6 ab      1             2
# 7 cd      2             1
# 8 cd      2             1
# 9 cd      2             1
#10 bd      2             2
#11 bd      2             2
#12 bd      2             2

Or convert 'players' to a factor whose levels are its unique values within each 'game', then coerce the factor to integer with as.integer():

gamematrix %>% 
   group_by(game) %>%
   mutate(player_id = as.integer(factor(players, levels = unique(players))))
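The same match-on-unique idea can be sketched without dplyr. The gamematrix below is a made-up stand-in, since the original data is not shown. ave() is applied to row indices so that the function can look up the players belonging to each game.

```r
# Hypothetical stand-in for gamematrix
gamematrix <- data.frame(
  players = c("bc", "bc", "ab", "ab", "cd", "bd", "cd"),
  game    = c("1", "1", "1", "1", "2", "2", "2")
)

# Group the row indices by game; within each game, number players by
# the order in which they first appear
idx <- seq_len(nrow(gamematrix))
gamematrix$player_id <- ave(idx, gamematrix$game, FUN = function(i) {
  p <- gamematrix$players[i]
  match(p, unique(p))
})
gamematrix$player_id
# [1] 1 1 2 2 1 2 1
```

Note that "cd" keeps id 1 in game 2 even though "bd" appears between its rows, because ids follow distinct values rather than runs.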

R: Create numbering within each group

A) The dplyr package offers group_indices() for adding unique group identifiers:

library(dplyr)

df$number <- df %>% 
  group_indices(ID)
df

# A tibble: 10 × 3
   study    ID number
   <chr> <dbl>  <int>
 1 A         1      1
 2 B         1      1
 3 C         1      1
 4 A         5      2
 5 B         5      2
...

B) You can keep only the groups that contain exactly 3 observations (i.e., studies "A", "B" and "C"), dropping all others, with filter():

df %>% 
  group_by(ID) %>% 
  filter(n() == 3)

# A tibble: 6 × 3
# Groups:   ID [2]
  study    ID number
  <chr> <dbl>  <int>
1 A         1      1
2 B         1      1
3 C         1      1
4 A         7      3
5 B         7      3
6 C         7      3

Assign unique id to consecutive rows within a grouping variable in dplyr

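We can use gl(), which here builds a factor whose levels each repeat twice, so that every pair of consecutive rows within a group shares an id: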

library(dplyr)
df <- df %>%
    group_by(group) %>% 
    mutate(id = as.integer(gl(n(), 2, n()))) %>%
    ungroup
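To see what gl(n(), 2, n()) produces inside a single group, here is a small base R sketch for a hypothetical group of five rows. The variable names are illustrative, and the arithmetic alternative is one possible equivalent rather than part of the original answer.

```r
n <- 5  # hypothetical group size

# gl(n, 2, n): levels 1..n, each repeated twice, truncated to length n
pair_id <- as.integer(gl(n, 2, n))
pair_id
# [1] 1 1 2 2 3

# The same ids via integer division of the row position
pair_id_arith <- (seq_len(n) + 1) %/% 2
pair_id_arith
# [1] 1 1 2 2 3
```

With an odd group size, the last row ends up alone in its own id.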

Unique ID within group

We can use match() (the function, not the grouping column that is also named match):

library(data.table)
data[, ID := match(player, unique(player)), by = match]

Or using factor

data[, ID := as.integer(factor(player, levels = unique(player))), by = match]
data
#   match player      team ID
#1:     1   Dave Australia  1
#2:     1   Dave Australia  1
#3:     1 Dennis Australia  2
#4:     1   Dave Australia  1
#5:     2   Jake   England  1
#6:     2   Jake   England  1
#7:     2   Josh   England  2
#8:     2   Jake   England  1

Similar option in dplyr would be

library(dplyr)
data %>%
   group_by(match) %>%
   mutate(ID = match(player, unique(player)))

Group dataframe rows by creating a unique ID column based on the amount of time passed between entries and variable values

Here's a dplyr approach that calculates the gap and rolling avg gap within each Name/Item group, then flags large gaps, and assigns a new group for each large gap or change in Name or Item.

library(dplyr)

df1 %>%
  group_by(Name,Item) %>%
  mutate(purch_num = row_number(),
         time_since_first = Date - first(Date),
         gap = Date - lag(Date, default = as.Date(-Inf, origin = "1970-01-01")),
         avg_gap = time_since_first / (purch_num-1),
         new_grp_flag = gap > 180 | gap > 3*avg_gap) %>%
  ungroup() %>%
  mutate(group = cumsum(new_grp_flag))
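The core of this approach (flag a new group at each large gap or change of Name/Item, then take a cumulative sum of the flags) can be sketched in base R. The data below is made up, the rows are assumed to be sorted by Name, Item and Date, and only the fixed 180-day threshold is used; the rolling-average condition is left out to keep the sketch short.

```r
# Hypothetical purchase data, already sorted by Name, Item, Date
df1 <- data.frame(
  Name = c("Ann", "Ann", "Ann", "Ann", "Bob", "Bob"),
  Item = c("tea", "tea", "tea", "tea", "tea", "tea"),
  Date = as.Date(c("2020-01-01", "2020-02-01", "2021-01-01",
                   "2021-02-01", "2020-01-15", "2020-03-01"))
)

# One key per Name/Item combination
key <- paste(df1$Name, df1$Item, sep = "_")

# Days since the previous purchase within the same key; Inf for the
# first purchase so that it always starts a new group
gap <- ave(as.numeric(df1$Date), key, FUN = function(d) c(Inf, diff(d)))

# Each TRUE flag starts a new group; cumsum turns flags into group ids
df1$group <- cumsum(gap > 180)
df1$group
# [1] 1 1 2 2 3 3
```

Ann's third purchase comes 335 days after her second, so it opens group 2, and Bob's first purchase opens group 3 because the key changes.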
