R create ID within a group
There are several ways.
In base R, use ave:
with(df, ave(rep(1, nrow(df)), IDFAM, FUN = seq_along))
# [1] 1 1 2 1 1 1 1 1 1 1 2 1 1 1 1 2 3 4 5 1
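ave() returns the ids in the original row order, even when a group's rows are not contiguous. Here is a minimal sketch on a made-up IDFAM column (the values are hypothetical, not the question's data). Because the first argument is rep(1, ...), the result is numeric; wrap it in as.integer() if you need integers.

```r
# Hypothetical toy data: family IDs whose rows are interleaved
df <- data.frame(IDFAM = c("a", "b", "a", "c", "b", "a"))

# ave() splits the first vector by IDFAM, applies seq_along() to each piece,
# and puts the results back in the original row positions
df$id <- with(df, ave(rep(1, nrow(df)), IDFAM, FUN = seq_along))
print(df$id)
# [1] 1 1 2 1 2 3
```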
With the "data.table" package, use sequence(.N):
library(data.table)
DT <- as.data.table(df)
DT[, ID := sequence(.N), by = IDFAM]
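Inside each group, .N is the number of rows, so sequence(.N) is simply 1, 2, ..., .N. A small sketch on hypothetical toy data; it also shows rowid(), which recent versions of data.table provide for the same running counter without an explicit by.

```r
library(data.table)

# Hypothetical toy data with interleaved groups
DT <- data.table(IDFAM = c("a", "b", "a", "c", "b", "a"))

# Inside each IDFAM group .N is the group size, so sequence(.N) is 1..N;
# := adds the column by reference, in the original row order
DT[, ID := sequence(.N), by = IDFAM]

# rowid() builds the same running counter without an explicit `by`
DT[, ID2 := rowid(IDFAM)]
print(DT)
```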
With the "dplyr" package, try:
library(dplyr)
df %>% group_by(IDFAM) %>% mutate(count = sequence(n()))
or (as recommended by Hadley in the comments):
df %>% group_by(IDFAM) %>% mutate(count = row_number(IDFAM))
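row_number() can also be called with no argument, in which case it numbers the rows of each group in their existing order without ranking on any column. A minimal sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data with interleaved groups
df <- data.frame(IDFAM = c("a", "b", "a", "c", "b", "a"))

out <- df %>%
  group_by(IDFAM) %>%
  mutate(count = row_number()) %>%  # 1, 2, ... within each IDFAM, in row order
  ungroup()
print(out)
```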
Update
Since this seems to be something that is asked for relatively frequently, this feature has been added as a function (getanID()) in my "splitstackshape" package. It is based on the "data.table" approach above.
library(splitstackshape)
getanID(df, id.vars = "IDFAM")
# IDFAM AGED .id
# 1: 2010 7599 2996 1 45 1
# 2: 2010 7599 3071 1 47 1
# 3: 2010 7599 3071 1 24 2
# 4: 2010 7599 3660 1 46 1
# 5: 2010 7599 4736 1 46 1
# 6: 2010 7599 6235 1 44 1
# 7: 2010 7599 6299 1 43 1
# 8: 2010 7599 9903 1 43 1
# 9: 2010 7599 11013 1 43 1
# 10: 2010 7599 11778 1 16 1
# 11: 2010 7599 11778 1 43 2
# 12: 2010 7599 12248 1 46 1
# 13: 2010 7599 13127 1 44 1
# 14: 2010 7599 14261 1 47 1
# 15: 2010 7599 16280 1 43 1
# 16: 2010 7599 16280 1 16 2
# 17: 2010 7599 16280 1 20 3
# 18: 2010 7599 16280 1 18 4
# 19: 2010 7599 16280 1 18 5
# 20: 2010 7599 17382 1 43 1
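The idea behind such a helper can be sketched in base R. add_id_within() below is a hypothetical function written for illustration, not splitstackshape's code; it assumes you want a .id column that counts rows within each combination of the columns named in id.vars.

```r
# Hypothetical helper (not part of splitstackshape): adds a ".id" column that
# counts rows within each combination of the columns named in id.vars
add_id_within <- function(data, id.vars) {
  # One factor level per observed combination of the id columns
  grp <- interaction(data[id.vars], drop = TRUE)
  # Running counter within each level, returned in the original row order
  data$.id <- ave(seq_len(nrow(data)), grp, FUN = seq_along)
  data
}

d <- data.frame(fam = c(1, 1, 1, 2), sex = c("m", "m", "f", "m"))
res <- add_id_within(d, c("fam", "sex"))
print(res)
```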
generate id within group
You can use data.table::rleid() within each group, e.g.:
library(dplyr)
df %>%
group_by(VarA) %>%
mutate(id = data.table::rleid(VarB))
# A tibble: 6 x 3
# Groups: VarA [2]
# VarA VarB id
# <chr> <chr> <int>
#1 A aaaa 1
#2 A aaaa 1
#3 B bbbb 1
#4 B bbbb 1
#5 B bbbb 1
#6 B cccc 2
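rleid() numbers runs of consecutive identical values, so a value that reappears after a different one gets a new id; match(x, unique(x)) would reuse the first id instead. The sketch below uses run_id(), a hypothetical base-R helper built on rle(), as a stand-in for data.table::rleid() on a single vector, and applies it per VarA group with ave().

```r
# Hypothetical base-R stand-in for data.table::rleid() on one vector:
# each run of identical consecutive values gets the next integer
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

df <- data.frame(VarA = c("A", "A", "B", "B", "B", "B"),
                 VarB = c("aaaa", "aaaa", "bbbb", "bbbb", "bbbb", "cccc"),
                 stringsAsFactors = FALSE)  # rle() needs an atomic vector, not a factor

# Apply run_id() to VarB separately within each VarA group
df$id <- ave(seq_len(nrow(df)), df$VarA,
             FUN = function(i) run_id(df$VarB[i]))
print(df$id)
# [1] 1 1 1 1 1 2

# Runs, not distinct values: "x" reappearing after "y" starts run 3
print(run_id(c("x", "y", "x")))
# [1] 1 2 3
```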
R - Group by variable and then assign a unique ID
dplyr has a group_indices() function for creating unique group IDs:
library(dplyr)
data <- data.frame(personal_id = c("111-111-111", "999-999-999", "222-222-222", "111-111-111"),
gender = c("M", "F", "M", "M"),
temperature = c(99.6, 98.2, 97.8, 95.5))
data$group_id <- data %>% group_indices(personal_id)
data <- data %>% select(-personal_id)
data
gender temperature group_id
1 M 99.6 1
2 F 98.2 3
3 M 97.8 2
4 M 95.5 1
Or within the same pipeline (https://github.com/tidyverse/dplyr/issues/2160):
data %>%
mutate(group_id = group_indices(., personal_id))
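In dplyr 1.0.0 and later, passing grouping columns to group_indices() through its ... argument is deprecated; the suggested alternative is to group_by() first and call cur_group_id() inside mutate(). A sketch with the same example data, assuming dplyr >= 1.0.0:

```r
library(dplyr)

data <- data.frame(personal_id = c("111-111-111", "999-999-999", "222-222-222", "111-111-111"),
                   gender = c("M", "F", "M", "M"),
                   temperature = c(99.6, 98.2, 97.8, 95.5))

out <- data %>%
  group_by(personal_id) %>%
  mutate(group_id = cur_group_id()) %>%  # index of the current group (groups sorted by key)
  ungroup() %>%
  select(-personal_id)
print(out)
```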
Create ID within a group by two factors
We can use match() on 'players' after grouping by 'game':
library(dplyr)
gamematrix %>%
group_by(game) %>%
arrange(game) %>%
mutate(player_id = match(players, unique(players)))
Output:
# A tibble: 12 x 3
# Groups: game [2]
# players game player_id
# <chr> <chr> <int>
# 1 bc 1 1
# 2 bc 1 1
# 3 bc 1 1
# 4 ab 1 2
# 5 ab 1 2
# 6 ab 1 2
# 7 cd 2 1
# 8 cd 2 1
# 9 cd 2 1
#10 bd 2 2
#11 bd 2 2
#12 bd 2 2
Or convert 'players' to a factor with levels specified as its unique values (after grouping by 'game'), and then coerce the factor to integer with as.integer():
gamematrix %>%
group_by(game) %>%
mutate(player_id = as.integer(factor(players, levels = unique(players))))
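Both variants number players in order of first appearance within each game. For comparison, here is a base-R sketch of the same idea with ave(), on a small hypothetical gamematrix where players within a game appear in mixed order; first_seen_id() is a hypothetical helper name.

```r
# Hypothetical toy data; column names follow the example above
gamematrix <- data.frame(players = c("bc", "ab", "bc", "cd", "bd", "bd"),
                         game    = c("1", "1", "1", "2", "2", "2"))

# Id of each player = position of its first appearance within the game
first_seen_id <- function(x) match(x, unique(x))

gamematrix$player_id <- with(gamematrix,
  ave(seq_along(players), game, FUN = function(i) first_seen_id(players[i])))
print(gamematrix$player_id)
# [1] 1 2 1 1 2 2
```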
R: Create numbering within each group
A) The dplyr package offers group_indices() for adding unique group identifiers:
library(dplyr)
df$number <- df %>%
group_indices(ID)
df
# A tibble: 10 × 3
study ID number
<chr> <dbl> <int>
1 A 1 1
2 B 1 1
3 C 1 1
4 A 5 2
5 B 5 2
...
B) You can keep only the IDs that have exactly three observations (i.e., studies "A", "B" and "C") with filter():
df %>%
group_by(ID) %>%
filter(n() == 3)
# A tibble: 6 × 3
# Groups: ID [2]
study ID number
<chr> <dbl> <int>
1 A 1 1
2 B 1 1
3 C 1 1
4 A 7 3
5 B 7 3
6 C 7 3
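Both steps can also be done in base R. A sketch on hypothetical data shaped like the example (IDs 1, 5 and 7, where ID 5 lacks study "C"): match() against the sorted distinct IDs gives the group number, and ave(..., FUN = length) gives each row's group size.

```r
# Hypothetical toy data shaped like the example above
df <- data.frame(study = c("A", "B", "C", "A", "B", "A", "B", "C"),
                 ID    = c(1, 1, 1, 5, 5, 7, 7, 7))

# A) group number = rank of each ID among the sorted distinct IDs
df$number <- match(df$ID, sort(unique(df$ID)))

# B) keep only IDs with exactly three rows
size <- ave(df$ID, df$ID, FUN = length)
complete <- df[size == 3, ]
print(complete)
```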
Assign unique id to consecutive rows within a grouping variable in dplyr
We can use gl() to give each pair of consecutive rows within a group the same id:
library(dplyr)
df <- df %>%
group_by(group) %>%
mutate(id = as.integer(gl(n(), 2, n()))) %>%  # 1, 1, 2, 2, 3, ... within each group
ungroup()
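gl(n, k, length) generates a factor with n levels, each repeated k times, truncated to length, so gl(n(), 2, n()) yields 1, 1, 2, 2, 3, ... for a group of n() rows. The same ids can be computed arithmetically; chunk_id() below is a hypothetical helper for illustration, applied per group with base R's ave() on toy data.

```r
# gl(5, 2, 5): levels 1..5, each repeated twice, cut to length 5
print(as.integer(gl(5, 2, 5)))
# [1] 1 1 2 2 3

# Hypothetical helper: id for consecutive chunks of k rows in a group of size n
chunk_id <- function(n, k = 2L) (seq_len(n) - 1L) %/% k + 1L

df <- data.frame(group = c("a", "a", "a", "b", "b", "b", "b"))
df$id <- ave(seq_len(nrow(df)), df$group, FUN = function(i) chunk_id(length(i)))
print(df$id)
# [1] 1 1 2 1 1 2 2
```

Changing k generalizes the approach to chunks of any size, which gl() also supports through its second argument.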
Unique ID within group
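We can use match() (the function, not the 'match' column), grouping by the 'match' column:
library(data.table)
setDT(data)  # convert to a data.table by reference, if it is not one already
data[, ID := match(player, unique(player)), by = match]
Or using factor:
data[, ID := as.integer(factor(player, levels = unique(player))), by = match]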
data
# match player team ID
#1: 1 Dave Australia 1
#2: 1 Dave Australia 1
#3: 1 Dennis Australia 2
#4: 1 Dave Australia 1
#5: 2 Jake England 1
#6: 2 Jake England 1
#7: 2 Josh England 2
#8: 2 Jake England 1
A similar option in dplyr would be:
library(dplyr)
data %>%
group_by(match) %>%
mutate(ID = match(player, unique(player)))
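Calling match() works even though the data has a column named match because R, when looking up the function in a call, skips bindings that are not functions. A base-R sketch that reproduces the output above with ave(); the toy data is rebuilt from that printed output.

```r
data <- data.frame(match  = c(1, 1, 1, 1, 2, 2, 2, 2),
                   player = c("Dave", "Dave", "Dennis", "Dave", "Jake", "Jake", "Josh", "Jake"),
                   team   = rep(c("Australia", "England"), each = 4))

data$ID <- with(data, ave(seq_along(player), match,
  # Here `match` the column is the grouping vector, while match() the call
  # still resolves to base::match because non-function bindings are skipped
  FUN = function(i) match(player[i], unique(player[i]))))
print(data$ID)
# [1] 1 1 2 1 1 1 2 1
```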
Group dataframe rows by creating a unique ID column based on the amount of time passed between entries and variable values
Here's a dplyr approach that calculates the gap and the running average gap within each Name/Item group, then flags large gaps, and assigns a new group ID for each large gap or change in Name or Item. It assumes Date is of class Date.
library(dplyr)
df1 %>%
arrange(Name, Item, Date) %>%  # lag() and cumsum() below rely on this row order
group_by(Name,Item) %>%
mutate(purch_num = row_number(),
time_since_first = Date - first(Date),
gap = Date - lag(Date, default = as.Date(-Inf, origin = "1970-01-01")),  # first purchase gets an infinite gap
avg_gap = time_since_first / (purch_num-1),
new_grp_flag = gap > 180 | gap > 3*avg_gap) %>%
ungroup() %>%
mutate(group = cumsum(new_grp_flag))
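To see how the flags behave, here is the same arithmetic for a single Name/Item pair in base R, with hypothetical purchase dates. The first purchase gets an infinite gap and a NaN average gap; gap > 3 * avg_gap is then NA, but TRUE | NA is TRUE, so the first purchase always starts a new group.

```r
# Hypothetical purchase dates for one Name/Item pair, sorted ascending
dates <- as.Date(c("2020-01-01", "2020-01-31", "2020-03-01",
                   "2021-01-01", "2021-01-15"))

purch_num        <- seq_along(dates)
time_since_first <- as.numeric(dates - dates[1])        # days: 0 30 60 366 380
gap              <- c(Inf, diff(as.numeric(dates)))     # days: Inf 30 30 306 14
avg_gap          <- time_since_first / (purch_num - 1)  # NaN 30 30 122 95
new_grp_flag     <- gap > 180 | gap > 3 * avg_gap       # TRUE FALSE FALSE TRUE FALSE
group            <- cumsum(new_grp_flag)
print(group)
# [1] 1 1 1 2 2
```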