Numbering by Groups

Numbering rows within groups in a data frame

Use ave, ddply, dplyr or data.table:

df$num <- ave(df$val, df$cat, FUN = seq_along)

or:

library(plyr)
ddply(df, .(cat), mutate, id = seq_along(val))

or:

library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())

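or (the most memory-efficient option, since := adds the column by reference instead of copying DT):

library(data.table)
DT <- data.table(df)  # copies df; setDT(df) would convert it in place instead

# Either of the following adds the within-group row number:
DT[, id := seq_len(.N), by = cat]
DT[, id := rowid(cat)]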
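
As a quick sanity check of the base R variant, here is a minimal sketch on a made-up data frame. The values in toy are hypothetical; the column names cat and val follow the snippets above. It passes seq_along(toy$cat) rather than the val column as the first argument to ave(), an adjustment that should keep the result an integer vector even when val is not numeric:

```r
# Toy data: three groups of unequal size (hypothetical values).
toy <- data.frame(
  cat = c("a", "a", "b", "a", "b", "c"),
  val = c(10.5, 3.2, 7.1, 8.8, 1.4, 6.0)
)

# ave() splits its first argument by the grouping vector, applies FUN to
# each piece, and puts the results back in the original row order.
toy$num <- ave(seq_along(toy$cat), toy$cat, FUN = seq_along)

toy$num
# 1 2 1 3 2 1
```

The rows do not need to be sorted by group first: each row gets its position among the rows of its own group.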

Numbering by groups

Using dplyr

dat <- data.frame(ID = rep(1:3, c(2, 3, 5)), val = rnorm(10))

library(dplyr)
dat %>% group_by(ID) %>%
  mutate(number.in.group = 1:n())

R: Create numbering within each group

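A) The dplyr package offers group_indices() for adding unique group identifiers. Since dplyr 1.0.0, passing the grouping column directly to group_indices() is deprecated, so group the data first:

library(dplyr)

df$number <- df %>%
  group_by(ID) %>%
  group_indices()
df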

# A tibble: 10 × 3
   study    ID number
   <chr> <dbl>  <int>
 1 A         1      1
 2 B         1      1
 3 C         1      1
 4 A         5      2
 5 B         5      2
...

B) You can keep only the groups with exactly 3 observations (i.e., those containing "A", "B" and "C"), which drops the smaller groups, with filter():

df %>%
  group_by(ID) %>%
  filter(n() == 3)

# A tibble: 6 × 3
# Groups:   ID [2]
  study    ID number
  <chr> <dbl>  <int>
1 A         1      1
2 B         1      1
3 C         1      1
4 A         7      3
5 B         7      3
6 C         7      3

Sequentially Number Group within Group

One way is to join your original data to a summarized version of the same data in which each group of the variable in question is assigned an ID. Here's an example on a standard data set:

library(dplyr)

left_join(mtcars, mtcars %>% group_by(gear) %>% summarize(id = cur_group_id()))

Or, using a version of data like yours:

dft <- data.frame(
  stringsAsFactors = FALSE,
  pmid = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  item = c("Age", "Age", "BMI", "BMI", "Age", "Age", "BMI", "Duration")
)

left_join(
  dft,
  dft %>%
    group_by(item) %>%
    summarize(id = cur_group_id()))

Result

Joining, by = "item"
  pmid     item id
1    1      Age  1
2    1      Age  1
3    1      BMI  2
4    1      BMI  2
5    2      Age  1
6    2      Age  1
7    2      BMI  2
8    2 Duration  3

Please note, for future questions, it will make it easier for others to help you if you can provide some example data as code we can load, as opposed to a printout of what the data looks like. It's often easiest to use the magical dput function, by running dput(dft) or dput(head(dft, 50)) and pasting the output into the body of your question.

Numbering of groups in dplyr?

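Using as.numeric will do the trick, provided S is a factor: as.numeric then returns the integer code of each level. Since R 4.0.0, data.frame() no longer converts strings to factors by default, so request the conversion explicitly.

library(dplyr)

S <- rep(letters[1:12], each = 6)
R <- sort(replicate(9, sample(5000:6000, 4)))
df <- data.frame(R, S, stringsAsFactors = TRUE)

result <- df %>% mutate(label = as.numeric(S)) %>% group_by(S)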

result
Source: local data frame [72 x 3]
Groups: S

      R S label
1  5018 a     1
2  5042 a     1
3  5055 a     1
4  5066 a     1
5  5081 a     1
6  5133 a     1
7  5149 b     2
8  5191 b     2
9  5197 b     2
10 5248 b     2
..  ... .   ...

Numbering rows within groups with data gaps in R

You can add another grouping variable in group_by that increments whenever the difference between the current date and the previous date is greater than 1 day, so each run of consecutive days is numbered separately.

library(dplyr)

data %>%
  mutate(day = as.Date(day)) %>%
  group_by(group1, group2,
           date_gap = cumsum(day - lag(day, default = first(day)) > 1)) %>%
  mutate(id_day = row_number()) %>%
  ungroup() %>%
  select(-date_gap)

#   day        group1 group2 id_day
#   <date>     <chr>  <chr>   <int>
# 1 2020-05-01 A      B           1
# 2 2020-05-02 A      B           2
# 3 2020-05-03 A      B           3
# 4 2020-05-04 A      B           4
# 5 2020-05-07 A      B           1
# 6 2020-05-08 A      B           2
# 7 2020-05-09 A      B           3
# 8 2020-06-05 C      D           1
# 9 2020-06-06 C      D           2
#10 2020-06-07 C      D           3
#11 2020-06-08 C      D           4
#12 2020-06-09 C      D           5
#13 2020-06-10 C      D           6
#14 2020-06-11 C      D           7

data

data <- structure(list(day = c("2020-05-01", "2020-05-02", "2020-05-03", 
"2020-05-04", "2020-05-07", "2020-05-08", "2020-05-09", "2020-06-05",
"2020-06-06", "2020-06-07", "2020-06-08", "2020-06-09", "2020-06-10",
"2020-06-11"), group1 = c("A", "A", "A", "A", "A", "A", "A",
"C", "C", "C", "C", "C", "C", "C"), group2 = c("B", "B", "B",
"B", "B", "B", "B", "D", "D", "D", "D", "D", "D", "D"), id_day = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L)),
class = "data.frame", row.names = c(NA, -14L))
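
The gap-detection idea can also be sketched without dplyr. The following is only an illustration in base R on a made-up five-row data frame (toy, step and run are hypothetical names, not part of the answer above); it flags a new run whenever more than one day has passed since the previous row and then numbers the rows within each group and run:

```r
# Made-up example: one group with a gap after the third date.
toy <- data.frame(
  day   = as.Date(c("2020-05-01", "2020-05-02", "2020-05-03",
                    "2020-05-06", "2020-05-07")),
  group = "A"
)

# Days elapsed since the previous row; the first row gets 0.
step <- c(0, diff(as.numeric(toy$day)))

# A new run starts whenever more than one day has passed.
run <- cumsum(step > 1)

# Number the rows within each (group, run) combination.
toy$id_day <- ave(seq_along(toy$day), toy$group, run, FUN = seq_along)

toy$id_day
# 1 2 3 1 2
```

Like the dplyr version, this assumes the rows are already sorted by date within each group.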

How to give numbers to each group of a dataframe with dplyr::group_by?

Use mutate to add a column holding the integer code of the from column after converting it to a factor:

df %>% mutate(group_no = as.integer(factor(from)))

#   from dest group_no
# 1    a    b        1
# 2    a    c        1
# 3    b    d        2

Note group_by isn't necessary here, unless you're using it for other purposes. If you want to group by the new column for use later, you can use group_by instead of mutate to add the column.
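
One thing to keep in mind with the factor approach is that the numbers follow the sorted factor levels, not the order in which the groups appear. A small base R sketch on a made-up vector (the values and the names by_level and by_appearance are hypothetical) shows the difference, with match() against unique() as one possible way to number groups by first appearance instead:

```r
from <- c("b", "b", "a", "c", "a")

# Factor codes follow the sorted levels: a = 1, b = 2, c = 3.
by_level <- as.integer(factor(from))
by_level
# 2 2 1 3 1

# match() against unique() numbers groups in order of first appearance.
by_appearance <- match(from, unique(from))
by_appearance
# 1 1 2 3 2
```

When the data are already sorted by the grouping column, as in the example above, both approaches give the same numbers.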

Python: How can I add sequence numbers to groups?

Assuming you just want a column that numbers the groups themselves (every row in a group gets the same number), you can use ngroup:

import pandas as pd

df = pd.DataFrame({'group Nr':[50,50,50,53,53,53,53,56,56,59,59,59]})
df["sequence Nr"] = df.groupby("group Nr").ngroup() + 1

ngroup numbers the groups starting from 0, so add 1 if you want the numbering to start at 1.


