Numbering rows within groups in a data frame
Use ave, ddply, dplyr or data.table:
df$num <- ave(df$val, df$cat, FUN = seq_along)
or:
library(plyr)
ddply(df, .(cat), mutate, id = seq_along(val))
or:
library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())
or (the most memory-efficient option, as it assigns by reference within DT):
library(data.table)
DT <- data.table(df)
DT[, id := seq_len(.N), by = cat]
# or, equivalently, with rowid():
DT[, id := rowid(cat)]
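To see what these one-liners produce, here is a minimal base R sketch on a small made-up data frame. The column names cat and val mirror the snippets above; the values are invented purely for illustration.

```r
# Toy data; the values are invented for illustration only
df <- data.frame(
  cat = c("a", "a", "b", "a", "b", "c"),
  val = c(10, 20, 30, 40, 50, 60)
)

# ave() applies FUN to val within each level of cat and writes the
# results back in the original row order
df$num <- ave(df$val, df$cat, FUN = seq_along)

print(df)
```

Because ave() writes its results back into a copy of its first argument, the new column takes that argument's type. If val were a character column, passing seq_along(df$val) as the first argument instead would likely be the safer choice.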
Numbering by groups
Using dplyr
dat <- data.frame(ID = rep(1:3, c(2, 3, 5)), val = rnorm(10))
library(dplyr)
dat %>% group_by(ID) %>%
  mutate(number.in.group = 1:n())
R: Create numbering within each group
A) The dplyr package offers group_indices() for adding unique group identifiers:
library(dplyr)
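# Passing the grouping column directly to group_indices() is deprecated
# as of dplyr 1.0.0, so group first
df$number <- df %>%
  group_by(ID) %>%
  group_indices()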
df
# A tibble: 10 × 3
study ID number
<chr> <dbl> <int>
1 A 1 1
2 B 1 1
3 C 1 1
4 A 5 2
5 B 5 2
...
B) You can drop observations from groups that do not have exactly three members (i.e., that lack one of "A", "B" and "C") with filter():
df %>%
  group_by(ID) %>%
  filter(n() == 3)
# A tibble: 6 × 3
# Groups: ID [2]
study ID number
<chr> <dbl> <int>
1 A 1 1
2 B 1 1
3 C 1 1
4 A 7 3
5 B 7 3
6 C 7 3
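Neither step strictly needs dplyr. As a rough base R sketch, assuming toy data invented to resemble the output above (group_size and complete are illustrative names, not part of the original answer):

```r
# Toy data shaped like the example above; the values are invented
df <- data.frame(
  study = c("A", "B", "C", "A", "B", "A", "B", "C"),
  ID    = c(1, 1, 1, 5, 5, 7, 7, 7)
)

# Group number, assigned in order of first appearance of each ID
df$number <- match(df$ID, unique(df$ID))

# Size of the group that each row belongs to
group_size <- ave(seq_along(df$ID), df$ID, FUN = length)

# Keep only the rows from groups with exactly three observations
complete <- df[group_size == 3, ]
print(complete)
```

Here the group with ID 5 has only two rows, so it is dropped, while the group numbers of the remaining rows (1 and 3) are left unchanged.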
Sequentially Number Group within Group
One way would be to join your original data to a summarized version of the same data, in which each value of the variable in question is assigned a group ID. Here's an example on a standard data set:
library(dplyr)
left_join(mtcars, mtcars %>% group_by(gear) %>% summarize(id = cur_group_id()))
Or, using a version of data like yours:
dft <- data.frame(
  stringsAsFactors = FALSE,
  pmid = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  item = c("Age", "Age", "BMI", "BMI", "Age", "Age", "BMI", "Duration")
)
left_join(
  dft,
  dft %>%
    group_by(item) %>%
    summarize(id = cur_group_id()))
Result
Joining, by = "item"
pmid item id
1 1 Age 1
2 1 Age 1
3 1 BMI 2
4 1 BMI 2
5 2 Age 1
6 2 Age 1
7 2 BMI 2
8 2 Duration 3
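The same lookup-and-join idea can be sketched in base R, which may help show what the join is doing. This is only an illustration under the assumption that IDs should follow the sorted order of item; lookup is a hypothetical helper name.

```r
dft <- data.frame(
  pmid = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  item = c("Age", "Age", "BMI", "BMI", "Age", "Age", "BMI", "Duration")
)

# Lookup table: one row per distinct item, numbered in sorted order
lookup <- data.frame(item = sort(unique(dft$item)))
lookup$id <- seq_len(nrow(lookup))

# match() finds each row's position in the lookup table, so the row
# order of dft is preserved
dft$id <- lookup$id[match(dft$item, lookup$item)]
print(dft)
```

merge(dft, lookup, by = "item") would attach the same IDs, but by default it sorts the result by the key, so the original row order would not be kept.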
Please note, for future questions, it will make it easier for others to help you if you can provide some example data as code we can load, as opposed to a printout of what the data looks like. It's often easiest to use the magical dput function, by running dput(dft) or dput(head(dft, 50)) and pasting the output into the body of your question.
Numbering of groups in dplyr?
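Using as.numeric on the grouping column, converted to a factor, will do the trick. (Since R 4.0.0, data.frame() no longer converts strings to factors by default, so the conversion has to be explicit.)
library(dplyr)
S <- rep(letters[1:12], each = 6)
R <- sort(replicate(9, sample(5000:6000, 4)))
df <- data.frame(R, S)
result <- df %>% mutate(label = as.numeric(factor(S))) %>% group_by(S)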
result
Source: local data frame [72 x 3]
Groups: S
R S label
1 5018 a 1
2 5042 a 1
3 5055 a 1
4 5066 a 1
5 5081 a 1
6 5133 a 1
7 5149 b 2
8 5191 b 2
9 5197 b 2
10 5248 b 2
.. ... . ...
Numbering rows within groups with data gaps in R
You can add another grouping variable in group_by whose value changes whenever the difference between the current date and the previous date is greater than 1 day.
library(dplyr)
data %>%
  mutate(day = as.Date(day)) %>%
  group_by(group1, group2,
           date_gap = cumsum(day - lag(day, default = first(day)) > 1)) %>%
  mutate(id_day = row_number()) %>%
  ungroup() %>%
  select(-date_gap)
# day group1 group2 id_day
# <date> <chr> <chr> <int>
# 1 2020-05-01 A B 1
# 2 2020-05-02 A B 2
# 3 2020-05-03 A B 3
# 4 2020-05-04 A B 4
# 5 2020-05-07 A B 1
# 6 2020-05-08 A B 2
# 7 2020-05-09 A B 3
# 8 2020-06-05 C D 1
# 9 2020-06-06 C D 2
#10 2020-06-07 C D 3
#11 2020-06-08 C D 4
#12 2020-06-09 C D 5
#13 2020-06-10 C D 6
#14 2020-06-11 C D 7
data
data <- structure(list(day = c("2020-05-01", "2020-05-02", "2020-05-03",
"2020-05-04", "2020-05-07", "2020-05-08", "2020-05-09", "2020-06-05",
"2020-06-06", "2020-06-07", "2020-06-08", "2020-06-09", "2020-06-10",
"2020-06-11"), group1 = c("A", "A", "A", "A", "A", "A", "A",
"C", "C", "C", "C", "C", "C", "C"), group2 = c("B", "B", "B",
"B", "B", "B", "B", "D", "D", "D", "D", "D", "D", "D"), id_day = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L)),
class = "data.frame", row.names = c(NA, -14L))
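To make the gap logic easier to follow, here is a minimal base R sketch for a single group, using the first seven dates from the data above. The names step and run are illustrative and do not appear in the original answer.

```r
day <- as.Date(c("2020-05-01", "2020-05-02", "2020-05-03", "2020-05-04",
                 "2020-05-07", "2020-05-08", "2020-05-09"))

# Days elapsed since the previous row; the first row gets 0
step <- c(0, diff(as.numeric(day)))

# Counter that increases by one each time the gap exceeds one day,
# so every run of consecutive dates shares one value
run <- cumsum(step > 1)

# Number the rows within each run of consecutive dates
id_day <- ave(seq_along(day), run, FUN = seq_along)

print(data.frame(day, run, id_day))
```

The numbering restarts at 2020-05-07 because that date is three days after the previous one. With several groups, the run counter would be combined with the other grouping columns, as the dplyr version does.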
How to give numbers to each group of a dataframe with dplyr::group_by?
Use mutate to add a column which is just a numeric form of from as a factor:
df %>% mutate(group_no = as.integer(factor(from)))
# from dest group_no
# 1 a b 1
# 2 a c 1
# 3 b d 2
Note group_by isn't necessary here, unless you're using it for other purposes. If you want to group by the new column for use later, you can use group_by instead of mutate to add the column.
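One detail worth checking with the factor approach is the order in which numbers are handed out: factor levels are sorted, so the numbering follows sorted order rather than order of appearance. A small sketch with made-up values (by_level and by_appearance are illustrative names):

```r
# Made-up values in which the first group seen is not first alphabetically
from <- c("b", "b", "a", "c", "a")

# Factor levels are sorted, so "a" becomes group 1
by_level <- as.integer(factor(from))

# match() against unique() numbers groups in order of first appearance
by_appearance <- match(from, unique(from))

print(data.frame(from, by_level, by_appearance))
```

In the example above the two approaches agree, because the from column already appears in sorted order.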
Python: How can I add sequence numbers to groups?
Assuming you just want to create a sequence number column, you can use ngroup:
import pandas as pd

df = pd.DataFrame({'group Nr':[50,50,50,53,53,53,53,56,56,59,59,59]})
df["sequence Nr"] = df.groupby("group Nr").ngroup() + 1
ngroup numbers each group starting from 0, so you'll want to add 1.