How to number/label data-table by group-number from group_by?
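get_group_number = function(){
  i = 0              # counter state lives in this enclosing environment
  function(){
    i <<- i + 1      # <<- updates the enclosing i instead of creating a local copy
    i
  }
}
group_number = get_group_number()
# mutate() evaluates group_number() once per group, so each group gets the next number
df %>% group_by(u, v) %>% mutate(label = group_number())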
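The counter keeps its state in the closure's environment. As a result, reusing the same group_number (for example, rerunning the pipeline) continues from the last number instead of restarting at 1. The base-R sketch below uses the same get_group_number as above and shows this without needing dplyr:

```r
# Same closure-based counter as above, in base R only.
get_group_number = function() {
  i = 0
  function() {
    i <<- i + 1  # modify the enclosing i so the count survives between calls
    i
  }
}

group_number = get_group_number()
first_run = c(group_number(), group_number(), group_number())
print(first_run)
# [1] 1 2 3

# Reusing the same counter continues where it stopped...
print(group_number())
# [1] 4

# ...so create a fresh counter before labelling again.
group_number = get_group_number()
print(group_number())
# [1] 1
```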
You can also consider the following more compact, if less readable, version:
group_number = (function(){i = 0; function() i <<- i+1 })()
df %>% group_by(u,v) %>% mutate(label = group_number())
Using the iterators package:
library(iterators)
counter = icount()
df %>% group_by(u,v) %>% mutate(label = nextElem(counter))
How to give numbers to each group of a dataframe with dplyr::group_by?
Use mutate to add a column that holds the integer codes of from converted to a factor:
df %>% mutate(group_no = as.integer(factor(from)))
# from dest group_no
# 1 a b 1
# 2 a c 1
# 3 b d 2
Note that group_by isn't necessary here unless you're using it for other purposes. If you want to group by the new column later, you can use group_by instead of mutate to add the column.
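The integer codes follow the sorted order of the factor levels, not the order in which values first appear. The small base-R sketch below uses made-up values to show the difference. It also shows match(x, unique(x)) as one way to number groups by first appearance instead:

```r
from <- c("b", "a", "b", "c", "a")

# Factor codes follow the sorted levels: a = 1, b = 2, c = 3
by_level <- as.integer(factor(from))
print(by_level)
# [1] 2 1 2 3 1

# Numbering by first appearance instead: b = 1, a = 2, c = 3
by_appearance <- match(from, unique(from))
print(by_appearance)
# [1] 1 2 1 3 2
```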
How to label a double grouped data frame with a group number with group_indices in dplyr?
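We can use dense_rank, which numbers the distinct dates within each ID in ascending order. Tied dates share a number and there are no gaps.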
library(dplyr)
db2 <- db %>%
group_by(ID) %>%
mutate(rank = dense_rank(date)) %>%
ungroup()
db2
# # A tibble: 10 x 3
# ID date rank
# <dbl> <date> <int>
# 1 1. 2001-01-01 1
# 2 1. 2001-01-01 1
# 3 1. 2001-01-01 1
# 4 1. 2001-01-03 2
# 5 1. 2001-01-03 2
# 6 2. 2011-01-01 3
# 7 2. 2011-01-01 3
# 8 2. 2010-03-12 2
# 9 2. 2010-03-12 2
# 10 2. 2001-01-01 1
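For comparison, here is a base-R sketch of the same within-ID dense ranking, using the ID and date values from the printed output. dense_rank_base is a hypothetical helper, not a dplyr function:

```r
ID   <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
date <- as.Date(c("2001-01-01", "2001-01-01", "2001-01-01", "2001-01-03", "2001-01-03",
                  "2011-01-01", "2011-01-01", "2010-03-12", "2010-03-12", "2001-01-01"))

# Dense rank: position of each value among the sorted distinct values,
# so ties share a rank and there are no gaps.
dense_rank_base <- function(x) match(x, sort(unique(x)))

# ave() applies it separately within each ID; as.numeric() drops the Date class.
date_rank <- ave(as.numeric(date), ID, FUN = dense_rank_base)
print(date_rank)
# [1] 1 1 1 2 2 3 3 2 2 1
```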
Numbering of groups in dplyr?
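Using as.numeric on a factor will do the trick: it returns the factor's integer codes, one number per group in sorted level order. Since R 4.0.0, data.frame() no longer turns strings into factors by default, so convert S with factor() first.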
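library(dplyr)
S <- rep(letters[1:12],each=6)
R = sort(replicate(9, sample(5000:6000,4)))
df <- data.frame(R,S)
# factor() makes the conversion work whether S is stored as character or factor
result <- df %>% mutate(label = as.numeric(factor(S))) %>% group_by(S)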
result
Source: local data frame [72 x 3]
Groups: S
R S label
1 5018 a 1
2 5042 a 1
3 5055 a 1
4 5066 a 1
5 5081 a 1
6 5133 a 1
7 5149 b 2
8 5191 b 2
9 5197 b 2
10 5248 b 2
.. ... . ...
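The following short check shows why the conversion to a factor matters. It uses a smaller made-up S:

```r
S <- rep(letters[1:3], each = 2)

# as.numeric() on a character vector cannot parse letters: all NA (with a warning)
chr_codes <- suppressWarnings(as.numeric(S))
print(chr_codes)
# [1] NA NA NA NA NA NA

# On a factor it returns the level codes, i.e. the group numbers
fct_codes <- as.numeric(factor(S))
print(fct_codes)
# [1] 1 1 2 2 3 3
```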
R: add a dplyr group label as a number
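I think in this case something as simple as the following will work. It relies on name already being a factor (<fct> in the output below), so as.integer returns its level codes:
df %>%
  mutate(group_no = as.integer(name))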
# A tibble: 20 x 4
# Groups: id [2]
id name val group_no
<fct> <fct> <dbl> <int>
1 a N1 0.647 1
2 a N1 0.530 1
3 a N1 0.245 1
4 a N2 0.693 2
5 a N2 0.478 2
6 a N2 0.861 2
7 a N3 0.821 3
8 a N3 0.0995 3
9 a N3 0.662 3
10 b N1 0.553 1
11 b N1 0.0233 1
12 b N1 0.519 1
13 b N2 0.783 2
14 b N2 0.789 2
15 b N2 0.477 2
16 b N2 0.438 2
17 b N2 0.407 2
18 b N3 0.732 3
19 b N3 0.0707 3
20 b N3 0.316 3
Numbering rows within groups in a data frame
Use ave, ddply, dplyr or data.table:
df$num <- ave(df$val, df$cat, FUN = seq_along)
or:
library(plyr)
ddply(df, .(cat), mutate, id = seq_along(val))
or:
library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())
or, with data.table (the most memory efficient, as it assigns by reference within DT):
library(data.table)
DT <- data.table(df)
DT[, id := seq_len(.N), by = cat]
# or, equivalently, using rowid():
DT[, id := rowid(cat)]
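Here is a base-R sketch of the ave() variant with made-up data. ave() returns a vector of the same type as its first argument. Passing row indices rather than a data column therefore keeps the counter integer even when the column is character:

```r
df <- data.frame(cat = c("a", "b", "a", "a", "b"),
                 val = c("x", "y", "z", "w", "v"),
                 stringsAsFactors = FALSE)

# Passing row indices keeps the result integer; ave(df$val, ...) would
# return character values here because val is character.
df$num <- ave(seq_len(nrow(df)), df$cat, FUN = seq_along)
print(df$num)
# [1] 1 1 2 3 2
```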
Enumerate groups within groups in a data.table
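Try rleid(), which numbers runs of consecutive identical values. With by = gr, the numbering restarts at 1 within each gr group: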
library(data.table)
dt[, id := rleid(cl), by=gr]
dt
# gr cl id
# 1: a a 1
# 2: a a 1
# 3: a a 1
# 4: a b 2
# 5: a b 2
# 6: a b 2
# 7: b c 1
# 8: b c 1
# 9: b c 1
#10: b d 2
#11: b d 2
#12: b d 2
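Without data.table, the same run numbering can be sketched in base R. run_id is a hypothetical helper that mirrors rleid() for this case: it increments whenever a value differs from the one before it. ave() then restarts it in each gr group:

```r
gr <- rep(c("a", "b"), each = 6)
cl <- rep(c("a", "b", "c", "d"), each = 3)

# Run id: starts at 1 and increments whenever a value differs from the previous one
run_id <- function(x) cumsum(c(TRUE, x[-1L] != x[-length(x)]))

# Apply per gr group via row indices, so the numbering restarts in each group
id <- ave(seq_along(cl), gr, FUN = function(i) run_id(cl[i]))
print(id)
# [1] 1 1 1 2 2 2 1 1 1 2 2 2

# Unlike factor codes, a value that reappears later starts a new run
print(run_id(c("x", "x", "y", "x")))
# [1] 1 1 2 3
```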
R grouping data by numeric numbers in a column
We can use the tidyverse to do a group-by operation. First, create groups of position ranges with cut. Then summarise the frequency count for each range and 'variants' combination, and paste the counts and variants together in a second summarise.
library(dplyr)
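patient1 %>%
  # bin positions into ranges, then count rows per (range, variants) pair
  group_by(group = cut(position, breaks = c(-Inf, seq(1, 100, by = 10))), variants) %>%
  summarise(n = n()) %>%
  # summarise() drops the last grouping level (variants), so this pastes per range
  summarise(tally = paste(n, variants, collapse = ' ', sep = ""))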
NOTE: Another option is findInterval, which does a similar job to cut but returns a numeric interval index instead of labels.
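The sketch below compares the two on a few made-up positions, using the same breaks as the pipeline above. The defaults differ at the boundaries: cut uses right-closed intervals (a, b], while findInterval uses left-closed [a, b). Setting left.open = TRUE makes findInterval match cut:

```r
position <- c(1, 5, 11, 25, 91)
breaks <- c(-Inf, seq(1, 100, by = 10))  # -Inf, 1, 11, 21, ..., 91

# cut(): right-closed intervals (a, b]; as.integer() turns the labels into indices
cut_index <- as.integer(cut(position, breaks = breaks))
print(cut_index)
# [1]  1  2  2  4 10

# findInterval(): left-closed intervals [a, b), numeric index directly
fi_index <- findInterval(position, breaks)
print(fi_index)
# [1]  2  2  3  4 11

# left.open = TRUE switches findInterval to (a, b], matching cut()
print(findInterval(position, breaks, left.open = TRUE))
# [1]  1  2  2  4 10
```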
R - Group by variable and then assign a unique ID
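dplyr has a group_indices function for creating unique group IDs. The IDs follow the sorted order of the grouping values. Since dplyr 1.0.0, passing columns to group_indices() is deprecated; the suggested replacement is group_by(personal_id) followed by mutate(group_id = cur_group_id()).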
library(dplyr)
data <- data.frame(personal_id = c("111-111-111", "999-999-999", "222-222-222", "111-111-111"),
gender = c("M", "F", "M", "M"),
temperature = c(99.6, 98.2, 97.8, 95.5))
data$group_id <- data %>% group_indices(personal_id)
data <- data %>% select(-personal_id)
data
gender temperature group_id
1 M 99.6 1
2 F 98.2 3
3 M 97.8 2
4 M 95.5 1
Or within the same pipeline (https://github.com/tidyverse/dplyr/issues/2160):
data %>%
mutate(group_id = group_indices(., personal_id))