How to give numbers to each group of a dataframe with dplyr::group_by?
Use mutate to add a column that is simply the integer codes of from converted to a factor:
df %>% mutate(group_no = as.integer(factor(from)))
# from dest group_no
# 1 a b 1
# 2 a c 1
# 3 b d 2
Note that group_by isn't necessary here, unless you're using it for other purposes. If you want to group by the new column for later use, you can create it with group_by instead of mutate.
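As a rough sketch of that last point, group_by() can create the numbered column and group by it in one step. The sample data below is an assumption that mirrors the output shown above, and n_dest is a hypothetical summary column added only for illustration.

```r
library(dplyr)

# Sample data assumed to match the output shown above
df <- data.frame(from = c("a", "a", "b"), dest = c("b", "c", "d"))

# group_by() accepts an expression, so it adds group_no and groups by it
grouped <- df %>% group_by(group_no = as.integer(factor(from)))

# Later grouped operations can then use the new column directly
per_group <- grouped %>% summarise(n_dest = n())
```

Here grouped keeps all three rows with the extra group_no column, and per_group has one row per group number.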
How to number/label data-table by group-number from group_by?
Updated answer
get_group_number = function(){
i = 0
function(){
i <<- i+1
i
}
}
group_number = get_group_number()
df %>% group_by(u,v) %>% mutate(label = group_number())
You can also consider the following more compact, though less readable, version:
group_number = (function(){i = 0; function() i <<- i+1 })()
df %>% group_by(u,v) %>% mutate(label = group_number())
Using the iterators package:
library(iterators)
counter = icount()
df %>% group_by(u,v) %>% mutate(label = nextElem(counter))
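If your dplyr version is 1.0.0 or later, cur_group_id() should give the same kind of label without a hand-rolled counter, so the result does not depend on how often or in which order the counter happens to be called. A minimal sketch, where the columns u and v come from the question and the rows and the value column are made up:

```r
library(dplyr)

# Hypothetical sample data; only the column names u and v come from the question
df <- data.frame(
  u = c("x", "x", "y", "y", "y"),
  v = c(1, 2, 1, 1, 2),
  value = 1:5
)

# cur_group_id() returns the index of the current group inside mutate()
labelled <- df %>%
  group_by(u, v) %>%
  mutate(label = cur_group_id()) %>%
  ungroup()
```

Groups are numbered in the sorted order of the grouping keys, so the two rows with u == "y" and v == 1 share one label.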
Using aggregate/group_by in R to group data and give a count for each factor variable?
With dplyr::count and tidyr::pivot_wider you could do:
library(dplyr)
library(tidyr)
telangiectasia_tumour_data %>%
count(Telangiectasia_time, grade) %>%
pivot_wider(names_from = grade, values_from = n, names_prefix = "grade", values_fill = 0)
#> # A tibble: 4 × 3
#> Telangiectasia_time grade0 grade1
#> <chr> <int> <int>
#> 1 telangiectasia_tumour_0 1 1
#> 2 telangiectasia_tumour_1 1 1
#> 3 telangiectasia_tumour_12 1 0
#> 4 telangiectasia_tumour_24 1 0
DATA
telangiectasia_tumour_data <- structure(list(Telangiectasia_time = c(
"telangiectasia_tumour_0",
"telangiectasia_tumour_1", "telangiectasia_tumour_12", "telangiectasia_tumour_24",
"telangiectasia_tumour_0", "telangiectasia_tumour_1"
), grade = c(
0L,
0L, 0L, 0L, 1L, 1L
)), class = "data.frame", row.names = c(
"1",
"2", "3", "4", "5", "6"
))
Adding Id number using group_by() in R
You can do this without group_by:
df %>%
mutate(ID = match(grouping, unique(grouping)))
#> evaluation grouping ID
#> 1 1.0 a 1
#> 2 0.5 a 1
#> 3 2.0 b 2
#> 4 1.0 b 2
#> 5 2.0 b 2
#> 6 0.5 c 3
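One detail worth knowing: match() with unique() numbers groups in order of first appearance, whereas converting to a factor numbers them in sorted order. The two agree when the data is already sorted, as above, but not otherwise. A small base R sketch with a made-up, unsorted vector:

```r
# Hypothetical unsorted grouping values
grouping <- c("b", "b", "a", "c", "a")

# IDs in order of first appearance: "b" is 1, "a" is 2, "c" is 3
id_by_appearance <- match(grouping, unique(grouping))

# IDs in sorted order of the levels: "a" is 1, "b" is 2, "c" is 3
id_by_sort_order <- as.integer(factor(grouping))
```

Pick whichever ordering suits the downstream use; both give one distinct integer per group.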
R: Create numbering within each group
A) The dplyr package offers group_indices() for adding unique group identifiers:
library(dplyr)
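# Passing columns directly to group_indices() is deprecated since dplyr 1.0.0,
# so group first and then ask for the indices
df$number <- df %>%
  group_by(ID) %>%
  group_indices()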
df
# A tibble: 10 × 3
study ID number
<chr> <dbl> <int>
1 A 1 1
2 B 1 1
3 C 1 1
4 A 5 2
5 B 5 2
...
B) You can keep only the complete groups, i.e. those that contain all three studies ("A", "B" and "C"), and drop every group with fewer than 3 rows, with filter():
df %>%
group_by(ID) %>%
filter(n() == 3)
# A tibble: 6 × 3
# Groups: ID [2]
study ID number
<chr> <dbl> <int>
1 A 1 1
2 B 1 1
3 C 1 1
4 A 7 3
5 B 7 3
6 C 7 3
Numbering rows within groups in a data frame
Use ave, ddply, dplyr or data.table:
df$num <- ave(df$val, df$cat, FUN = seq_along)
or:
library(plyr)
ddply(df, .(cat), mutate, id = seq_along(val))
or:
library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())
or (the most memory efficient, as it assigns by reference within DT):
library(data.table)
DT <- data.table(df)
DT[, id := seq_len(.N), by = cat]
# or, equivalently, with rowid():
DT[, id := rowid(cat)]
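To see what these produce, here is a small base R sketch on made-up data in which the groups are interleaved. It passes seq_along(df$cat) to ave() rather than df$val, which is a variation on the call above chosen so the result does not depend on the type of val.

```r
# Hypothetical sample data with interleaved groups
df <- data.frame(
  cat = c("a", "b", "a", "b", "a"),
  val = c(10, 20, 30, 40, 50)
)

# Number the rows within each cat, keeping the original row order
df$num <- ave(seq_along(df$cat), df$cat, FUN = seq_along)
```

Each group is counted separately, so the third "a" row gets 3 while the second "b" row gets 2.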
How to add a row to each group and assign values
According to the documentation of the function group_modify, if you use a formula, you must use ". or .x to refer to the subset of rows of .tbl for the given group"; that's why you used .x inside the add_row function. To be entirely consistent, you have to do the same within the first function.
df %>%
group_by(id) %>%
  group_modify(~ add_row(.x, A = 4, B = first(.x$B)))
# A tibble: 6 x 3
# Groups: id [3]
id A B
<chr> <dbl> <dbl>
1 one 1 4
2 one 4 4
3 three 3 6
4 three 4 6
5 two 2 5
6 two 4 5
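Using first(.$B) gives the same result, because . also refers to the rows of the current group. first(df$B) does not: it always returns the first B of the whole data frame, regardless of the group.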
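A self-contained sketch of the group_modify() call, for anyone who wants to run it. The input df is an assumption reconstructed from the output shown above (one row per id), not the asker's actual data.

```r
library(dplyr)
library(tibble)

# Assumed input, reconstructed from the printed result above
df <- tibble(
  id = c("one", "two", "three"),
  A = c(1, 2, 3),
  B = c(4, 5, 6)
)

# .x is the subset of rows for the current group, so first(.x$B)
# picks up that group's own B value for the added row
res <- df %>%
  group_by(id) %>%
  group_modify(~ add_row(.x, A = 4, B = first(.x$B))) %>%
  ungroup()
```

The result has two rows per id, with the added row carrying A = 4 and the B value of its group.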
Expand each group to the max n of rows
You can take advantage of the fact that df[n_bigger_than_nrow,] gives a row of NAs.
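As a quick check of that indexing behaviour on a plain data.frame (the two-row sample is made up for illustration):

```r
# Hypothetical two-row data frame
df <- data.frame(ID = c(1L, 1L), col1 = c("A", "B"))

# Asking for three rows pads the result with an all-NA third row
padded <- df[seq_len(3), ]
```

The padded row has NA in every column, including ID, which is why the approaches below restore the ID afterwards or index within groups.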
dplyr
max_n <- max(count(df, ID)$n)
df %>%
group_by(ID) %>%
summarise(cur_data()[seq(max_n),])
#> `summarise()` has grouped output by 'ID'. You can override using the `.groups`
#> argument.
#> # A tibble: 9 × 2
#> # Groups: ID [3]
#> ID col1
#> <int> <chr>
#> 1 1 A
#> 2 1 B
#> 3 1 <NA>
#> 4 2 O
#> 5 2 <NA>
#> 6 2 <NA>
#> 7 3 U
#> 8 3 L
#> 9 3 R
base R
n <- tapply(df$ID, df$ID, length)
max_n <- max(n)
i <- lapply(n, \(x) c(seq(x), rep(Inf, max_n - x)))
i <- Map(`+`, i, c(0, cumsum(head(n, -1))))
df <- df[unlist(i),]
rownames(df) <- NULL
df$ID <- rep(as.numeric(names(i)), each = max_n)
df
#> ID col1
#> 1 1 A
#> 2 1 B
#> 3 1 <NA>
#> 4 2 O
#> 5 2 <NA>
#> 6 2 <NA>
#> 7 3 U
#> 8 3 L
#> 9 3 R
R Dplyr group_by
Try
weight_by_economy <- data %>%
group_by(country) %>%
summarize(weight = country_population/n())
If this doesn't work, please clarify the question by providing a representative data object.
How to label a double grouped data frame with a group number with group_indices in dplyr?
We can use dense_rank.
library(dplyr)
db2 <- db %>%
group_by(ID) %>%
mutate(rank = dense_rank(date)) %>%
ungroup()
db2
# # A tibble: 10 x 3
# ID date rank
# <dbl> <date> <int>
# 1 1. 2001-01-01 1
# 2 1. 2001-01-01 1
# 3 1. 2001-01-01 1
# 4 1. 2001-01-03 2
# 5 1. 2001-01-03 2
# 6 2. 2011-01-01 3
# 7 2. 2011-01-01 3
# 8 2. 2010-03-12 2
# 9 2. 2010-03-12 2
# 10 2. 2001-01-01 1
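For a runnable version, here is a shortened sketch. The db below is a made-up subset with one row per ID and date, assumed to have the same column names as the data in the question.

```r
library(dplyr)

# Hypothetical, shortened input with the same columns as above
db <- data.frame(
  ID = c(1, 1, 2, 2, 2),
  date = as.Date(c("2001-01-01", "2001-01-03",
                   "2011-01-01", "2010-03-12", "2001-01-01"))
)

# dense_rank() numbers the distinct dates within each ID, earliest first,
# without gaps between ranks
db2 <- db %>%
  group_by(ID) %>%
  mutate(rank = dense_rank(date)) %>%
  ungroup()
```

Note that the ranks follow date order within each ID, not the order in which the rows appear.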