How to Give Numbers to Each Group of a Data Frame with dplyr::group_by

How to give numbers to each group of a dataframe with dplyr::group_by?

Use mutate to add a column that holds the integer codes of from after converting it to a factor:

df %>% mutate(group_no = as.integer(factor(from)))

# from dest group_no
# 1 a b 1
# 2 a c 1
# 3 b d 2

Note group_by isn't necessary here, unless you're using it for other purposes. If you want to group by the new column for use later, you can use group_by instead of mutate to add the column.
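
As a minimal sketch of both variants described above, assuming a small hypothetical data frame whose from and dest columns match the output shown (the names numbered and grouped are illustrative only):

```r
library(dplyr)

# Hypothetical data matching the printed output above
df <- data.frame(from = c("a", "a", "b"), dest = c("b", "c", "d"))

# Variant 1: add the group number as an ordinary column
numbered <- df %>% mutate(group_no = as.integer(factor(from)))

# Variant 2: create the column and group by it in a single step
grouped <- df %>% group_by(group_no = as.integer(factor(from)))

numbered$group_no    # 1 1 2
group_vars(grouped)  # "group_no"
```

Because factor() sorts its levels, the numbers follow the sorted order of from, not the order in which the values first appear.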

How to number/label data-table by group-number from group_by?

Updated answer

get_group_number = function(){
  i = 0
  function(){
    i <<- i+1
    i
  }
}
group_number = get_group_number()
df %>% group_by(u,v) %>% mutate(label = group_number())

A more compact, though less readable, version of the same counter:

group_number = (function(){i = 0; function() i <<- i+1 })()
df %>% group_by(u,v) %>% mutate(label = group_number())

Using the iterators package:

library(iterators)

counter = icount()
df %>% group_by(u,v) %>% mutate(label = nextElem(counter))
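
The counter-based versions above depend on mutate evaluating the expression exactly once per group. A possible alternative, assuming dplyr 1.0.0 or later is available, is cur_group_id(), which returns the index of the current group directly. The data frame below is hypothetical:

```r
library(dplyr)

# Hypothetical data with two grouping columns
df <- data.frame(u = c("x", "x", "y", "y"), v = c(1, 2, 2, 2))

labelled <- df %>%
  group_by(u, v) %>%
  mutate(label = cur_group_id()) %>%  # one id per (u, v) combination
  ungroup()

labelled$label  # 1 2 3 3
```

Unlike a stateful counter, this gives the same labels no matter how many times the pipeline is rerun.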

Using aggregate/group_by in R to group data and give a count for each factor variable?

With dplyr::count and tidyr::pivot_wider you could do:

library(dplyr)
library(tidyr)

telangiectasia_tumour_data %>%
count(Telangiectasia_time, grade) %>%
pivot_wider(names_from = grade, values_from = n, names_prefix = "grade", values_fill = 0)
#> # A tibble: 4 × 3
#> Telangiectasia_time grade0 grade1
#> <chr> <int> <int>
#> 1 telangiectasia_tumour_0 1 1
#> 2 telangiectasia_tumour_1 1 1
#> 3 telangiectasia_tumour_12 1 0
#> 4 telangiectasia_tumour_24 1 0

DATA

telangiectasia_tumour_data <- structure(list(Telangiectasia_time = c(
"telangiectasia_tumour_0",
"telangiectasia_tumour_1", "telangiectasia_tumour_12", "telangiectasia_tumour_24",
"telangiectasia_tumour_0", "telangiectasia_tumour_1"
), grade = c(
0L,
0L, 0L, 0L, 1L, 1L
)), class = "data.frame", row.names = c(
"1",
"2", "3", "4", "5", "6"
))

Adding Id number using group_by() in R

You can do this without group_by.

df %>%
mutate(ID = match(grouping, unique(grouping)))
#> evaluation grouping ID
#> 1 1.0 a 1
#> 2 0.5 a 1
#> 3 2.0 b 2
#> 4 1.0 b 2
#> 5 2.0 b 2
#> 6 0.5 c 3
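
Note that match(x, unique(x)) numbers groups by order of first appearance, whereas as.integer(factor(x)) numbers them by sorted order. The two agree in the example above only because grouping is already sorted. A small base R sketch with a hypothetical, deliberately unsorted vector shows the difference:

```r
# Hypothetical grouping vector, deliberately unsorted
grouping <- c("b", "b", "a", "c", "a")

# Numbered in order of first appearance: b = 1, a = 2, c = 3
id_by_appearance <- match(grouping, unique(grouping))

# Numbered by sorted factor levels: a = 1, b = 2, c = 3
id_by_sort_order <- as.integer(factor(grouping))

id_by_appearance  # 1 1 2 3 2
id_by_sort_order  # 2 2 1 3 1
```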

R: Create numbering within each group

A) The dplyr package offers group_indices() for adding unique group identifiers:

library(dplyr)

df$number <- df %>%
group_indices(ID)
df

# A tibble: 10 × 3
study ID number
<chr> <dbl> <int>
1 A 1 1
2 B 1 1
3 C 1 1
4 A 5 2
5 B 5 2
...

B) You can keep only the groups that contain exactly 3 observations (i.e., studies "A", "B" and "C"), dropping the smaller ones, with filter():

df %>% 
group_by(ID) %>%
filter(n() == 3)

# A tibble: 6 × 3
# Groups: ID [2]
study ID number
<chr> <dbl> <int>
1 A 1 1
2 B 1 1
3 C 1 1
4 A 7 3
5 B 7 3
6 C 7 3
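
If groups could also contain more than three rows, a threshold such as n() >= 3 may be closer to "drop groups smaller than 3". The following is a sketch under that assumption, using hypothetical data in which ID 5 has only two studies:

```r
library(dplyr)

# Hypothetical data: IDs 1 and 7 have three studies each, ID 5 only two
df <- tibble(
  study = c("A", "B", "C", "A", "B", "A", "B", "C"),
  ID    = c(1, 1, 1, 5, 5, 7, 7, 7)
)

complete <- df %>%
  group_by(ID) %>%
  filter(n() >= 3) %>%  # n() is the number of rows in the current group
  ungroup()

unique(complete$ID)  # 1 7
```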

Numbering rows within groups in a data frame

Use ave, ddply, dplyr or data.table:

df$num <- ave(df$val, df$cat, FUN = seq_along)

or:

library(plyr)
ddply(df, .(cat), mutate, id = seq_along(val))

or:

library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())

or (the most memory efficient, as it assigns by reference within DT):

library(data.table)
DT <- data.table(df)

# either:
DT[, id := seq_len(.N), by = cat]
# or, equivalently:
DT[, id := rowid(cat)]
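
One caveat with the ave approach: ave returns a vector of the same type as its first argument, so if val is a character column the row numbers come back as character strings. A sketch of a workaround, assuming a hypothetical data frame with a character val, is to pass an integer vector as the first argument instead:

```r
# Hypothetical data with a non-numeric val column
df <- data.frame(cat = c("a", "b", "a", "b", "a"),
                 val = c("p", "q", "r", "s", "t"))

# First argument is character, so the result is character
num_chr <- ave(df$val, df$cat, FUN = seq_along)

# First argument is integer, so the result stays integer
num_int <- ave(seq_along(df$cat), df$cat, FUN = seq_along)

num_chr  # "1" "1" "2" "2" "3"
num_int  # 1 1 2 2 3
```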

How to add a row to each group and assign values

According to the documentation of group_modify, if you use a formula, you must use ". or .x to refer to the subset of rows of .tbl for the given group;" that is why .x works as the data argument of add_row. To be consistent, you also have to use it inside first(), so that B is taken from the current group:

df %>% 
group_by(id) %>%
group_modify(~ add_row(A=4, B=first(.x$B), .x))

# A tibble: 6 x 3
# Groups: id [3]
id A B
<chr> <dbl> <dbl>
1 one 1 4
2 one 4 4
3 three 3 6
4 three 4 6
5 two 2 5
6 two 4 5

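Using first(.$B) gives the same result, because . is an alias for .x inside the formula. first(df$B), by contrast, takes the first value of B from the whole data frame rather than from the current group.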
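
For a self-contained version of the example, here is a sketch that assumes a hypothetical three-row input consistent with the printed output (one row per id). The .x argument is passed first to add_row here, which is equivalent to the call above:

```r
library(dplyr)
library(tibble)

# Hypothetical input consistent with the output shown above
df <- tibble(id = c("one", "two", "three"), A = c(1, 2, 3), B = c(4, 5, 6))

out <- df %>%
  group_by(id) %>%
  # .x is the subset of rows for the current group (without the id column)
  group_modify(~ add_row(.x, A = 4, B = first(.x$B)))

out$id  # "one" "one" "three" "three" "two" "two"
out$B   # 4 4 6 6 5 5
```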

Expand each group to the max n of rows

You can take advantage of the fact that df[n_bigger_than_nrow,] gives a row of NAs
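
A quick base R sketch of that behaviour, using a hypothetical two-row data frame indexed with three row numbers:

```r
# Hypothetical two-row data frame
df <- data.frame(ID = c(1L, 1L), col1 = c("A", "B"))

# Asking for rows 1 to 3 pads the missing third row with NAs
padded <- df[seq(3), ]

nrow(padded)          # 3
is.na(padded$col1[3]) # TRUE
```

Both solutions below apply this per group, padding each group up to the size of the largest one.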

dplyr

max_n <- max(count(df, ID)$n)

df %>%
group_by(ID) %>%
summarise(cur_data()[seq(max_n),])
#> `summarise()` has grouped output by 'ID'. You can override using the `.groups`
#> argument.
#> # A tibble: 9 × 2
#> # Groups: ID [3]
#> ID col1
#> <int> <chr>
#> 1 1 A
#> 2 1 B
#> 3 1 <NA>
#> 4 2 O
#> 5 2 <NA>
#> 6 2 <NA>
#> 7 3 U
#> 8 3 L
#> 9 3 R

base R

n <- tapply(df$ID, df$ID, length)
max_n <- max(n)
i <- lapply(n, \(x) c(seq(x), rep(Inf, max_n - x)))
i <- Map(`+`, i, c(0, cumsum(head(n, -1))))
df <- df[unlist(i),]
rownames(df) <- NULL
df$ID <- rep(as.numeric(names(i)), each = max_n)

df
#> ID col1
#> 1 1 A
#> 2 1 B
#> 3 1 <NA>
#> 4 2 O
#> 5 2 <NA>
#> 6 2 <NA>
#> 7 3 U
#> 8 3 L
#> 9 3 R

R Dplyr group_by

Try

weight_by_economy <- data %>%
group_by(country) %>%
summarize(weight = country_population/n())

If this doesn't work, please clarify the question by providing a representative data object.

How to label a double grouped data frame with a group number with group_indices in dplyr?

We can use dense_rank.

library(dplyr)

db2 <- db %>%
group_by(ID) %>%
mutate(rank = dense_rank(date)) %>%
ungroup()
db2
# # A tibble: 10 x 3
# ID date rank
# <dbl> <date> <int>
# 1 1. 2001-01-01 1
# 2 1. 2001-01-01 1
# 3 1. 2001-01-01 1
# 4 1. 2001-01-03 2
# 5 1. 2001-01-03 2
# 6 2. 2011-01-01 3
# 7 2. 2011-01-01 3
# 8 2. 2010-03-12 2
# 9 2. 2010-03-12 2
# 10 2. 2001-01-01 1
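
The choice of dense_rank over min_rank matters when there are ties: dense_rank leaves no gaps after tied values, so the result can be read as a consecutive group number. A small sketch on a hypothetical vector:

```r
library(dplyr)

# Hypothetical values with one tie
x <- c(10, 10, 30, 20)

dense <- dense_rank(x)  # ties share a rank, next rank follows directly
gappy <- min_rank(x)    # ties share a rank, next rank skips ahead

dense  # 1 1 3 2
gappy  # 1 1 4 3
```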

