How to Give Numbers to Each Group of a Dataframe with dplyr::group_by

How to give numbers to each group of a dataframe with dplyr::group_by?

Use mutate to add a column that holds the integer codes of from after converting it to a factor:

df %>% mutate(group_no = as.integer(factor(from)))

#   from dest group_no
# 1    a    b        1
# 2    a    c        1
# 3    b    d        2

Note group_by isn't necessary here, unless you're using it for other purposes. If you want to group by the new column for use later, you can use group_by instead of mutate to add the column.
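As a minimal sketch of that last point, the group number can be created inside group_by itself. The sample data frame is reconstructed from the output printed above; the names grouped and group_sizes are only illustrative.

```r
library(dplyr)

# Sample data reconstructed from the printed output above
df <- data.frame(from = c("a", "a", "b"), dest = c("b", "c", "d"))

# Create the group number and group by it in one step
grouped <- df %>% group_by(group_no = as.integer(factor(from)))

# One row per group, with the group size
group_sizes <- grouped %>% summarise(n = n())
```

The result stays grouped by group_no, so later verbs such as summarise operate per group without a second group_by call.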

How to number/label data-table by group-number from group_by?

One option is a closure that increments a counter each time it is called; mutate evaluates it once per group:

get_group_number = function(){
    i = 0
    function(){
        i <<- i+1
        i
    }
}
group_number = get_group_number()
df %>% group_by(u,v) %>% mutate(label = group_number())

You can also consider the following, slightly less readable, version:

group_number = (function(){i = 0; function() i <<- i+1 })()
df %>% group_by(u,v) %>% mutate(label = group_number())

Or, using the iterators package:

library(iterators)

counter = icount()
df %>% group_by(u,v) %>% mutate(label = nextElem(counter))
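The counter-based versions above depend on the expression being evaluated once per group, and the counter keeps counting if the pipeline is run a second time. Assuming dplyr 1.0.0 or later is available, cur_group_id() is a stateless alternative. The data below is hypothetical; only the column names u and v come from the snippets above.

```r
library(dplyr)

# Hypothetical data; only the column names u and v come from the text
df <- data.frame(u = c("x", "x", "y", "y", "x"),
                 v = c(1, 1, 1, 2, 2))

# cur_group_id() returns the index of the current group
labelled <- df %>%
  group_by(u, v) %>%
  mutate(label = cur_group_id()) %>%
  ungroup()
```

Group ids follow the sorted order of the grouping keys, so (x, 1) is 1, (x, 2) is 2, (y, 1) is 3 and (y, 2) is 4, regardless of row order.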

Using aggregate/group_by in R to group data and give a count for each factor variable?

With dplyr::count and tidyr::pivot_wider you could do:

library(dplyr)
library(tidyr)

telangiectasia_tumour_data %>% 
  count(Telangiectasia_time, grade) %>% 
  pivot_wider(names_from = grade, values_from = n, names_prefix = "grade", values_fill = 0)
#> # A tibble: 4 × 3
#>   Telangiectasia_time      grade0 grade1
#>   <chr>                     <int>  <int>
#> 1 telangiectasia_tumour_0       1      1
#> 2 telangiectasia_tumour_1       1      1
#> 3 telangiectasia_tumour_12      1      0
#> 4 telangiectasia_tumour_24      1      0

DATA

telangiectasia_tumour_data <- structure(list(Telangiectasia_time = c(
  "telangiectasia_tumour_0",
  "telangiectasia_tumour_1", "telangiectasia_tumour_12", "telangiectasia_tumour_24",
  "telangiectasia_tumour_0", "telangiectasia_tumour_1"
), grade = c(
  0L,
  0L, 0L, 0L, 1L, 1L
)), class = "data.frame", row.names = c(
  "1",
  "2", "3", "4", "5", "6"
))
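If only the counts are needed and a contingency table is acceptable instead of a tibble, base R's table() gives the same numbers. This is a sketch, not part of the original answer; d is an illustrative name and holds the same six rows as the DATA block.

```r
# Same six rows as the DATA block above, written compactly
d <- data.frame(
  Telangiectasia_time = paste0("telangiectasia_tumour_", c(0, 1, 12, 24, 0, 1)),
  grade = c(0L, 0L, 0L, 0L, 1L, 1L)
)

# Rows are time points, columns are grades, cells are counts
counts <- table(d$Telangiectasia_time, d$grade)
counts
```

Combinations that never occur, such as grade 1 at time point 12, appear as 0 without needing an explicit fill value.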

Adding Id number using group_by() in R

You can do this without group_by.

df %>%
  mutate(ID = match(grouping, unique(grouping)))
#>   evaluation grouping ID
#> 1        1.0        a  1
#> 2        0.5        a  1
#> 3        2.0        b  2
#> 4        1.0        b  2
#> 5        2.0        b  2
#> 6        0.5        c  3
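Note that match with unique numbers the groups in order of first appearance, whereas as.integer(factor(...)) numbers them in sorted order. The two agree in the output above only because the groups already appear sorted. A small base R sketch with a hypothetical, unsorted vector shows the difference:

```r
grouping <- c("b", "b", "a", "c", "a")

# Order of first appearance: "b" is 1, "a" is 2, "c" is 3
id_by_appearance <- match(grouping, unique(grouping))

# Sorted order of the values: "a" is 1, "b" is 2, "c" is 3
id_by_sort <- as.integer(factor(grouping))
```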

R: Create numbering within each group

A) The dplyr package offers group_indices() for adding unique group identifiers:

library(dplyr)

df$number <- df %>% 
  group_indices(ID)
df

# A tibble: 10 × 3
   study    ID number
   <chr> <dbl>  <int>
 1 A         1      1
 2 B         1      1
 3 C         1      1
 4 A         5      2
 5 B         5      2
...

B) You can keep only the groups that have exactly three rows (i.e., one each for "A", "B" and "C") and drop all others with filter():

df %>% 
  group_by(ID) %>% 
  filter(n() == 3)

# A tibble: 6 × 3
# Groups:   ID [2]
  study    ID number
  <chr> <dbl>  <int>
1 A         1      1
2 B         1      1
3 C         1      1
4 A         7      3
5 B         7      3
6 C         7      3

Numbering rows within groups in a data frame

Use ave, ddply, dplyr or data.table:

df$num <- ave(df$val, df$cat, FUN = seq_along)

or:

library(plyr)
ddply(df, .(cat), mutate, id = seq_along(val))

or:

library(dplyr)
df %>% group_by(cat) %>% mutate(id = row_number())

or (the most memory efficient, as it assigns by reference within DT):

library(data.table)
DT <- data.table(df)

DT[, id := seq_len(.N), by = cat]
# or, equivalently:
DT[, id := rowid(cat)]
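One caveat with the ave version: ave() writes the result back into a copy of its first argument, so if that column is not numeric the row numbers are coerced to its type. A sketch with hypothetical character data, passing an integer vector instead:

```r
# Hypothetical data with a character value column
df <- data.frame(cat = c("x", "y", "x", "x", "y"),
                 val = c("p", "q", "r", "s", "t"),
                 stringsAsFactors = FALSE)

# Passing the values themselves makes ave() coerce the result to character
num_chr <- ave(df$val, df$cat, FUN = seq_along)

# Passing an integer vector keeps the row numbers as integers
df$num <- ave(seq_along(df$cat), df$cat, FUN = seq_along)
```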

How to add a row to each group and assign values

According to the documentation of the function group_modify, if you use a formula, you must use ". or .x to refer to the subset of rows of .tbl for the given group". That is why you used .x inside add_row. To be consistent, you must also use it inside first.

df %>% 
  group_by(id) %>% 
  group_modify(~ add_row(A=4, B=first(.x$B), .x))

# A tibble: 6 x 3
# Groups:   id [3]
  id        A     B
  <chr> <dbl> <dbl>
1 one       1     4
2 one       4     4
3 three     3     6
4 three     4     6
5 two       2     5
6 two       4     5

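Using first(.$B) gives the same result. Using first(df$B) does not in general, because it always takes the first value of B in the whole data frame rather than in the current group.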
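A quick way to check which reference is evaluated per group is to run both variants side by side. The data frame below is reconstructed from the output above; per_group and whole_frame are illustrative names.

```r
library(dplyr)
library(tibble)  # add_row()

# Sample data reconstructed from the printed output above
df <- tibble(id = c("one", "two", "three"), A = c(1, 2, 3), B = c(4, 5, 6))

# .x is the subset of rows for the current group
per_group <- df %>%
  group_by(id) %>%
  group_modify(~ add_row(.x, A = 4, B = first(.x$B))) %>%
  ungroup()

# df is the whole data frame, so first(df$B) is 4 for every group
whole_frame <- df %>%
  group_by(id) %>%
  group_modify(~ add_row(.x, A = 4, B = first(df$B))) %>%
  ungroup()
```

With this data, the added rows in per_group carry 4, 6 and 5 in B, while the added rows in whole_frame all carry 4.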

Expand each group to the max n of rows

You can take advantage of the fact that indexing a data frame with a row number larger than its number of rows, as in df[n_bigger_than_nrow,], gives a row of NAs.
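This indexing behaviour can be seen in isolation with a plain data frame. The three rows below are hypothetical and only borrow the column names ID and col1 from the outputs in this section.

```r
df <- data.frame(ID = c(1L, 1L, 2L), col1 = c("A", "B", "O"))

# Row 10 does not exist, so the third row of the result is all NA
padded <- df[c(1, 2, 10), ]
```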

dplyr

max_n <- max(count(df, ID)$n)

df %>% 
  group_by(ID) %>% 
  summarise(cur_data()[seq(max_n),])
#> `summarise()` has grouped output by 'ID'. You can override using the `.groups`
#> argument.
#> # A tibble: 9 × 2
#> # Groups:   ID [3]
#>      ID col1 
#>   <int> <chr>
#> 1     1 A    
#> 2     1 B    
#> 3     1 <NA> 
#> 4     2 O    
#> 5     2 <NA> 
#> 6     2 <NA> 
#> 7     3 U    
#> 8     3 L    
#> 9     3 R

base R

n <- tapply(df$ID, df$ID, length)
max_n <- max(n)
i <- lapply(n, \(x) c(seq(x), rep(Inf, max_n - x)))
i <- Map(`+`, i, c(0, cumsum(head(n, -1))))
df <- df[unlist(i),]
rownames(df) <- NULL
df$ID <- rep(as.numeric(names(i)), each = max_n)

df
#>   ID col1
#> 1  1    A
#> 2  1    B
#> 3  1 <NA>
#> 4  2    O
#> 5  2 <NA>
#> 6  2 <NA>
#> 7  3    U
#> 8  3    L
#> 9  3    R

R Dplyr group_by

Try

weight_by_economy <- data %>%
                 group_by(country) %>%
                 summarize(weight = country_population/n())

If this doesn't work, please clarify the question by providing a representative data object.
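Since the original data is not shown, the following is only a sketch under assumptions: each row is one observation, and country_population repeats the country total on every row of that country. Under those assumptions, mutate (rather than summarize) keeps one weight per row. The data values are hypothetical.

```r
library(dplyr)

# Hypothetical data: country total repeated on each row of that country
data <- data.frame(
  country = c("A", "A", "B"),
  country_population = c(100, 100, 30)
)

# mutate() keeps every row and divides the country total by the group size
weight_by_economy <- data %>%
  group_by(country) %>%
  mutate(weight = country_population / n()) %>%
  ungroup()
```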

How to label a double grouped data frame with a group number with group_indices in dplyr?

We can use dense_rank.

library(dplyr)

db2 <- db %>%
  group_by(ID) %>%
  mutate(rank = dense_rank(date)) %>%
  ungroup()
db2
# # A tibble: 10 x 3
#      ID date        rank
#   <dbl> <date>     <int>
#  1    1. 2001-01-01     1
#  2    1. 2001-01-01     1
#  3    1. 2001-01-01     1
#  4    1. 2001-01-03     2
#  5    1. 2001-01-03     2
#  6    2. 2011-01-01     3
#  7    2. 2011-01-01     3
#  8    2. 2010-03-12     2
#  9    2. 2010-03-12     2
# 10    2. 2001-01-01     1
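As the output shows, dense_rank restarts at 1 within each ID. If a single running number across all (ID, date) combinations is wanted instead, one possibility, assuming dplyr 1.0.0 or later, is to group by both columns and use cur_group_id(). The data below is a hypothetical, shortened version of db.

```r
library(dplyr)

# Hypothetical, shortened version of db
db <- data.frame(
  ID = c(1, 1, 1, 2, 2),
  date = as.Date(c("2001-01-01", "2001-01-03", "2001-01-01",
                   "2011-01-01", "2001-01-01"))
)

# Rank of each date within its ID; restarts at 1 for every ID
db2 <- db %>%
  group_by(ID) %>%
  mutate(rank = dense_rank(date)) %>%
  ungroup()

# One running number per (ID, date) combination
db3 <- db %>%
  group_by(ID, date) %>%
  mutate(group_no = cur_group_id()) %>%
  ungroup()
```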
