Add Rows to Grouped Data with Dplyr

Can't add rows to grouped data frames

I actually recently made a little helper function for exactly this. The idea
is to use group_modify() to take the group data, and
bind_rows() the summary statistics calculated with summarise().

This is what it looks like in code:

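# For each group, append the row(s) computed by summarise() to the group's data.
# x holds the group's rows, y its grouping keys (unused here).
add_summary_rows <- function(.data, ...) {
  group_modify(.data, function(x, y) bind_rows(x, summarise(x, ...)))
}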

And here’s how that would work with your data:

library(dplyr, warn.conflicts = FALSE)

df <- data.frame(
  test_id = c(1, 1, 1, 1, 1, 1, 1, 1),
  test_nr = c(1, 1, 1, 1, 2, 2, 2, 2),
  region = c("A", "B", "C", "D", "A", "B", "C", "D"),
  test_value = c(3, 1, 1, 2, 4, 2, 4, 1)
)

df %>% 
  group_by(test_id, test_nr) %>% 
  add_summary_rows(
    region = "MEAN",
    test_value = mean(test_value)
  )
#> # A tibble: 10 x 4
#> # Groups:   test_id, test_nr [2]
#>    test_id test_nr region test_value
#>      <dbl>   <dbl> <chr>       <dbl>
#>  1       1       1 A            3   
#>  2       1       1 B            1   
#>  3       1       1 C            1   
#>  4       1       1 D            2   
#>  5       1       1 MEAN         1.75
#>  6       1       2 A            4   
#>  7       1       2 B            2   
#>  8       1       2 C            4   
#>  9       1       2 D            1   
#> 10       1       2 MEAN         2.75
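If you'd rather not use group_modify(), roughly the same table can be sketched by computing the summary rows separately with summarise(), binding them back, and sorting so that each "MEAN" row follows its group. This is an alternative technique, not the helper above. It assumes dplyr >= 1.0.0 (for the `.groups` argument), and `summary_rows` / `result` are illustrative names. Unlike the helper, the result is ungrouped.

```r
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(
  test_id = c(1, 1, 1, 1, 1, 1, 1, 1),
  test_nr = c(1, 1, 1, 1, 2, 2, 2, 2),
  region = c("A", "B", "C", "D", "A", "B", "C", "D"),
  test_value = c(3, 1, 1, 2, 4, 2, 4, 1)
)

# One summary row per (test_id, test_nr) group, returned ungrouped
summary_rows <- df %>%
  group_by(test_id, test_nr) %>%
  summarise(region = "MEAN", test_value = mean(test_value), .groups = "drop")

# Append the summary rows, then sort so that within each group the
# "MEAN" row (region == "MEAN" is TRUE) comes after the data rows (FALSE)
result <- bind_rows(df, summary_rows) %>%
  arrange(test_id, test_nr, region == "MEAN")

result
```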

Add rows to grouped data with dplyr?

Without dplyr it can be done like this:

as.data.frame(xtabs(Demand ~ Week + Article, data))

giving:

       Week Article Freq
1  2013-W01   10004 1215
2  2013-W02   10004  900
3  2013-W03   10004  774
4  2013-W04   10004 1170
5  2013-W01   10006    0
6  2013-W02   10006    0
7  2013-W03   10006    0
8  2013-W04   10006    5
9  2013-W01   10007    2
10 2013-W02   10007    0
11 2013-W03   10007    0
12 2013-W04   10007    0

and this can be rewritten as a magrittr or dplyr pipeline like this:

data %>% xtabs(formula = Demand ~ Week + Article) %>% as.data.frame()

The as.data.frame() at the end could be omitted if a wide form solution were desired.
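The question's data isn't reproduced here, so the sketch below uses a small made-up `data` frame with the same column names (Week, Article, Demand; the values are hypothetical). It shows the property the answer relies on: xtabs() sums Demand over every Week × Article combination, so combinations that never occur come out with a count of 0.

```r
# Hypothetical demand data: not every Week/Article pair is present
data <- data.frame(
  Week    = c("2013-W01", "2013-W02", "2013-W04"),
  Article = c(10004, 10004, 10006),
  Demand  = c(5, 3, 2)
)

# Wide form: a Week x Article contingency table of summed Demand
wide <- xtabs(Demand ~ Week + Article, data)

# Long form: one row per Week/Article combination, missing pairs filled with 0
long <- as.data.frame(wide)
long
```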

Add row in each group using dplyr and add_row()

If you want to use a grouped operation, you need to use do(), as JasonWang described in his comment, because other functions like mutate() or summarise() expect a result with either the same number of rows as each group (in your case, 50) or a single row (e.g. when summarising).

As you probably know, do() can be slow and should generally be a last resort when you cannot achieve your result another way. Your task is quite simple, because it only involves adding extra rows to your data frame, which can be done by simple indexing: for example, look at the output of iris[NA, ].

What you want is essentially to create a vector

indices <- c(NA, 1:50, NA, 51:100, NA, 101:150)

(since the first group is in rows 1 to 50, the second one in 51 to 100 and the third one in 101 to 150).

The result is then iris[indices, ].
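A quick check of the hard-coded version on iris, which stores its three Species groups of 50 rows each in order: every NA index yields a row of NAs, so the result should have 153 rows with a blank row at positions 1, 52 and 103.

```r
indices <- c(NA, 1:50, NA, 51:100, NA, 101:150)

# Indexing with NA returns a row filled with NA, so each NA in `indices`
# becomes an empty row in front of the corresponding group
result <- iris[indices, ]

nrow(result)                   # 153
which(is.na(result$Species))   # 1 52 103
```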

A more general way of building this vector uses group_indices.

indices <- seq(nrow(iris)) %>% 
    split(group_indices(group_by(iris, Species))) %>%   # row numbers per group
    map(~c(NA, .x)) %>%                                 # prepend an NA to each group
    unlist

(map() comes from purrr, which I assume you have loaded since you tagged this with tidyverse.)
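The same idea can be wrapped in a small reusable function. This is a variant, not the answer's own code: it uses dplyr's group_rows(), which returns the row numbers of each group as a list, instead of split() plus group_indices(). The function name `add_blank_row_per_group` is hypothetical. Because it works from row numbers directly, it also handles data whose groups are not stored contiguously (rows come out ordered by group).

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper: insert one all-NA row in front of every group
add_blank_row_per_group <- function(.data, ...) {
  rows <- group_rows(group_by(.data, ...))   # list of row numbers per group
  indices <- unlist(lapply(rows, function(i) c(NA, i)), use.names = FALSE)
  .data[indices, ]
}

result <- add_blank_row_per_group(iris, Species)
```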

How to add a row to each group and assign values

According to the documentation of group_modify(), if you use a formula you must use ". or .x to refer to the subset of rows of .tbl for the given group"; that's why you used .x inside add_row(). To be entirely consistent, you have to use it inside first() as well, i.e. first(.x$B).

df %>% 
  group_by(id) %>% 
  group_modify(~ add_row(A=4, B=first(.x$B), .x))

# A tibble: 6 x 3
# Groups:   id [3]
  id        A     B
  <chr> <dbl> <dbl>
1 one       1     4
2 one       4     4
3 three     3     6
4 three     4     6
5 two       2     5
6 two       4     5

Using first(.$B) will provide the same result. first(df$B) will not: df$B is the whole ungrouped column, so every group would get the same B value.
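For a self-contained run, here is the same pipeline with a small `df` reconstructed from the output shown above (columns id, A, B; the input values are inferred from that output, so treat them as an assumption). It passes `.x` as the first argument of add_row(), which is the conventional argument order and behaves the same as the named-argument call.

```r
library(dplyr, warn.conflicts = FALSE)

# Input inferred from the printed output (an assumption)
df <- tibble(id = c("one", "two", "three"), A = c(1, 2, 3), B = c(4, 5, 6))

result <- df %>%
  group_by(id) %>%
  # .x is the current group's rows (without the grouping column id);
  # add one row with A = 4 and B copied from the group's first B
  group_modify(~ add_row(.x, A = 4, B = first(.x$B)))

result
```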

R add rows to grouped df using dplyr

This should do the trick:

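library(plyr)  # provides join(); %>% also needs magrittr or dplyr to be loaded

df %>%
  # full-join df with the additional rows for every id whose item_code appears
  # in additional_rows (exact duplicate rows are dropped first)
  join(subset(df, item_code %in% additional_rows$item_code, select = c(id, item_code)) %>%
         join(additional_rows) %>% 
         subset(!duplicated(.)), type = "full") %>%
  arrange(id, item_code, -score)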

Not sure if it's the best way, but it works.

Edit: added the other arrange() terms so the scores come out in the same order.

Edit 2: there should now be no duplicated rows added from the additional rows, as per your comment.
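For readers who'd rather not load plyr alongside dplyr, here is a dplyr-only sketch of the same idea. The column names id, item_code and score come from the answer; the data values, and the assumption that additional_rows holds item_code and score with one row per item_code, are hypothetical.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical data
df <- data.frame(id = c(1, 1, 2), item_code = c("a", "b", "a"), score = c(10, 20, 30))
additional_rows <- data.frame(item_code = "a", score = 5)

# Every id that already has an affected item_code gets the extra score
new_rows <- df %>%
  distinct(id, item_code) %>%
  inner_join(additional_rows, by = "item_code")

# A full join on all columns keeps the existing rows and adds only rows that
# are not already present (assuming df itself has no duplicate rows)
result <- df %>%
  full_join(new_rows, by = c("id", "item_code", "score")) %>%
  arrange(id, item_code, desc(score))

result
```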

Add rows by group and fill them with zero in R with dplyr

We can use complete() from tidyr:

library(dplyr)
library(tidyr)
df %>% 
    complete(gene, time = 1:4, fill = list(frequency = 0)) %>%
    select(names(df))

Output:

# A tibble: 8 x 3
  gene  frequency  time
  <chr>     <dbl> <dbl>
1 A         0.590     1
2 A         0.762     2
3 A         0.336     3
4 A         0.437     4
5 B         0.904     1
6 B         1.97      2
7 B         0         3
8 B         0         4
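The input data isn't shown in the answer, so here is a runnable sketch with a hypothetical `df` (gene, frequency, time) in which gene B has no rows for times 3 and 4. complete() adds those rows, and `fill` sets their frequency to 0 instead of NA.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Hypothetical input: gene B has no rows for time 3 and 4
df <- tibble(
  gene      = c("A", "A", "A", "A", "B", "B"),
  frequency = c(0.5, 0.7, 0.3, 0.4, 0.9, 1.9),
  time      = c(1, 2, 3, 4, 1, 2)
)

result <- df %>%
  complete(gene, time = 1:4, fill = list(frequency = 0)) %>%
  select(names(df))   # restore the original column order

result
```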

R Add rows to each group so each group has same number, and specify other variable

complete() adds every missing combination of week and session; the added rows get NA in work:

tidyr::complete(df, week, session)

# A tibble: 16 x 3
    week session work 
   <dbl>   <dbl> <chr>
 1     1       1 done 
 2     1       2 done 
 3     1       3 NA   
 4     1       4 NA   
 5     2       1 done 
 6     2       2 done 
 7     2       3 NA   
 8     2       4 NA   
 9     3       1 done 
10     3       2 done 
11     3       3 done 
12     3       4 NA   
13     4       1 done 
14     4       2 done 
15     4       3 done 
16     4       4 done
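The question also asks how to set the other variable on the new rows. complete() takes a `fill` argument for that. The sketch below uses a hypothetical `df` reconstructed from the output above and a hypothetical fill value "not done". If some session number never occurs in the data at all, list the full range explicitly, e.g. complete(df, week, session = 1:4).

```r
library(tidyr)

# Input inferred from the output above (an assumption): 11 week/session rows
df <- data.frame(
  week    = c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4),
  session = c(1, 2, 1, 2, 1, 2, 3, 1, 2, 3, 4),
  work    = "done",
  stringsAsFactors = FALSE
)

# Every week gets sessions 1-4; the added rows get work = "not done"
result <- complete(df, week, session, fill = list(work = "not done"))

result
```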

Insert new row on group_by data in R dplyr based on condition

You have almost achieved what you want.

new_rows <- example %>%
            group_by(bucket) %>%
            summarise(rate = 1 - sum(rate))

new_rows

#   bucket  rate
#    <dbl> <dbl>
# 1      0 0.015
# 2      1 0.02

bind_rows(example, new_rows)

#    bucket bucket2  rate
# 1       0       0 0.950
# 2       0       1 0.020
# 3       0       2 0.010
# 4       0       3 0.005
# 5       0       4 0.000
# 6       1       0 0.900
# 7       1       1 0.050
# 8       1       2 0.020
# 9       1       3 0.010
# 10      1       4 0.000
# 11      0      NA 0.015
# 12      1      NA 0.020
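If you want each new row to sit with its own bucket instead of at the bottom, sort after binding. The sketch below rebuilds `example` from the printed output (an assumption, since the original data isn't shown) and relies on arrange() always sorting missing values last, so the NA bucket2 row lands at the end of each bucket.

```r
library(dplyr, warn.conflicts = FALSE)

# Reconstructed from the printed output (an assumption)
example <- data.frame(
  bucket  = rep(c(0, 1), each = 5),
  bucket2 = rep(0:4, times = 2),
  rate    = c(0.950, 0.020, 0.010, 0.005, 0.000,
              0.900, 0.050, 0.020, 0.010, 0.000)
)

new_rows <- example %>%
  group_by(bucket) %>%
  summarise(rate = 1 - sum(rate))

# arrange() puts NA last, so each bucket's new row (bucket2 = NA)
# ends up directly after that bucket's existing rows
result <- bind_rows(example, new_rows) %>%
  arrange(bucket, bucket2)

result
```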

adding rows by group to get same number of observations by group

We can group by anon_ID and use complete() to expand the data, so every ID has the same set of nth_assistance_interaction values:

library(dplyr)
library(tidyr)
df1 %>% 
  group_by(anon_ID) %>% 
  complete(nth_assistance_interaction = c(5, 10, 15, 20)) %>% 
  ungroup
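A runnable sketch with a hypothetical `df1`: two anon_IDs with different subsets of the values 5, 10, 15, 20, plus a made-up `score` column. After complete(), every anon_ID should have exactly four rows, and the added rows have NA in the other columns.

```r
library(dplyr, warn.conflicts = FALSE)
library(tidyr)

# Hypothetical data: ID 1 has all four values, ID 2 only two
df1 <- data.frame(
  anon_ID = c(1, 1, 1, 1, 2, 2),
  nth_assistance_interaction = c(5, 10, 15, 20, 5, 15),
  score = c(3, 4, 2, 5, 1, 2)
)

result <- df1 %>%
  group_by(anon_ID) %>%
  complete(nth_assistance_interaction = c(5, 10, 15, 20)) %>%
  ungroup()

count(result, anon_ID)   # each anon_ID now has 4 rows
```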