Add Rows to Grouped Data with Dplyr

Can't add rows to grouped data frames

I recently wrote a small helper function for exactly this. The idea
is to use group_modify() to take each group's data and
bind_rows() it with the summary statistics calculated by summarise().

This is what it looks like in code:

add_summary_rows <- function(.data, ...) {
  group_modify(.data, function(x, y) bind_rows(x, summarise(x, ...)))
}

And here’s how that would work with your data:

library(dplyr, warn.conflicts = FALSE)

df <- data.frame(
  test_id = c(1, 1, 1, 1, 1, 1, 1, 1),
  test_nr = c(1, 1, 1, 1, 2, 2, 2, 2),
  region = c("A", "B", "C", "D", "A", "B", "C", "D"),
  test_value = c(3, 1, 1, 2, 4, 2, 4, 1)
)

df %>%
  group_by(test_id, test_nr) %>%
  add_summary_rows(
    region = "MEAN",
    test_value = mean(test_value)
  )
#> # A tibble: 10 x 4
#> # Groups: test_id, test_nr [2]
#> test_id test_nr region test_value
#> <dbl> <dbl> <chr> <dbl>
#> 1 1 1 A 3
#> 2 1 1 B 1
#> 3 1 1 C 1
#> 4 1 1 D 2
#> 5 1 1 MEAN 1.75
#> 6 1 2 A 4
#> 7 1 2 B 2
#> 8 1 2 C 4
#> 9 1 2 D 1
#> 10 1 2 MEAN 2.75

Add rows to grouped data with dplyr?

Without dplyr it can be done like this:

as.data.frame(xtabs(Demand ~ Week + Article, data))

giving:

       Week Article Freq
1 2013-W01 10004 1215
2 2013-W02 10004 900
3 2013-W03 10004 774
4 2013-W04 10004 1170
5 2013-W01 10006 0
6 2013-W02 10006 0
7 2013-W03 10006 0
8 2013-W04 10006 5
9 2013-W01 10007 2
10 2013-W02 10007 0
11 2013-W03 10007 0
12 2013-W04 10007 0

and this can be rewritten as a magrittr or dplyr pipeline like this:

data %>% xtabs(formula = Demand ~ Week + Article) %>% as.data.frame()

The as.data.frame() at the end could be omitted if a wide form solution were desired.
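
To make the difference between the long and wide forms concrete, here is a minimal sketch. The input data frame is hypothetical: it is reconstructed from the non-zero rows of the output above, on the assumption that the original data only recorded Week/Article combinations that actually had demand.

```r
# Hypothetical input: only the non-zero Week/Article combinations are recorded
data <- data.frame(
  Week    = c("2013-W01", "2013-W02", "2013-W03", "2013-W04", "2013-W04", "2013-W01"),
  Article = c(10004, 10004, 10004, 10004, 10006, 10007),
  Demand  = c(1215, 900, 774, 1170, 5, 2)
)

# Wide form: a Week x Article contingency table; absent combinations become 0
wide <- xtabs(Demand ~ Week + Article, data)

# Long form: one row per Week/Article combination, including the zero-filled ones
long <- as.data.frame(wide)
```

Here wide has one row per week and one column per article, while long has one row for each of the 12 combinations.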

Add row in each group using dplyr and add_row()

If you want to use a grouped operation, you need do(), as JasonWang described in his comment, because other functions like mutate() or summarise() expect a result with the same number of rows as the grouped data frame (in your case, 50) or with exactly one row (e.g. when summarising).

As you probably know, do() can be slow in general and should be a last resort when you cannot achieve your result in another way. Your task is quite simple because it only involves adding extra rows to your data frame, which can be done by simple indexing: look at the output of iris[NA, ], for example.

What you want is essentially to create a vector

indices <- c(NA, 1:50, NA, 51:100, NA, 101:150)

(since the first group is in rows 1 to 50, the second one in 51 to 100 and the third one in 101 to 150).

The result is then iris[indices, ].

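A more general way of building this vector uses group_indices(). (In dplyr 1.0.0 and later, group the data with group_by() first, since passing grouping columns directly to group_indices() is deprecated.)

indices <- seq(nrow(iris)) %>%
  split(group_indices(group_by(iris, Species))) %>%
  map(~ c(NA, .x)) %>%
  unlist()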

(map comes from purrr which I assume you have loaded as you have tagged this with tidyverse).
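
Putting the pieces together, the whole approach might look like the sketch below. It assumes dplyr 1.0.0 or later (hence the explicit group_by() inside group_indices()) and relies on the rows of iris already being ordered by Species; the name padded is just an illustrative choice.

```r
library(dplyr)
library(purrr)

# Row positions of iris, split by group, with an NA placed before each group
indices <- seq_len(nrow(iris)) %>%
  split(group_indices(group_by(iris, Species))) %>%
  map(~ c(NA, .x)) %>%
  unlist(use.names = FALSE)

# Indexing with NA yields a row of NAs, so each group gains a leading blank row
padded <- iris[indices, ]
```

The result has 153 rows: the original 150 plus one all-NA row in front of each of the three species.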

How to add a row to each group and assign values

According to the documentation of group_modify(), if you use a formula, you must use ". or .x to refer to the subset of rows of .tbl for the given group;" that is why you used .x inside add_row(). To be fully consistent, you also have to use it inside first().

df %>%
  group_by(id) %>%
  group_modify(~ add_row(.x, A = 4, B = first(.x$B)))

# A tibble: 6 x 3
# Groups: id [3]
id A B
<chr> <dbl> <dbl>
1 one 1 4
2 one 4 4
3 three 3 6
4 three 4 6
5 two 2 5
6 two 4 5

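Using first(.$B) gives the same result, because . also refers to the current group's rows. Using first(df$B) does not: df$B is the column of the whole data frame, so every added row would get the same value of B regardless of its group.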
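
A quick way to check how the choice of .x versus df affects the added rows is to run both variants side by side. This is only a sketch: the input df below is hypothetical, reconstructed so that it reproduces the output shown above, and per_group and whole_df are illustrative names.

```r
library(dplyr)
library(tibble)

# Hypothetical input, reconstructed to match the output shown above
df <- data.frame(
  id = c("one", "two", "three"),
  A  = c(1, 2, 3),
  B  = c(4, 5, 6)
)

# .x is the current group's rows, so first(.x$B) is evaluated per group
per_group <- df %>%
  group_by(id) %>%
  group_modify(~ add_row(.x, A = 4, B = first(.x$B)))

# df$B is the full column, so first(df$B) is 4 for every group
whole_df <- df %>%
  group_by(id) %>%
  group_modify(~ add_row(.x, A = 4, B = first(df$B)))
```

With this input, per_group repeats each group's own B in the added row, while whole_df puts 4 in every added row.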

R add rows to grouped df using dplyr

This should do the trick:

library(plyr)
library(magrittr)  # provides %>%, which plyr does not export

df %>%
  join(
    subset(df, item_code %in% additional_rows$item_code, select = c(id, item_code)) %>%
      join(additional_rows) %>%
      subset(!duplicated(.)),
    type = "full"
  ) %>%
  arrange(id, item_code, -score)

Not sure if it's the best way, but it works.

Edit: to get the score in the same order, I added the other arrange() terms.

Edit 2: there should now be no duplicated rows added from the additional rows, as per your comment.

Add rows by group and fill them with zero in R with dplyr

We can use complete

library(dplyr)
library(tidyr)
df %>%
  complete(gene, time = 1:4, fill = list(frequency = 0)) %>%
  select(names(df))

Output:

# A tibble: 8 x 3
gene frequency time
<chr> <dbl> <dbl>
1 A 0.590 1
2 A 0.762 2
3 A 0.336 3
4 A 0.437 4
5 B 0.904 1
6 B 1.97 2
7 B 0 3
8 B 0 4

R Add rows to each group so each group has same number, and specify other variable

tidyr::complete(df, week, session)

# A tibble: 16 x 3
week session work
<dbl> <dbl> <chr>
1 1 1 done
2 1 2 done
3 1 3 NA
4 1 4 NA
5 2 1 done
6 2 2 done
7 2 3 NA
8 2 4 NA
9 3 1 done
10 3 2 done
11 3 3 done
12 3 4 NA
13 4 1 done
14 4 2 done
15 4 3 done
16 4 4 done

Insert new row on group_by data in R dplyr based on condition

You have almost achieved what you want.

new_rows <- example %>%
  group_by(bucket) %>%
  summarise(rate = 1 - sum(rate))

new_rows

# bucket rate
# <dbl> <dbl>
# 1 0 0.015
# 2 1 0.02

bind_rows(example, new_rows)

# bucket bucket2 rate
# 1 0 0 0.950
# 2 0 1 0.020
# 3 0 2 0.010
# 4 0 3 0.005
# 5 0 4 0.000
# 6 1 0 0.900
# 7 1 1 0.050
# 8 1 2 0.020
# 9 1 3 0.010
# 10 1 4 0.000
# 11 0 NA 0.015
# 12 1 NA 0.020
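
If the new rows should sit at the end of their own bucket rather than at the bottom of the table, one option is to sort after binding. The sketch below reconstructs example from the output above (so the input itself is an assumption) and relies on arrange() placing missing values last; combined is an illustrative name.

```r
library(dplyr)

# Hypothetical input, reconstructed from the printed output above
example <- data.frame(
  bucket  = rep(c(0, 1), each = 5),
  bucket2 = rep(0:4, times = 2),
  rate    = c(0.95, 0.02, 0.01, 0.005, 0, 0.90, 0.05, 0.02, 0.01, 0)
)

# One remainder row per bucket, so that the rates in each bucket sum to 1
new_rows <- example %>%
  group_by(bucket) %>%
  summarise(rate = 1 - sum(rate))

# bucket2 is NA in the new rows; arrange() sorts NA last within each bucket
combined <- bind_rows(example, new_rows) %>%
  arrange(bucket, bucket2)
```

After this, rows 6 and 12 of combined are the remainder rows for buckets 0 and 1 respectively.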

adding rows by group to get same number of observations by group

We may group by 'anon_ID' and use complete to expand the data

library(dplyr)
library(tidyr)
df1 %>%
  group_by(anon_ID) %>%
  complete(nth_assistance_interaction = c(5, 10, 15, 20)) %>%
  ungroup()

