How to Keep Columns When Grouping/Summarizing

Applying group_by and summarise on data while keeping all the columns' info

Here are two options from dplyr: a) filter and b) slice. In this case no group has a duplicated minimum in column c, so a) and b) give the same result. If a group did have duplicated minima, approach a) would return every row tied for the minimum in that group, while b) would return only one row (the first) per group.

a)

> data %>% group_by(b) %>% filter(c == min(c))
#Source: local data frame [4 x 4]
#Groups: b
#
#   a b   c     d
#1  1 a 1.2 small
#2  4 b 1.7  larg
#3  6 c 3.1   med
#4 10 d 2.2   med

Or similarly, using min_rank (which also keeps ties):

> data %>% group_by(b) %>% filter(min_rank(c) == 1L)
#Source: local data frame [4 x 4]
#Groups: b
#
#   a b   c     d
#1  1 a 1.2 small
#2  4 b 1.7  larg
#3  6 c 3.1   med
#4 10 d 2.2   med

b)

> data %>% group_by(b) %>% slice(which.min(c))
#Source: local data frame [4 x 4]
#Groups: b
#
#   a b   c     d
#1  1 a 1.2 small
#2  4 b 1.7  larg
#3  6 c 3.1   med
#4 10 d 2.2   med
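
The difference between a) and b) only shows up when a group has tied minima. A minimal sketch, assuming a hypothetical toy data frame (not the data from the question) in which group "x" has two rows tied at the minimum of c:

```r
library(dplyr)

# Hypothetical toy data: group "x" has two rows tied at the minimum of c.
toy <- data.frame(
  a = 1:5,
  b = c("x", "x", "x", "y", "y"),
  c = c(1.0, 1.0, 2.5, 3.0, 4.0),
  stringsAsFactors = FALSE
)

# a) filter() keeps every row that ties for the group minimum.
by_filter <- toy %>% group_by(b) %>% filter(c == min(c)) %>% ungroup()

# b) slice(which.min()) keeps only the first minimum row in each group.
by_slice <- toy %>% group_by(b) %>% slice(which.min(c)) %>% ungroup()

nrow(by_filter)  # two rows for "x" plus one for "y"
nrow(by_slice)   # exactly one row per group
```

Which behaviour is right depends on whether tied rows are meaningful in your data or just duplicates you want collapsed.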

How can I keep columns when grouping/summarizing?

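You can do this in base R with aggregate. The formula B ~ . averages B within each unique combination of all the other columns, so those columns are kept in the result:

aggregate(B ~ ., data = df1, FUN = mean)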
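
As a rough illustration of what the formula does, here is a sketch with a hypothetical df1 (the column values are made up). Because every column other than B becomes a grouping key, rows are only merged when all of those columns match:

```r
# Hypothetical data: B is the value to average; A and C act as grouping columns.
df1 <- data.frame(
  A = c("a", "a", "b", "b"),
  C = c("u", "u", "v", "w"),
  B = c(1, 3, 5, 7),
  stringsAsFactors = FALSE
)

# B ~ . means "summarise B by every other column in df1".
res <- aggregate(B ~ ., data = df1, FUN = mean)
res
```

Here the two ("a", "u") rows collapse into one row, while the two "b" rows stay separate because their C values differ.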

Grouping and summarizing by keeping other columns in R

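Try computing the position of the maximum once inside summarize and then reusing it to pull the matching values from the other columns:

summarize(MM_group,
          rank = which.max(Yield),
          Year_rank = Year[rank],
          County_rank = County[rank])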
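
To make this concrete, here is a self-contained sketch. The question's MM_group is not shown, so the data frame and the Crop grouping column below are hypothetical stand-ins; only Yield, Year, and County come from the answer above.

```r
library(dplyr)

# Hypothetical stand-in for the question's data, grouped by an assumed Crop column.
MM <- data.frame(
  Crop   = c("corn", "corn", "soy", "soy"),
  Year   = c(2001, 2002, 2001, 2002),
  County = c("Adams", "Brown", "Adams", "Clay"),
  Yield  = c(150, 170, 48, 41),
  stringsAsFactors = FALSE
)
MM_group <- group_by(MM, Crop)

best <- summarize(
  MM_group,
  rank        = which.max(Yield),  # position of the top yield within the group
  Year_rank   = Year[rank],        # year on that row
  County_rank = County[rank]       # county on that row
)
best
```

This works because summarize evaluates its arguments in order, so rank is already available when Year_rank and County_rank are computed.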

Applying group_by and summarise(sum) but keep columns with non-relevant conflicting data?

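Here's a data.table solution. I'm assuming you want the mean() of Proportion, since these grouped proportions are likely not additive.

library(data.table)
setDT(df)  # convert df to a data.table by reference

# Collapse Type, average Proportion, and sum N and C per Label/Code, then sort by Label
df[, .(Type = paste(Type, collapse = "_"),
       Proportion = mean(Proportion),
       N = sum(N),
       C = sum(C)),
   by = .(Label, Code)][order(Label)]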

   Label Code                        Type Proportion  N  C
1:  203c    c                   wholefish   1.000000  1  1
2:  203c    a                       flesh   1.000000  2  2
3:  204a    a               flesh_formula   0.499995  8  8
4:  204a    b     fleshdelip_formuladelip   0.499995 10 10
5:  204a    c           formula_wholefish   0.499995 16 16
6:  204a    d formuladelip_wholefishdelip   0.499995 18 18

I'm not sure this is the cleanest dplyr solution, but it works:

df %>%
  group_by(Label, Code) %>%
  mutate(Type = paste(Type, collapse = "_")) %>%
  group_by(Label, Type, Code) %>%
  summarise(N = sum(N), C = sum(C), Proportion = mean(Proportion))

Note that the key here is to re-group once you have created the combined Type column, so that Type survives the summarise step.

   Label                        Type   Code     N     C Proportion
  <fctr>                       <chr> <fctr> <int> <int>      <dbl>
1   203c                       flesh      a     2     2   1.000000
2   203c                   wholefish      c     1     1   1.000000
3   204a               flesh_formula      a     8     8   0.499995
4   204a     fleshdelip_formuladelip      b    10    10   0.499995
5   204a           formula_wholefish      c    16    16   0.499995
6   204a formuladelip_wholefishdelip      d    18    18   0.499995
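
A possible variant, not taken from the answers above: the collapsed Type can be built inside a single summarise call, which avoids the mutate and re-group steps. This is a sketch under assumptions: the three-row df below is hypothetical, and the .groups argument requires dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical three-row stand-in for the question's data.
df <- data.frame(
  Label      = c("204a", "204a", "203c"),
  Code       = c("a", "a", "c"),
  Type       = c("flesh", "formula", "wholefish"),
  Proportion = c(0.25, 0.75, 1),
  N          = c(3, 5, 1),
  C          = c(3, 5, 1),
  stringsAsFactors = FALSE
)

out <- df %>%
  group_by(Label, Code) %>%
  summarise(
    Type       = paste(Type, collapse = "_"),  # combine the conflicting labels
    N          = sum(N),
    C          = sum(C),
    Proportion = mean(Proportion),             # averaged, not summed
    .groups    = "drop"                        # return an ungrouped result
  )
out
```

The result has one row per Label/Code pair, with the conflicting Type values pasted together rather than dropped.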

Applying group_by and summarise(sum) but keep a large number of additional columns

We can overwrite count with the group total using mutate (which, unlike summarise, keeps every column) and then apply distinct to drop the repeated rows:

library(dplyr)
df %>%
  group_by(location) %>%
  mutate(count = sum(count)) %>%
  select(-date) %>%
  distinct(location, important_1, important_30, .keep_all = TRUE)

If there are many such columns, we can select their names programmatically, convert them to symbols with syms, and splice them into distinct with !!!

df %>%
  group_by(location) %>%
  mutate(count = sum(count)) %>%
  select(-date) %>%
  distinct(location, !!! rlang::syms(names(.)[startsWith(names(.), 'important')]), .keep_all = TRUE)
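
The mutate-then-distinct pattern can be checked on a small example. This is a sketch with a hypothetical df (the values are made up; only the column names follow the answer above). As a small variation, the "important" column names are collected into imp_cols up front rather than via names(.) inside the pipe.

```r
library(dplyr)

# Hypothetical data: location "A" appears on two dates with identical
# important_* values, so it should collapse to a single row.
df <- data.frame(
  location     = c("A", "A", "B"),
  date         = c("2020-01-01", "2020-01-02", "2020-01-01"),
  count        = c(1, 2, 5),
  important_1  = c("p", "p", "q"),
  important_30 = c("r", "r", "s"),
  stringsAsFactors = FALSE
)

# Names of every column that starts with "important".
imp_cols <- names(df)[startsWith(names(df), "important")]

out <- df %>%
  group_by(location) %>%
  mutate(count = sum(count)) %>%   # group total, all columns retained
  ungroup() %>%
  select(-date) %>%
  distinct(location, !!!rlang::syms(imp_cols), .keep_all = TRUE)
out
```

Note that if the important_* values differed within a location, distinct would keep one row per distinct combination, each carrying the same group total.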

Keep columns after summarising using tidyverse in R

We can use slice_max to return the full row based on the max value of 'year' for each grouping block

library(dplyr)
dat %>%
  group_by(group, month) %>%
  slice_max(year)
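
One thing to be aware of is that slice_max keeps tied rows by default (with_ties = TRUE). A minimal sketch, assuming a hypothetical dat in which one group has two rows sharing the latest year:

```r
library(dplyr)

# Hypothetical data: group "g1", month 1 has two rows tied at year 2020.
dat <- data.frame(
  group = c("g1", "g1", "g1", "g2"),
  month = c(1, 1, 1, 2),
  year  = c(2019, 2020, 2020, 2018),
  value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Default: every row tied for the maximum year is returned.
with_ties <- dat %>%
  group_by(group, month) %>%
  slice_max(year) %>%
  ungroup()

# with_ties = FALSE returns exactly one row per group.
one_each <- dat %>%
  group_by(group, month) %>%
  slice_max(year, with_ties = FALSE) %>%
  ungroup()

nrow(with_ties)
nrow(one_each)
```

If the goal is strictly one row per group, with_ties = FALSE is the safer choice.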

