# How to Keep Columns When Grouping/Summarizing

## Applying group_by and summarise on data while keeping all the columns' info

Here are two options using a) `filter` and b) `slice` from dplyr. In this case no group has a duplicated minimum in column `c`, so a) and b) give the same result. If a group did have tied minima, approach a) would return every tied row for that group, while b) would return only one row (the first minimum) per group.

a)

```r
data %>% group_by(b) %>% filter(c == min(c))
#Source: local data frame [4 x 4]
#Groups: b
#
#   a b   c     d
#1  1 a 1.2 small
#2  4 b 1.7  larg
#3  6 c 3.1   med
#4 10 d 2.2   med
```

Or similarly, with `min_rank` (which also keeps ties):

```r
data %>% group_by(b) %>% filter(min_rank(c) == 1L)
#Source: local data frame [4 x 4]
#Groups: b
#
#   a b   c     d
#1  1 a 1.2 small
#2  4 b 1.7  larg
#3  6 c 3.1   med
#4 10 d 2.2   med
```

b)

```r
data %>% group_by(b) %>% slice(which.min(c))
#Source: local data frame [4 x 4]
#Groups: b
#
#   a b   c     d
#1  1 a 1.2 small
#2  4 b 1.7  larg
#3  6 c 3.1   med
#4 10 d 2.2   med
```
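To make the difference between a) and b) concrete, here is a minimal sketch on a made-up data frame (`toy` and its values are hypothetical, not the original `data`) in which one group has two rows tied for the minimum of `c`:

```r
library(dplyr)

# Hypothetical data: group "a" has two rows tied for the minimum of c.
toy <- data.frame(
  a = 1:5,
  b = c("a", "a", "a", "b", "b"),
  c = c(1.2, 1.2, 3.0, 1.7, 2.5)
)

# filter() keeps every row equal to the group minimum, ties included.
by_filter <- toy %>% group_by(b) %>% filter(c == min(c)) %>% ungroup()

# slice(which.min()) keeps only the first minimum row in each group.
by_slice <- toy %>% group_by(b) %>% slice(which.min(c)) %>% ungroup()

nrow(by_filter)  # 3: both tied rows of group "a" plus one row of group "b"
nrow(by_slice)   # 2: exactly one row per group
```

Which behaviour is preferable depends on whether tied rows carry information you need downstream.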

## How can I keep columns when grouping/summarizing?

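You can do this in base R with `aggregate`. The formula `B ~ .` averages `B` within each combination of all the other columns, so those columns are kept in the result:

``aggregate(B ~ ., data = df1, FUN = mean)``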
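As a small illustration of what `B ~ .` does, the sketch below runs the same call on a made-up `df1` (the columns `A` and `C` and all values are assumptions; the original question's data is not shown):

```r
# Hypothetical data frame standing in for df1.
df1 <- data.frame(
  A = c("x", "x", "y", "y"),
  C = c("u", "u", "v", "w"),
  B = c(1, 3, 10, 20)
)

# B ~ . means: summarise B, grouped by every other column (A and C here).
res <- aggregate(B ~ ., data = df1, FUN = mean)
res  # one row per distinct (A, C) combination, with the mean of B
```

Note that this only "keeps" the other columns in the sense that they become grouping variables, so any column that varies within what you think of as a group will split that group.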

## Grouping and summarizing by keeping other columns in R

Try computing the position of the maximum once and using it to index the other columns:

```r
summarize(MM_group,
          rank = which.max(Yield),
          Year_rank = Year[rank],
          County_rank = County[rank])
```
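The following sketch runs that call end to end. The data frame `MM`, its values, and the grouping column `State` are hypothetical stand-ins, since the original `MM_group` is not shown:

```r
library(dplyr)

# Hypothetical stand-in for the data behind MM_group; grouping by State is an assumption.
MM <- data.frame(
  State  = c("IA", "IA", "IL", "IL"),
  County = c("Polk", "Story", "Cook", "Lake"),
  Year   = c(2001, 2002, 2001, 2002),
  Yield  = c(150, 170, 160, 140)
)
MM_group <- group_by(MM, State)

# rank is the within-group row index of the largest Yield;
# later arguments of summarize() can refer to it.
best <- summarize(MM_group,
                  rank = which.max(Yield),
                  Year_rank = Year[rank],
                  County_rank = County[rank])
best
```

Because `which.max` returns only the first maximum, each group contributes exactly one row even when yields are tied.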

## Applying group_by and summarise(sum) but keep columns with non-relevant conflicting data?

Here's a `data.table` solution. I'm assuming you want the `mean()` of `Proportion`, since these grouped proportions are likely not additive.

```r
library(data.table)
setDT(df)
df[, .(Type = paste(Type, collapse = "_"),
       Proportion = mean(Proportion), N = sum(N), C = sum(C)),
   by = .(Label, Code)][order(Label)]

#    Label Code                        Type Proportion  N  C
# 1:  203c    c                   wholefish   1.000000  1  1
# 2:  203c    a                       flesh   1.000000  2  2
# 3:  204a    a               flesh_formula   0.499995  8  8
# 4:  204a    b     fleshdelip_formuladelip   0.499995 10 10
# 5:  204a    c           formula_wholefish   0.499995 16 16
# 6:  204a    d formuladelip_wholefishdelip   0.499995 18 18
```

I'm not sure this is the cleanest `dplyr` solution, but it works:

```r
df %>%
  group_by(Label, Code) %>%
  mutate(Type = paste(Type, collapse = "_")) %>%
  group_by(Label, Type, Code) %>%
  summarise(N = sum(N), C = sum(C), Proportion = mean(Proportion))
```

Note the key here is to re-group once you create the combined `Type` column.

```
   Label                        Type   Code     N     C Proportion
  <fctr>                       <chr> <fctr> <int> <int>      <dbl>
1   203c                       flesh      a     2     2   1.000000
2   203c                   wholefish      c     1     1   1.000000
3   204a               flesh_formula      a     8     8   0.499995
4   204a     fleshdelip_formuladelip      b    10    10   0.499995
5   204a           formula_wholefish      c    16    16   0.499995
6   204a formuladelip_wholefishdelip      d    18    18   0.499995
```
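Since the input data frame is not shown, here is a self-contained sketch of the `dplyr` approach on made-up data (all rows below are hypothetical). The `.groups = "drop"` argument is an addition of mine to return an ungrouped result; it assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical input; the original question's data frame is not shown.
df <- data.frame(
  Label = c("204a", "204a", "203c"),
  Code  = c("a", "a", "c"),
  Type  = c("flesh", "formula", "wholefish"),
  Proportion = c(0.4, 0.6, 1),
  N = c(3L, 5L, 1L),
  C = c(3L, 5L, 1L)
)

out <- df %>%
  group_by(Label, Code) %>%
  # Collapse the conflicting Type values; every row of a group gets the same string.
  mutate(Type = paste(Type, collapse = "_")) %>%
  # Re-group so that Type is a grouping variable and survives summarise().
  group_by(Label, Type, Code) %>%
  summarise(N = sum(N), C = sum(C), Proportion = mean(Proportion),
            .groups = "drop")
out
```

Without the second `group_by`, `summarise` would drop `Type`, because it only keeps grouping variables and the summaries it is asked to compute.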

## Applying group_by and summarise(sum) but keep a large number of additional columns

We can overwrite `count` with the group total using `mutate` (which keeps every column), and then apply `distinct` to collapse each group to a single row:

```r
library(dplyr)
df %>%
  group_by(location) %>%
  mutate(count = sum(count)) %>%
  select(-date) %>%
  distinct(location, important_1, important_30, .keep_all = TRUE)
```

If there are many column names, we can also use `rlang::syms` to convert the names to symbols and splice them into the call with `!!!`:

```r
df %>%
  group_by(location) %>%
  mutate(count = sum(count)) %>%
  select(-date) %>%
  distinct(location,
           !!! rlang::syms(names(.)[startsWith(names(.), 'important')]),
           .keep_all = TRUE)
```
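A runnable sketch of the same idea on made-up data follows. The data frame and the helper vector `keep` are hypothetical; the sketch also assumes the `important_*` columns are constant within each `location`, which is what makes collapsing with `distinct` safe.

```r
library(dplyr)

# Hypothetical data: the important_* columns are constant within a location.
df <- data.frame(
  location     = c("A", "A", "B"),
  date         = c("2020-01-01", "2020-01-02", "2020-01-01"),
  count        = c(1, 2, 5),
  important_1  = c("x", "x", "y"),
  important_30 = c(10, 10, 20)
)

# Column names to de-duplicate on, collected programmatically.
keep <- c("location", names(df)[startsWith(names(df), "important")])

res <- df %>%
  group_by(location) %>%
  mutate(count = sum(count)) %>%   # group total, repeated on every row
  ungroup() %>%
  select(-date) %>%
  distinct(!!!rlang::syms(keep), .keep_all = TRUE)
res
```

If an `important_*` column did vary within a location, `distinct` would keep one row per distinct combination, so a location could appear more than once.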

## keep columns after summarising using tidyverse in R

We can use `slice_max` to return the full row with the maximum value of `year` within each group:

```r
library(dplyr)
dat %>%
  group_by(group, month) %>%
  slice_max(year)
```
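One detail worth knowing is that `slice_max` keeps tied rows by default. The sketch below, on made-up data (the `value` column and all rows are hypothetical), contrasts the default with `with_ties = FALSE`:

```r
library(dplyr)

# Hypothetical data: group g1 / month 1 has two rows sharing the latest year.
dat <- data.frame(
  group = c("g1", "g1", "g1", "g2"),
  month = c(1, 1, 1, 2),
  year  = c(2019, 2020, 2020, 2018),
  value = c(10, 20, 30, 40)
)

# Default: all rows tied for the maximum year are returned.
with_ties <- dat %>%
  group_by(group, month) %>%
  slice_max(year) %>%
  ungroup()

# with_ties = FALSE: exactly one row per group.
no_ties <- dat %>%
  group_by(group, month) %>%
  slice_max(year, n = 1, with_ties = FALSE) %>%
  ungroup()

nrow(with_ties)  # 3
nrow(no_ties)    # 2
```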
