## Applying group_by and summarise on data while keeping all the columns' info

Here are two options from dplyr, using a) `filter` and b) `slice`. In this case there are no duplicated minimum values in column `c` for any of the groups, so the results of a) and b) are the same. If there *were* duplicated minima, approach a) would return every row tied for the minimum in each group, while b) would return only one row (the first minimum) per group.

**a)**

    > data %>% group_by(b) %>% filter(c == min(c))
    #Source: local data frame [4 x 4]
    #Groups: b
    #
    #   a b   c     d
    #1  1 a 1.2 small
    #2  4 b 1.7  larg
    #3  6 c 3.1   med
    #4 10 d 2.2   med

Or, similarly:

    > data %>% group_by(b) %>% filter(min_rank(c) == 1L)
    #Source: local data frame [4 x 4]
    #Groups: b
    #
    #   a b   c     d
    #1  1 a 1.2 small
    #2  4 b 1.7  larg
    #3  6 c 3.1   med
    #4 10 d 2.2   med

**b)**

    > data %>% group_by(b) %>% slice(which.min(c))
    #Source: local data frame [4 x 4]
    #Groups: b
    #
    #   a b   c     d
    #1  1 a 1.2 small
    #2  4 b 1.7  larg
    #3  6 c 3.1   med
    #4 10 d 2.2   med
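To see where a) and b) diverge, here is a small sketch with tied minima. The data frame `toy` and its `id` column are made up for illustration and are not the data from the question.

```r
library(dplyr)

# Hypothetical data: group "a" has two rows tied for the minimum of c
toy <- data.frame(
  id = 1:4,
  b  = c("a", "a", "a", "b"),
  c  = c(1.2, 1.2, 3.0, 1.7)
)

# a) filter keeps every row tied for the group minimum
by_filter <- toy %>%
  group_by(b) %>%
  filter(c == min(c)) %>%
  ungroup()

# b) slice + which.min keeps only the first minimum in each group
by_slice <- toy %>%
  group_by(b) %>%
  slice(which.min(c)) %>%
  ungroup()
```

With this toy input, `by_filter` keeps both tied rows of group "a", while `by_slice` keeps only the first of them.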

## How can I keep columns when grouping/summarizing?

You can do this in base R with `aggregate`. The formula `B ~ .` summarises `B` and groups by every other column in the data:

    aggregate(B ~ ., data = df1, FUN = mean)
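As a quick illustration of the `aggregate` approach, here is a minimal sketch. The contents of `df1` are invented here (the question's data is not shown), with `A` and `C` standing in for the columns to keep.

```r
# Hypothetical data frame: B is the numeric column to average,
# A and C are the columns we want to keep as grouping keys
df1 <- data.frame(
  A = c("x", "x", "y", "y"),
  C = c("p", "p", "q", "r"),
  B = c(1, 3, 5, 7)
)

# B ~ . means "summarise B by all remaining columns"
res <- aggregate(B ~ ., data = df1, FUN = mean)
```

Each distinct combination of `A` and `C` becomes one row of `res`, so this only suits cases where the extra columns are constant within the groups you care about.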

## Grouping and summarizing by keeping other columns in R

Try:

    summarize(MM_group,
              rank = which.max(Yield),
              Year_rank = Year[rank],
              County_rank = County[rank])
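A self-contained sketch of the same idea follows. The values and the `State` grouping column are assumptions for illustration, since the original `MM_group` is not shown; only `Yield`, `Year` and `County` come from the answer.

```r
library(dplyr)

# Hypothetical yield data; State is an assumed grouping column
MM <- data.frame(
  State  = c("IA", "IA", "NE", "NE"),
  County = c("Polk", "Story", "Cass", "Hall"),
  Year   = c(2001, 2005, 2003, 2002),
  Yield  = c(150, 170, 140, 120)
)
MM_group <- group_by(MM, State)

# rank is the within-group position of the maximum Yield;
# later arguments can reuse it to pull values from the same row
best <- summarize(MM_group,
                  rank = which.max(Yield),
                  Year_rank = Year[rank],
                  County_rank = County[rank])
```

This works because `summarize` evaluates its arguments in order, so `rank` is already available when `Year_rank` and `County_rank` are computed.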

## Applying group_by and summarise(sum) but keep columns with non-relevant conflicting data?

Here's a `data.table` solution. I'm assuming you want the `mean()` of Proportion, since these grouped proportions are likely not additive.

    library(data.table)
    setDT(df)
    df[, .(Type = paste(Type, collapse = "_"),
           Proportion = mean(Proportion), N = sum(N), C = sum(C)),
       by = .(Label, Code)][order(Label)]

       Label Code                        Type Proportion  N  C
    1:  203c    c                   wholefish   1.000000  1  1
    2:  203c    a                       flesh   1.000000  2  2
    3:  204a    a               flesh_formula   0.499995  8  8
    4:  204a    b     fleshdelip_formuladelip   0.499995 10 10
    5:  204a    c           formula_wholefish   0.499995 16 16
    6:  204a    d formuladelip_wholefishdelip   0.499995 18 18

I'm not sure this is the cleanest `dplyr` solution, but it works:

    df %>% group_by(Label, Code) %>%
      mutate(Type = paste(Type, collapse = "_")) %>%
      group_by(Label, Type, Code) %>%
      summarise(N = sum(N), C = sum(C), Proportion = mean(Proportion))

Note that the key here is to re-group once you have created the combined `Type` column.

       Label                        Type   Code     N     C Proportion
      <fctr>                       <chr> <fctr> <int> <int>      <dbl>
    1   203c                       flesh      a     2     2   1.000000
    2   203c                   wholefish      c     1     1   1.000000
    3   204a               flesh_formula      a     8     8   0.499995
    4   204a     fleshdelip_formuladelip      b    10    10   0.499995
    5   204a           formula_wholefish      c    16    16   0.499995
    6   204a formuladelip_wholefishdelip      d    18    18   0.499995
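A possible variation (not the approach above, just a sketch of an alternative) is to collapse `Type` inside `summarise` itself, which avoids the re-grouping step. The small data frame below is invented for illustration and only mimics the column names from the question.

```r
library(dplyr)

# Hypothetical input with the same column names as the question
df <- data.frame(
  Label      = c("204a", "204a", "203c"),
  Code       = c("a", "a", "c"),
  Type       = c("flesh", "formula", "wholefish"),
  Proportion = c(0.4, 0.6, 1),
  N          = c(3, 5, 1),
  C          = c(3, 5, 1)
)

# Collapse Type and aggregate the numeric columns in one summarise call;
# .groups = "drop" returns an ungrouped result
out <- df %>%
  group_by(Label, Code) %>%
  summarise(Type = paste(Type, collapse = "_"),
            N = sum(N), C = sum(C),
            Proportion = mean(Proportion),
            .groups = "drop")
```

The column order differs from the output shown above (`Type` comes after `Code`), but the per-group values are computed the same way.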

## Applying group_by and summarise(sum) but keep a large number of additional columns

We can create the summed column with `mutate` and then apply `distinct`:

    library(dplyr)
    df %>%
      group_by(location) %>%
      mutate(count = sum(count)) %>%
      select(-date) %>%
      distinct(location, important_1, important_30, .keep_all = TRUE)

If there are many column names, we can also use `syms` to convert them to symbols and splice them in with `!!!`:

    df %>%
      group_by(location) %>%
      mutate(count = sum(count)) %>%
      select(-date) %>%
      distinct(location, !!! rlang::syms(names(.)[startsWith(names(.), 'important')]), .keep_all = TRUE)
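Here is a runnable sketch of the `syms` variant on made-up data. The values are hypothetical, and the `important_*` names are computed up front into `imp_cols` (a name introduced here) rather than via `.` inside the pipe.

```r
library(dplyr)

# Hypothetical data: two dated rows for location A, one for B
df <- data.frame(
  location     = c("A", "A", "B"),
  date         = c("2020-01-01", "2020-01-02", "2020-01-01"),
  count        = c(1, 2, 5),
  important_1  = c("x", "x", "y"),
  important_30 = c("u", "u", "v")
)

# Collect the names of all columns starting with "important"
imp_cols <- names(df)[startsWith(names(df), "important")]

out <- df %>%
  group_by(location) %>%
  mutate(count = sum(count)) %>%   # per-location total, repeated on each row
  ungroup() %>%
  select(-date) %>%
  distinct(location, !!!rlang::syms(imp_cols), .keep_all = TRUE)
```

After `mutate`, every row in a location carries the same total, so `distinct` simply drops the repeated rows while `.keep_all = TRUE` retains the remaining columns.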

## keep columns after summarising using tidyverse in R

We can use `slice_max` to return the full row with the `max` value of 'year' within each group:

    library(dplyr)
    dat %>%
      group_by(group, month) %>%
      slice_max(year)
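A small sketch of `slice_max` on invented data follows. The `value` column and all values are hypothetical. It also sets `with_ties = FALSE`, which is an addition here: by default `slice_max` keeps all rows tied for the maximum.

```r
library(dplyr)

# Hypothetical data: two years per (group, month) combination
dat <- data.frame(
  group = c("g1", "g1", "g2", "g2"),
  month = c(1, 1, 2, 2),
  year  = c(2019, 2021, 2020, 2018),
  value = c(10, 20, 30, 40)
)

# Keep the whole row with the latest year in each (group, month) block;
# with_ties = FALSE guarantees exactly one row per block
latest <- dat %>%
  group_by(group, month) %>%
  slice_max(year, n = 1, with_ties = FALSE) %>%
  ungroup()
```

All other columns, such as `value`, come along with the selected row, which is the point of slicing instead of summarising.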

