
# Dplyr Conditional Summarise Function

## Summarize all group values and a conditional subset in the same call

```r
df_sqlite %>%
  group_by(ID) %>%
  mutate(Bfoo = if (A == "foo") B else 0) %>%
  summarize(sumB = sum(B),
            sumBfoo = sum(Bfoo)) %>%
  collect()
```
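The `if`/`else` inside `mutate` above relies on the database backend translating it to SQL. On an ordinary in-memory data frame `if` is not vectorised, so a possible alternative is to subset inside `sum()`. The sketch below assumes a small made-up tibble named `toy` with the same column names (`ID`, `A`, `B`); the data are hypothetical and only illustrate the idea.

```r
library(dplyr)

# Hypothetical in-memory data with the column names used above
toy <- tibble(
  ID = c(1, 1, 2, 2),
  A  = c("foo", "bar", "foo", "foo"),
  B  = c(1, 2, 3, 4)
)

res <- toy %>%
  group_by(ID) %>%
  summarise(sumB    = sum(B),              # all rows in the group
            sumBfoo = sum(B[A == "foo"]))  # only rows where A is "foo"

res$sumB     # 3 7
res$sumBfoo  # 1 7
```

Subsetting with `B[A == "foo"]` keeps both totals in a single `summarise` call without an intermediate column.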

## Using dplyr summarise with conditions

We can keep `all(Status)` as the second argument in `summarise` (or give it a different column name). Because `all(Status)` returns a single TRUE/FALSE per group, depending on whether every 'Status' value in that group is TRUE, a plain `if`/`else` is enough here:

```r
df %>%
  group_by(ID) %>%
  summarise(Test = if (all(Status)) first(Price[Status]) else
                     first(Price[!Status]),
            Status = all(Status))
# A tibble: 3 x 3
#     ID  Test Status
#   <dbl> <dbl> <lgl>
#1     1     5 FALSE
#2     2     0 TRUE
#3     3     7 FALSE
```

NOTE: It is better not to use `ifelse` when its arguments have unequal lengths, because the result always has the same length as the `test` argument.
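A minimal base R illustration of that note, using a made-up vector `prices`: with a length-one test, `ifelse` returns only one element, while `if`/`else` returns the whole branch.

```r
# Hypothetical values, for illustration only
prices <- c(5, 3, 9)

# ifelse() shapes its result like `test`, which has length 1 here
from_ifelse <- ifelse(TRUE, prices, 0)
from_ifelse   # 5

# if/else returns the chosen branch unchanged
from_if <- if (TRUE) prices else 0
from_if       # 5 3 9
```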

## dplyr summarise based on order condition with if statement

Here's a dplyr solution:

```r
df %>%
  group_by(id) %>%
  mutate(ymean = mean(y), zmean = mean(z),
         pref = 3 * types %in% preference_3rd +
                2 * types %in% preference_2nd +
                1 * types %in% preference_1st) %>%
  filter(pref == min(pref)) %>%
  summarise(sumtest = sum(x), ymean = first(ymean), zmean = first(zmean))
#> # A tibble: 5 x 4
#>      id sumtest ymean zmean
#>   <dbl>   <dbl> <dbl> <dbl>
#> 1     1      60   3.5   3.5
#> 2     2      25   8     8
#> 3     3      40  11.5  11.5
#> 4     4      10  14    14
#> 5     5      10  15    15
```
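To see how the `pref` score drives the result, here is a small sketch on invented data. The tibble `toy` and the three preference vectors are assumptions made for this example, not the data from the original question. The group means are computed in `mutate` before `filter`, so they cover all rows of each `id`, while `sumtest` only covers the rows with the best-ranked type.

```r
library(dplyr)

# Hypothetical preference vectors and data
preference_1st <- "a"
preference_2nd <- "b"
preference_3rd <- "c"

toy <- tibble(
  id    = c(1, 1, 1, 2, 2),
  types = c("a", "b", "a", "b", "c"),
  x     = c(10, 20, 5, 7, 1),
  y     = c(1, 2, 3, 4, 6)
)

ranked <- toy %>%
  group_by(id) %>%
  mutate(ymean = mean(y),                       # mean over every row in the group
         pref = 3 * types %in% preference_3rd +
                2 * types %in% preference_2nd +
                1 * types %in% preference_1st) %>%
  filter(pref == min(pref)) %>%                 # keep only the best-ranked type
  summarise(sumtest = sum(x), ymean = first(ymean))

ranked$sumtest  # 15 7
ranked$ymean    # 2 5
```

Note that a type missing from all three preference vectors gets a score of 0 and would therefore win the `min(pref)` comparison.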

## Write function to perform conditional summarize in R using named list

Two operations are performed here, and one of them (the filter on the `ID` columns) can be computed dynamically from the names in `filters`:

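```r
library(dplyr)

df %>%
  # total2 is computed before filtering, so it covers all rows matching ID1
  mutate(total2 = sum(to_summarize[ID1 == filters[['ID1']]])) %>%
  # keep rows where every ID column equals its entry in `filters`
  # (recent dplyr versions deprecate across() inside filter() in favour of if_all())
  filter(across(starts_with("ID"), ~ . == filters[[cur_column()]])) %>%
  summarise(total1 = sum(to_summarize), total2 = first(total2))
```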

-output

```
# A tibble: 1 x 2
  total1 total2
   <dbl>  <dbl>
1     10     12
```

To do this without `filter`, `reduce` the `across` output to a single logical vector and use it to subset `to_summarize`:

```r
library(purrr)

df %>%
  summarise(total1 = sum(to_summarize[across(starts_with('ID'),
                           ~ . == filters[[cur_column()]]) %>%
                           reduce(`&`)]),
            total2 = sum(to_summarize[ID1 == filters[['ID1']]]))
```

-output

```
# A tibble: 1 x 2
  total1 total2
   <dbl>  <dbl>
1     10     12
```
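Since the heading asks for a function, the `summarise`-only version can be wrapped in one. This is a sketch under assumptions: the function name `conditional_summary`, the tibble `toy`, and the example `filters` list are all hypothetical, and base `Reduce` stands in for `purrr::reduce` to avoid the extra dependency.

```r
library(dplyr)

# Hypothetical helper; name and signature are illustrative only
conditional_summary <- function(data, filters) {
  data %>%
    summarise(
      # rows where every ID column matches its entry in `filters`
      total1 = sum(to_summarize[
        Reduce(`&`, across(starts_with("ID"),
                           ~ . == filters[[cur_column()]]))
      ]),
      # rows matching ID1 only
      total2 = sum(to_summarize[ID1 == filters[["ID1"]]])
    )
}

# Hypothetical data and named list of filter values
toy <- tibble(
  ID1 = c("A", "A", "B", "A"),
  ID2 = c("x", "y", "x", "x"),
  to_summarize = c(1, 2, 4, 8)
)
filters <- list(ID1 = "A", ID2 = "x")

res <- conditional_summary(toy, filters)
res$total1  # 9  (rows 1 and 4)
res$total2  # 11 (rows 1, 2 and 4)
```

As in the original code, `filters` needs one entry for each column whose name starts with `ID`.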

## group by and conditional summarize in R

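We can do a double grouping: first sum `val` per `vote`, then regroup with every `vote` whose total is below 2 relabelled as 'unpop'.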

```r
library(dplyr)

df %>%
  group_by(vote) %>%
  summarise(val = sum(val)) %>%
  group_by(vote = replace(vote, val < 2, 'unpop')) %>%
  summarise(val = sum(val))
```

-output

```
# A tibble: 3 x 2
# vote    val
#  <chr> <dbl>
#1 A         3
#2 B         6
#3 unpop     2
```

Or another option with `rowsum`

```r
df %>%
  group_by(vote = replace(vote, vote %in%
      names(which((rowsum(val, vote) < 2)[, 1])), 'unpopular')) %>%
  summarise(val = sum(val))
```

Or using `fct_lump_n` from `forcats`

```r
library(forcats)

df %>%
  group_by(vote = fct_lump_n(vote, 2, other_level = "unpop")) %>%
  summarise(val = sum(val))
# A tibble: 3 x 2
#  vote    val
#  <fct> <dbl>
#1 A         3
#2 B         6
#3 unpop     2
```

Or using `table`

```r
df %>%
  group_by(vote = replace(vote,
      vote %in% names(which(table(vote) < 2)), 'unpop')) %>%
  summarise(val = sum(val))
```
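These variants do not all apply the same criterion: the double grouping thresholds on the summed `val`, whereas `table` counts rows per `vote`. The sketch below uses an invented tibble `toy` (not the data from the original question) on which the two criteria give different answers.

```r
library(dplyr)

# Hypothetical data: B has a single row but a large total,
# C has two rows but a small total
toy <- tibble(
  vote = c("A", "A", "B", "C", "C"),
  val  = c(1, 2, 5, 0.5, 0.5)
)

# Criterion 1: total of val below 2
by_sum <- toy %>%
  group_by(vote) %>%
  summarise(val = sum(val)) %>%
  group_by(vote = replace(vote, val < 2, "unpop")) %>%
  summarise(val = sum(val))

# Criterion 2: fewer than 2 rows
by_count <- toy %>%
  group_by(vote = replace(vote,
      vote %in% names(which(table(vote) < 2)), "unpop")) %>%
  summarise(val = sum(val))

by_sum$vote    # "A" "B" "unpop"   (C is lumped)
by_count$vote  # "A" "C" "unpop"   (B is lumped)
```

Which one is appropriate depends on whether "unpopular" is meant in terms of the summed values or the number of rows.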