Dplyr Conditional Summarise Function

Summarize all group values and a conditional subset in the same call

Writing up @hadley's comment as an answer

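# On a database-backed table, dbplyr translates if/else into SQL CASE WHEN.
# On a local data frame, use ifelse() or if_else() instead, because if()
# needs a length-one condition.
df_sqlite %>%
  group_by(ID) %>%
  mutate(Bfoo = if (A == "foo") B else 0) %>%
  summarize(sumB = sum(B),
            sumBfoo = sum(Bfoo)) %>%
  collect()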
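For a local data frame, the same two sums can be sketched without the intermediate column by subsetting B inside summarise. The data below is hypothetical and only reuses the column names from the answer above.

```r
library(dplyr)

# Hypothetical example data; only the column names come from the answer
df_local <- tibble(
  ID = c(1, 1, 2, 2),
  A  = c("foo", "bar", "foo", "foo"),
  B  = c(10, 20, 30, 40)
)

res_local <- df_local %>%
  group_by(ID) %>%
  summarise(
    sumB    = sum(B),             # sum over every row in the group
    sumBfoo = sum(B[A == "foo"])  # sum over the rows where A is "foo"
  )
```

Subsetting with B[A == "foo"] returns an empty vector for a group with no "foo" rows, and sum() of an empty vector is 0, so such groups get 0 rather than NA.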

Using dplyr summarise with conditions

Put all(Status) as the second argument in summarise (or give it a different column name), so that Test is computed while Status still refers to the original column. Because all(Status) returns a single TRUE/FALSE per group, a plain if/else works here:

df %>%
  group_by(ID) %>%
  summarise(Test = if (all(Status)) first(Price[Status]) else
                     first(Price[!Status]),
            Status = all(Status))
# A tibble: 3 x 3
# ID Test Status
# <dbl> <dbl> <lgl>
#1 1 5 FALSE
#2 2 0 TRUE
#3 3 7 FALSE

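NOTE: It is better not to use ifelse when its arguments have unequal lengths. The result takes the length of the test, so with a single TRUE/FALSE test only the first element of the chosen branch is returned.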
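A small base R sketch of why the note matters. The vectors price and status are hypothetical stand-ins for one group's Price and Status values.

```r
# Hypothetical values for a single group
price  <- c(5, 0, 7)
status <- c(FALSE, FALSE, TRUE)

# ifelse() shapes its result like the test, so a length-one test
# gives a length-one result, taken from the start of the chosen branch
from_ifelse <- ifelse(all(status), price[status], price[!status])

# if/else returns the whole chosen branch
from_if <- if (all(status)) price[status] else price[!status]

print(from_ifelse)
print(from_if)
```

In the summarise call above this difference is hidden by first(), but it surfaces as soon as the branches are expected to return more than one value.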

dplyr summarise based on order condition with if statement

Here's a dplyr solution:

df %>%
  group_by(id) %>%
  mutate(ymean = mean(y), zmean = mean(z),
         pref = 3 * types %in% preference_3rd +
                2 * types %in% preference_2nd +
                1 * types %in% preference_1st) %>%
  filter(pref == min(pref)) %>%
  summarise(sumtest = sum(x), ymean = first(ymean), zmean = first(zmean))
#> # A tibble: 5 x 4
#> id sumtest ymean zmean
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 60 3.5 3.5
#> 2 2 25 8 8
#> 3 3 40 11.5 11.5
#> 4 4 10 14 14
#> 5 5 10 15 15

Write function to perform conditional summarize in R using named list

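Two sums are computed here. total2 uses only the ID1 filter, while total1 applies every entry of filters, and that condition can be built dynamically from the names of the list: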

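library(dplyr)
# Note: across() inside filter() is deprecated in recent dplyr versions,
# which suggest if_all() or if_any() instead.
df %>%
  mutate(total2 = sum(to_summarize[ID1 == filters[['ID1']]])) %>%
  filter(across(starts_with("ID"),
                ~ . == filters[[cur_column()]])) %>%
  summarise(total1 = sum(to_summarize), total2 = first(total2))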

-output

# A tibble: 1 x 2
total1 total2
<dbl> <dbl>
1 10 12

To do this without filter, reduce the across output to a single logical vector with & and use it to subset to_summarize:

library(purrr)
df %>%
  summarise(total1 = sum(to_summarize[across(starts_with('ID'),
                                             ~ . == filters[[cur_column()]]) %>%
                                        reduce(`&`)]),
            total2 = sum(to_summarize[ID1 == filters[['ID1']]]))

-output

# A tibble: 1 x 2
total1 total2
<dbl> <dbl>
1 10 12
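Since the question asks for a function, here is a minimal base R sketch of one way to wrap the idea. The function name conditional_total, its arguments, and the example data are hypothetical and not part of the original answer.

```r
# Hypothetical data and filter list, for illustration only
df_example <- data.frame(
  ID1 = c("A", "A", "B"),
  ID2 = c("x", "y", "x"),
  to_summarize = c(10, 2, 5),
  stringsAsFactors = FALSE
)
filters_example <- list(ID1 = "A", ID2 = "x")

# Sum value_col over the rows that match every name/value pair in filters
conditional_total <- function(data, filters, value_col) {
  # One logical vector per filter entry: data[[name]] == value
  conditions <- Map(function(col, val) data[[col]] == val,
                    names(filters), filters)
  # Combine them with & into a single logical vector
  keep <- Reduce(`&`, conditions)
  sum(data[[value_col]][keep])
}

total1 <- conditional_total(df_example, filters_example, "to_summarize")
total2 <- conditional_total(df_example, filters_example["ID1"], "to_summarize")
```

Passing a subset of the list, such as filters_example["ID1"], reuses the same function for the less restrictive total.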

Group by and conditional summarize in R

We can group twice: first sum val per vote, then relabel every vote whose total is below 2 as 'unpop' and sum again

library(dplyr)
df %>%
  group_by(vote) %>%
  summarise(val = sum(val)) %>%
  group_by(vote = replace(vote, val < 2, 'unpop')) %>%
  summarise(val = sum(val))

-output

# A tibble: 3 x 2
# vote val
# <chr> <dbl>
#1 A 3
#2 B 6
#3 unpop 2

Or another option with rowsum

df %>%
  group_by(vote = replace(vote, vote %in%
             names(which((rowsum(val, vote) < 2)[, 1])), 'unpop')) %>%
  summarise(val = sum(val))

Or using fct_lump_n from forcats (without the w argument it keeps the most frequent levels by row count, not by the sum of val)

library(forcats)
df %>%
  group_by(vote = fct_lump_n(vote, 2, other_level = "unpop")) %>%
  summarise(val = sum(val))
# A tibble: 3 x 2
# vote val
# <fct> <dbl>
#1 A 3
#2 B 6
#3 unpop 2

Or using table (this compares the number of rows per vote, not the sum of val)

df %>%
  group_by(vote = replace(vote,
             vote %in% names(which(table(vote) < 2)), 'unpop')) %>%
  summarise(val = sum(val))
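The rowsum and double-grouping options threshold on the summed val, whereas table works on row counts, so the two can disagree on other data. A small base R sketch with hypothetical vectors vote and val where they pick different groups:

```r
# Hypothetical data chosen so that sums and counts disagree
vote <- c("A", "B", "B", "C")
val  <- c(5, 0.5, 0.5, 1)

sums   <- rowsum(val, vote)[, 1]  # named vector: summed val per vote
counts <- table(vote)             # number of rows per vote

low_by_sum   <- names(which(sums < 2))    # votes whose total val is below 2
low_by_count <- names(which(counts < 2))  # votes that appear fewer than 2 times

print(low_by_sum)
print(low_by_count)
```

Here "A" appears once but has a large total, and "B" appears twice with a small total, so which option is appropriate depends on whether "unpopular" means few rows or a low sum.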

