Summarize all group values and a conditional subset in the same call
Writing up @hadley's comment as an answer
df_sqlite %>%
  group_by(ID) %>%
  # on a database backend, dbplyr translates if/else into a SQL CASE WHEN
  mutate(Bfoo = if (A == "foo") B else 0) %>%
  summarize(sumB = sum(B),
            sumBfoo = sum(Bfoo)) %>%
  collect()
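On an in-memory data frame, if() cannot be used this way because its condition must be a single TRUE/FALSE. The sketch below is a local variant that subsets B inside summarise() instead of building a helper column. The tibble and its values are hypothetical toy data, not the question's table.

```r
library(dplyr)

# Hypothetical toy data standing in for the table behind df_sqlite
df <- tibble(
  ID = c(1, 1, 2, 2),
  A  = c("foo", "bar", "foo", "foo"),
  B  = c(1, 2, 3, 4)
)

res <- df %>%
  group_by(ID) %>%
  summarise(sumB    = sum(B),               # total of B per group
            sumBfoo = sum(B[A == "foo"]))   # total of B only where A is "foo"
res
```

Keeping the helper column also works locally with mutate(Bfoo = if_else(A == "foo", B, 0)), which is vectorised where if() is not.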
Using dplyr summarise with conditions
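We could keep Status = all(Status) as the second argument in summarise (or give it a different column name). summarise evaluates its arguments in order, so if Status were summarised first, Test would see that single value instead of the original Status column. Because the condition reduces to a single TRUE/FALSE per group (whether all of 'Status' is TRUE), a plain if/else is enough.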
df %>%
  group_by(ID) %>%
  summarise(Test = if (all(Status)) first(Price[Status]) else first(Price[!Status]),
            Status = all(Status))
# A tibble: 3 x 3
# ID Test Status
# <dbl> <dbl> <lgl>
#1 1 5 FALSE
#2 2 0 TRUE
#3 3 7 FALSE
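NOTE: It is better not to use ifelse when its arguments have unequal lengths: ifelse returns a result the same length as its test argument, so longer yes/no vectors are silently truncated.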
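A small illustration of that note: with a length-one condition and length-two branches, ifelse() keeps only the first element of the chosen branch, while if/else returns the whole branch. The vectors are arbitrary example values.

```r
# ifelse() returns a result with the length of its test argument (here 1)
x <- ifelse(TRUE, c(5, 7), c(0, 1))

# if/else returns the chosen branch unchanged (here length 2)
y <- if (TRUE) c(5, 7) else c(0, 1)

x
y
```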
dplyr summarise based on order condition with if statement
Here's a dplyr solution:
df %>%
  group_by(id) %>%
  mutate(ymean = mean(y), zmean = mean(z),
         pref = 3 * types %in% preference_3rd +
                2 * types %in% preference_2nd +
                1 * types %in% preference_1st) %>%
  filter(pref == min(pref)) %>%
  summarise(sumtest = sum(x), ymean = first(ymean), zmean = first(zmean))
#> # A tibble: 5 x 4
#> id sumtest ymean zmean
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 60 3.5 3.5
#> 2 2 25 8 8
#> 3 3 40 11.5 11.5
#> 4 4 10 14 14
#> 5 5 10 15 15
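The pref column encodes each row's preference rank (1 = first choice), so filter(pref == min(pref)) keeps only the best-ranked rows in each group. A minimal sketch of just that step, assuming hypothetical preference vectors and toy data:

```r
library(dplyr)

# Hypothetical preference vectors and data
preference_1st <- "a"
preference_2nd <- "b"
preference_3rd <- "c"

df <- tibble(
  id    = c(1, 1, 1, 2, 2),
  types = c("a", "b", "a", "b", "c"),
  x     = c(10, 20, 30, 5, 7)
)

res <- df %>%
  group_by(id) %>%
  # %in% binds tighter than *, so each term is rank * (TRUE/FALSE)
  mutate(pref = 3 * types %in% preference_3rd +
                2 * types %in% preference_2nd +
                1 * types %in% preference_1st) %>%
  # keep the best-ranked rows per id, then sum them
  filter(pref == min(pref)) %>%
  summarise(sumtest = sum(x))
res
```

Here id 1 keeps its two "a" rows and id 2 keeps its "b" row. One caveat of this encoding: a type that is in none of the preference vectors gets pref = 0 and would therefore be selected ahead of every preferred type.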
Write function to perform conditional summarize in R using named list
There are two separate sums here: total2 depends only on ID1, while the condition for total1 can be built dynamically from every ID column in filters.
library(dplyr)
df %>%
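  # total2 uses only the ID1 filter, so compute it before rows are dropped
  mutate(total2 = sum(to_summarize[ID1 == filters[['ID1']]])) %>%
  # keep rows where every ID column equals its entry in filters
  # (if_all() replaces across() inside filter(), deprecated since dplyr 1.0.8)
  filter(if_all(starts_with("ID"), ~ . == filters[[cur_column()]])) %>%
  summarise(total1 = sum(to_summarize), total2 = first(total2))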
-output
# A tibble: 1 x 2
total1 total2
<dbl> <dbl>
1 10 12
If we want to do this without filter, then reduce the across output to a single logical vector and use it to subset to_summarize
library(purrr)
df %>%
  summarise(total1 = sum(to_summarize[across(starts_with('ID'),
                                             ~ . == filters[[cur_column()]]) %>%
                                        reduce(`&`)]),
            total2 = sum(to_summarize[ID1 == filters[['ID1']]]))
-output
# A tibble: 1 x 2
total1 total2
<dbl> <dbl>
1 10 12
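Since the question asks for a function driven by a named list, here is a base-R sketch of the same idea, swapping the dplyr pipeline for Map() plus Reduce(). The helper name conditional_total(), its value_col argument and the toy data are all hypothetical.

```r
# Hypothetical helper: sum value_col over rows matching every name = value pair in filters
conditional_total <- function(data, filters, value_col = "to_summarize") {
  # One logical vector per filter entry, e.g. data$ID1 == "a"
  matches <- Map(function(col, val) data[[col]] == val, names(filters), filters)
  # Combine them with & so a row is kept only if it satisfies all filters
  keep <- Reduce(`&`, matches)
  sum(data[[value_col]][keep])
}

# Hypothetical data with two ID columns
df <- data.frame(
  ID1 = c("a", "a", "a", "b"),
  ID2 = c("x", "x", "y", "x"),
  to_summarize = c(1, 2, 3, 4)
)

total1 <- conditional_total(df, list(ID1 = "a", ID2 = "x"))  # rows 1 and 2
total2 <- conditional_total(df, list(ID1 = "a"))             # rows 1 to 3
c(total1 = total1, total2 = total2)
```

Passing a shorter list reproduces the total2 case, so both totals come from one function instead of two hand-written conditions.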
group by and conditional summarize in R
We can do a double grouping
library(dplyr)
df %>%
  group_by(vote) %>%
  summarise(val = sum(val)) %>%
  group_by(vote = replace(vote, val < 2, 'unpop')) %>%
  summarise(val = sum(val))
-output
# A tibble: 3 x 2
# vote val
# <chr> <dbl>
#1 A 3
#2 B 6
#3 unpop 2
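To make the two passes concrete: the first summarise() collapses each vote to its total, and the second group_by() relabels the low totals before summing again. A minimal sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical votes: C and D each total less than 2
df <- tibble(
  vote = c("A", "A", "B", "B", "B", "C", "D"),
  val  = c(1, 2, 2, 2, 2, 1, 1)
)

res <- df %>%
  group_by(vote) %>%
  summarise(val = sum(val)) %>%                            # A 3, B 6, C 1, D 1
  group_by(vote = replace(vote, val < 2, "unpop")) %>%     # C and D become "unpop"
  summarise(val = sum(val))                                # unpop = 1 + 1
res
```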
Or another option with rowsum, which, like the double grouping, compares each vote's summed val against the threshold
df %>%
  group_by(vote = replace(vote, vote %in%
                            names(which((rowsum(val, vote) < 2)[, 1])), 'unpop')) %>%
  summarise(val = sum(val))
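Or using fct_lump_n from forcats, which keeps the two most common votes by row count and lumps the rest into "unpop" (its w argument, e.g. w = val, ranks votes by summed val instead)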
library(forcats)
df %>%
  group_by(vote = fct_lump_n(vote, 2, other_level = "unpop")) %>%
  summarise(val = sum(val))
# A tibble: 3 x 2
# vote val
# <fct> <dbl>
#1 A 3
#2 B 6
#3 unpop 2
Or using table, which lumps the votes that appear in fewer than two rows
df %>%
  group_by(vote = replace(vote,
                          vote %in% names(which(table(vote) < 2)), 'unpop')) %>%
  summarise(val = sum(val))
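The table() variant counts rows, while the double grouping and rowsum() variants sum val, so they pick the same votes only when the data happen to agree. A base-R check on hypothetical vectors where the two criteria differ:

```r
# Hypothetical data: A has one row but a large val
vote <- c("A", "B", "B", "C")
val  <- c(5, 1, 1, 1)

counts <- table(vote)              # rows per vote: A 1, B 2, C 1
totals <- rowsum(val, vote)[, 1]   # summed val per vote: A 5, B 2, C 1

by_count <- names(which(counts < 2))  # votes with fewer than two rows
by_total <- names(which(totals < 2))  # votes whose val sums to less than 2
by_count
by_total
```

Pick the criterion that matches the question: row frequency (table, fct_lump_n without w) or total val (double grouping, rowsum).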