How to Use Dplyr's Summarize and Which() to Lookup Min/Max Values

which.min and which.max return the index of the first minimum or maximum, so you can use them to look up the first matching Name in each group.

data %>% group_by(Group) %>%
  summarize(minAge = min(Age), minAgeName = Name[which.min(Age)], 
            maxAge = max(Age), maxAgeName = Name[which.max(Age)])

To get all names that tie for the minimum or maximum, use e.g. paste with an appropriate collapse argument.

data %>% group_by(Group) %>%
  summarize(minAge = min(Age), minAgeName = paste(Name[which(Age == min(Age))], collapse = ", "), 
            maxAge = max(Age), maxAgeName = paste(Name[which(Age == max(Age))], collapse = ", "))
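
As a minimal, self-contained sketch of both variants, assuming a small made-up data frame (the rows below are illustrative and not taken from the original question):

```r
library(dplyr)

# Hypothetical data: group "x" has a tie for the maximum age
data <- data.frame(
  Group = c("x", "x", "x", "y", "y"),
  Name  = c("Ann", "Bob", "Cid", "Dee", "Eve"),
  Age   = c(20, 35, 35, 41, 28)
)

# First match only: which.min()/which.max() return a single index per group
first_match <- data %>%
  group_by(Group) %>%
  summarize(minAgeName = Name[which.min(Age)],
            maxAgeName = Name[which.max(Age)])

# All matches: tied names are pasted together into one string per group
all_matches <- data %>%
  group_by(Group) %>%
  summarize(maxAgeName = paste(Name[Age == max(Age)], collapse = ", "))
```

In group "x" the first variant keeps only "Bob", while the second returns "Bob, Cid".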

How to print min and max values based on a condition - dplyr

Maybe this can help:

library(dplyr)
#Code
new <- df %>% group_by(g) %>% 
  # min for group A, max for group B, NA for any other group
  mutate(x = ifelse(g == 'A', min(x, na.rm = TRUE),
                    ifelse(g == 'B', max(x, na.rm = TRUE), NA))) %>%
  summarise(x = unique(x))

Output:

# A tibble: 2 x 2
  g         x
  <chr> <dbl>
1 A         3
2 B         9
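
A shorter variant is possible because g is constant within each group, so the choice between min and max can be made once per group inside summarise. This is only a sketch: the input below is hypothetical, chosen to reproduce the output above, and every group other than A falls through to max.

```r
library(dplyr)

# Hypothetical input chosen to match the output shown above
df <- data.frame(g = c("A", "A", "B", "B"), x = c(3, 5, 7, 9))

new <- df %>%
  group_by(g) %>%
  # g[1] identifies the current group; pick min for A, max otherwise
  summarise(x = if (g[1] == "A") min(x, na.rm = TRUE) else max(x, na.rm = TRUE))
```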

Dplyr group_by summarize keep min/max value for each column within group, depending on column suffix

One option is to group by 'grp', replace the columns whose names end with 'high' by their group maximum, add those columns as grouping columns as well, and then take the min of the columns whose names end with 'low':

library(dplyr)    
dat %>%
   group_by(grp) %>%  
   mutate_at(vars(ends_with('high')), max) %>% 
   group_by_at(vars(ends_with('high')), .add = TRUE) %>% 
   summarise_at(vars(ends_with('low')), min)
# A tibble: 2 x 4
# Groups:   grp, v1_high [2]
#  grp   v1_high v2_high v3_low
#  <fct>   <dbl>   <dbl>  <dbl>
#1 A       0.184   0.330 -0.305
#2 B       1.60    0.738  0.390

It also works if there are no 'low' columns:

dat[-4] %>%
    group_by(grp) %>%  
    mutate_at(vars(ends_with('high')), max) %>% 
    group_by_at(vars(ends_with('high')), .add = TRUE) %>%   
    summarise_at(vars(ends_with('low')), min)
# A tibble: 2 x 3
# Groups:   grp, v1_high [2]
#  grp   v1_high v2_high
#  <fct>   <dbl>   <dbl>
#1 A       0.184   0.330
#2 B       1.60    0.738

Or another option is map2

library(purrr)
map2(list(min, max), list('low', 'high'), ~ 
      dat %>% 
         select(grp, ends_with(.y)) %>%
         group_by(grp) %>%
         summarise_all(.x)) %>% 
         reduce(inner_join, by = 'grp')
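
With dplyr 1.0 or later, the same result can probably be reached more directly with across(), which supersedes the _at variants. The sketch below assumes a made-up dat that follows the same column-suffix pattern; the values are not those of the original question.

```r
library(dplyr)

# Hypothetical data with the same naming pattern as in the question
dat <- data.frame(
  grp     = c("A", "A", "B", "B"),
  v1_high = c(1, 4, 2, 3),
  v2_high = c(5, 6, 8, 7),
  v3_low  = c(9, 2, 4, 6)
)

res <- dat %>%
  group_by(grp) %>%
  # max for every *high column, min for every *low column, in one call
  summarise(across(ends_with("high"), max),
            across(ends_with("low"), min))
```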

trying to group_by and then summarize max and min - running into error for unambiguous format

Put both summaries in the same summarise call. Also, when your dates are not in ISO 8601 format (YYYY-MM-DD), specify their layout in the format argument of as.Date.

dat %>% 
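  # pass format through a lambda; extra arguments via across(...) are deprecated in dplyr >= 1.1.0
  mutate(across(ends_with("Date"), ~ as.Date(.x, format = "%m/%d/%Y"))) %>% 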
  group_by(id, street) %>% 
  summarise(firstReportedDate = min(firstReportedDate),
            lastReportedDate = max(lastReportedDate))

output

# A tibble: 10 × 4
# Groups:   id [4]
   id    street                 firstReportedDate lastReportedDate
   <chr> <chr>                  <date>            <date>          
 1 1000  19703 Highway 59 N     2009-01-01        2011-01-01      
 2 1000  6714 Dorylee Ln        2004-03-05        2004-03-05      
 3 1000  Po Box 203             2017-09-22        2022-04-01      
 4 1431  3511 Forest Row Dr     2009-09-30        2022-04-01      
 5 1431  Acorn Ln               2009-01-01        2013-01-01      
 6 357   1040 Marina Dr         2015-01-01        2021-01-01      
 7 357   2200 Lake Village Dr   2017-09-30        2022-04-01      
 8 359   1060 Marina Dr         2009-09-30        2009-09-30      
 9 359   22302 Rustic Bridge Ln 2017-06-15        2022-04-01      
10 359   3211 Laurel Point Ct   2002-10-12        2004-03-03
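
To see why the conversion matters, here is a small sketch with made-up rows (the addresses are borrowed from the output above, the dates are hypothetical). Compared as text, "01/15/2019" sorts before "09/22/2017"; after parsing with an explicit format, min and max follow calendar order.

```r
library(dplyr)

# Hypothetical rows; dates are stored as month/day/year text
dat <- data.frame(
  id = c("1000", "1000", "357"),
  street = c("Po Box 203", "Po Box 203", "1040 Marina Dr"),
  firstReportedDate = c("09/22/2017", "01/15/2019", "01/01/2015"),
  lastReportedDate  = c("04/01/2022", "12/31/2020", "01/01/2021")
)

# Text comparison picks the wrong "earliest" date
text_min <- min(dat$firstReportedDate[1:2])

res <- dat %>%
  mutate(across(ends_with("Date"), ~ as.Date(.x, format = "%m/%d/%Y"))) %>%
  group_by(id, street) %>%
  summarise(firstReportedDate = min(firstReportedDate),
            lastReportedDate  = max(lastReportedDate),
            .groups = "drop")
```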

How to calculate mean, min, and max across when grouping using dplyr?

You can try something like this:

library(dplyr)

df %>% 
  group_by(ID) %>%
  summarise(mean_ = mean(c_across(A:C), na.rm = TRUE),
            medi_ = median(c_across(A:C), na.rm = TRUE),
            max_  = max(c_across(A:C), na.rm = TRUE),
            min_  = min(c_across(A:C), na.rm = TRUE))

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 5
     ID mean_ medi_  max_  min_
  <int> <dbl> <dbl> <int> <int>
1     1  3      3       6     0
2     2  3.5    3       9     0
3     3  2.33   2.5     5     0

For the second part (statistics per row rather than per ID), use rowwise() instead of group_by():

df %>% 
  rowwise() %>%
  summarise(mean_ = mean(c_across(A:C), na.rm = TRUE),
            medi_ = median(c_across(A:C), na.rm = TRUE),
            max_  = max(c_across(A:C), na.rm = TRUE),
            min_  = min(c_across(A:C), na.rm = TRUE))

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 6 x 4
  mean_ medi_  max_  min_
  <dbl> <int> <int> <int>
1  2        1     5     0
2  2        3     3     0
3  1        1     2     0
4  5        5     9     1
5  3.67     3     5     3
6  4        4     6     2

With data:

df <- structure(list(ID = c(1L, 2L, 3L, 2L, 3L, 1L), A = c(1L, 3L, 
0L, 5L, 3L, 2L), B = c(5L, 0L, 2L, 9L, 5L, 6L), C = c(0L, 3L, 
1L, 1L, 3L, 4L)), class = "data.frame", row.names = c(NA, -6L
))
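
For the row-wise case, a vectorized alternative that avoids rowwise() may be faster on large data. This is a sketch rather than part of the original answer: it uses base R's rowMeans, pmax and pmin inside mutate, on the same data as above, and it leaves out the median, which has no equally simple vectorized counterpart.

```r
library(dplyr)

df <- data.frame(ID = c(1L, 2L, 3L, 2L, 3L, 1L),
                 A  = c(1L, 3L, 0L, 5L, 3L, 2L),
                 B  = c(5L, 0L, 2L, 9L, 5L, 6L),
                 C  = c(0L, 3L, 1L, 1L, 3L, 4L))

row_stats <- df %>%
  mutate(mean_ = rowMeans(cbind(A, B, C), na.rm = TRUE),  # mean of A, B, C per row
         max_  = pmax(A, B, C, na.rm = TRUE),             # element-wise maximum
         min_  = pmin(A, B, C, na.rm = TRUE))             # element-wise minimum
```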

R find min and max for each group based on other row

With the tidyverse, you can try the following approach. First, pivot your data into long form, targeting the year columns. Then group by both group and name (which contains the year), keep only the subgroups that contain a value of "x", and keep the rows where condition is 1. Finally, group by group alone and summarise to get the min and max years. Note that value is a character column at this point, so you may wish to convert the years to numeric once the "x" entries have been removed by filtering on condition.

library(tidyverse)

df1 %>%
  pivot_longer(cols = -c(group, condition)) %>%
  group_by(group, name) %>%
  filter(any(value == "x"), condition == 1) %>%
  group_by(group) %>%
  summarise(min = min(value),
            max = max(value))

Output

# A tibble: 3 x 3
  group min   max  
  <chr> <chr> <chr>
1 a     2010  2013 
2 b     2011  2015 
3 c     2010  2014
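
The note about converting to numeric matters because min and max on character vectors compare text, not numbers. For four-digit years the two orderings agree, but they diverge as soon as the strings differ in length. A small sketch with a hypothetical value vector (not the original data):

```r
# Hypothetical values as they might look after pivoting: years stored as text
value <- c("2010", "2013", "x", "985")

# Drop the "x" marker first, then compare
years_chr <- value[value != "x"]
min_as_text <- min(years_chr)            # text comparison: "2010" sorts before "985"

years_num <- as.numeric(years_chr)
min_as_number <- min(years_num)          # numeric comparison: 985
```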

How to use R dplyr's summarize to count the number of rows that match a criteria?

You can use sum on logical vectors - it will automatically convert them into numeric values (TRUE being equal to 1 and FALSE being equal to 0), so you need only do:

test %>%
  group_by(location) %>%
  summarize(total_score = sum(score),
            n_outliers  = sum(more_than_300))
#> # A tibble: 2 x 3
#>   location total_score n_outliers
#>   <chr>          <dbl>      <int>
#> 1 away             927          2
#> 2 home             552          0

Or, if these are your only 3 columns, an equivalent would be:

test %>%
  group_by(location) %>%
  summarize(across(everything(), sum))

In fact, you don't need to make the more_than_300 column - it would suffice to do:

test %>%
  group_by(location) %>%
  summarize(total_score = sum(score),
            n_outliers  = sum(score > 300))
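
A runnable sketch of the last variant, assuming made-up scores chosen to reproduce the totals shown above. Since mean of a logical vector gives the proportion of TRUE values, the same trick also yields the share of matching rows (the share_outliers column is an addition for illustration):

```r
library(dplyr)

# Hypothetical scores chosen to reproduce the totals shown above
test <- data.frame(
  location = c("away", "away", "away", "home", "home"),
  score    = c(301, 400, 226, 252, 300)
)

res <- test %>%
  group_by(location) %>%
  summarize(total_score    = sum(score),
            n_outliers     = sum(score > 300),   # TRUE counts as 1
            share_outliers = mean(score > 300))  # proportion of TRUE
```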

find min and max values and create columns for these for each unique identifier (grouping element) in R

First, enter missing values as NA, not "NA"; otherwise R reads Dia as a character vector and min() no longer operates on numbers. This code produces the desired output:

MC <- c(rep("OS000348",8), rep("OS000361",13), rep("OS000375",5))
ASN <- c(rep(2,8), rep(3,5), rep(2,8), rep(3,5))
Dia <- c(870,NA, 867.3, NA, NA, 890.3,NA,NA,871.2,NA,868.7,NA,866.2, NA,
         NA,851,NA,NA,842,NA,NA,880,860,851.8,NA,841)

df <- data.frame(MC,ASN,Dia)

library(dplyr)

df <- df %>%
  group_by(MC) %>%
  mutate(minDia=min(Dia, na.rm=T), maxDia=max(Dia, na.rm=T))

Use this instead if you only want to keep one row per MC:

df2 <- df %>%
  group_by(MC) %>%
  mutate(minDia=min(Dia, na.rm=T), maxDia=max(Dia, na.rm=T)) %>%
  ungroup() %>%
  distinct(MC, minDia, maxDia)
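
If only one row per MC is needed, summarise gives the same three columns without the mutate/distinct round trip. A sketch using the MC and Dia vectors from above (ASN is left out because it is not used here); note that a group consisting only of NA would return Inf and -Inf with a warning:

```r
library(dplyr)

MC  <- c(rep("OS000348", 8), rep("OS000361", 13), rep("OS000375", 5))
Dia <- c(870, NA, 867.3, NA, NA, 890.3, NA, NA, 871.2, NA, 868.7, NA, 866.2, NA,
         NA, 851, NA, NA, 842, NA, NA, 880, 860, 851.8, NA, 841)
df <- data.frame(MC, Dia)

# One row per MC with its smallest and largest non-missing Dia
df2 <- df %>%
  group_by(MC) %>%
  summarise(minDia = min(Dia, na.rm = TRUE),
            maxDia = max(Dia, na.rm = TRUE))
```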

How to find max and min values regardless of being positive or negative using R

You can use slice_max and slice_min to get the rows with the highest or lowest values (n = 1 by default) in each group. Since you are comparing distances regardless of sign, order by abs(distance) to use the absolute value.

dat %>% 
  group_by(Genes, intA) %>%
  slice_max(abs(distance))

#  Genes intA  Chr_intA Chr_intB direction_1 direction_2 distance
#  <chr> <chr> <chr>    <chr>    <chr>       <chr>          <int>
#1 GeneA P53   chr19    chr8     -           -           -3467567
#2 GeneB P53   chr19    chr8     -           -              -2884
  
dat %>% 
  group_by(Genes, intA) %>%
  slice_min(abs(distance))

#  Genes intA  Chr_intA Chr_intB direction_1 direction_2 distance
#  <chr> <chr> <chr>    <chr>    <chr>       <chr>          <int>
#1 GeneA P53   chr19    chr8     -           -               -423
#2 GeneB P53   chr19    chr8     -           -                -40
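
A self-contained sketch with a hypothetical, reduced version of dat (only the columns needed here, with made-up rows). By default slice_max and slice_min keep all tied rows; with_ties = FALSE is set below on the assumption that exactly one row per group is wanted.

```r
library(dplyr)

# Hypothetical subset of columns and rows
dat <- data.frame(
  Genes    = c("GeneA", "GeneA", "GeneA", "GeneB", "GeneB"),
  intA     = "P53",
  distance = c(-3467567L, -423L, 1000L, -2884L, -40L)
)

# Row with the largest absolute distance per group
farthest <- dat %>%
  group_by(Genes, intA) %>%
  slice_max(abs(distance), n = 1, with_ties = FALSE) %>%
  ungroup()

# Row with the smallest absolute distance per group
closest <- dat %>%
  group_by(Genes, intA) %>%
  slice_min(abs(distance), n = 1, with_ties = FALSE) %>%
  ungroup()
```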
