How to Use dplyr's summarize() and which() to Look Up Min/Max Values

How to use Dplyr's Summarize and which() to lookup min/max values

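You can use which.min() and which.max() to get the name that belongs to the minimum or maximum value. Both return only the index of the first match, so ties are not reported.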

library(dplyr)
data %>%
  group_by(Group) %>%
  summarize(minAge = min(Age), minAgeName = Name[which.min(Age)],
            maxAge = max(Age), maxAgeName = Name[which.max(Age)])

To get all names when several rows tie for the minimum or maximum, use which(Age == min(Age)) and combine the matches with paste() and an appropriate collapse argument.

data %>%
  group_by(Group) %>%
  summarize(minAge = min(Age),
            minAgeName = paste(Name[which(Age == min(Age))], collapse = ", "),
            maxAge = max(Age),
            maxAgeName = paste(Name[which(Age == max(Age))], collapse = ", "))
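To see how the two approaches differ on tied values, here is a minimal sketch on a small, made-up data frame (the Group, Name and Age values are hypothetical, not from the original question):

```r
library(dplyr)

# Hypothetical example data: group "a" has two people tied at the minimum age
data <- data.frame(
  Group = c("a", "a", "a", "b", "b"),
  Name  = c("Ann", "Bob", "Cid", "Dee", "Eve"),
  Age   = c(30, 25, 25, 40, 35),
  stringsAsFactors = FALSE
)

res <- data %>%
  group_by(Group) %>%
  summarize(
    minAge = min(Age),
    # which.min() returns the index of the first minimum only
    minAgeFirst = Name[which.min(Age)],
    # which(Age == min(Age)) returns every tied index
    minAgeAll = paste(Name[which(Age == min(Age))], collapse = ", ")
  )

print(res)
```

For group a, minAgeFirst holds only "Bob", while minAgeAll holds "Bob, Cid".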

How to print min and max values based on a condition - dplyr

Maybe this can help:

library(dplyr)
# Take the minimum of x for group A and the maximum for group B
new <- df %>%
  group_by(g) %>%
  mutate(x = ifelse(g == 'A', min(x, na.rm = TRUE),
                    ifelse(g == 'B', max(x, na.rm = TRUE), NA))) %>%
  summarise(x = unique(x))

Output:

# A tibble: 2 x 2
g x
<chr> <dbl>
1 A 3
2 B 9
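A variant that avoids the nested ifelse() is to choose the summary function per group directly inside summarise(). This is a minimal sketch on hypothetical data (the question's df is not shown), assuming the rule is "min for group A, max for every other group":

```r
library(dplyr)

# Hypothetical data standing in for the question's df
df <- data.frame(g = c("A", "A", "B", "B"), x = c(3, 5, 9, NA))

new <- df %>%
  group_by(g) %>%
  # g is constant within a group, so first(g) identifies the current group
  summarise(x = if (first(g) == "A") min(x, na.rm = TRUE) else max(x, na.rm = TRUE))

print(new)
```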

Dplyr group_by summarize keep min/max value for each column within group, depending on column suffix

An option is to group by 'grp', take the max of the columns whose names end with 'high', add those columns as extra grouping variables, and then take the min of the columns whose names end with 'low':

library(dplyr)
dat %>%
  group_by(grp) %>%
  mutate_at(vars(ends_with('high')), max) %>%
  group_by_at(vars(ends_with('high')), .add = TRUE) %>%
  summarise_at(vars(ends_with('low')), min)
# A tibble: 2 x 4
# Groups: grp, v1_high [2]
# grp v1_high v2_high v3_low
# <fct> <dbl> <dbl> <dbl>
#1 A 0.184 0.330 -0.305
#2 B 1.60 0.738 0.390

It also works if there are no 'low' columns:

dat[-4] %>%
  group_by(grp) %>%
  mutate_at(vars(ends_with('high')), max) %>%
  group_by_at(vars(ends_with('high')), .add = TRUE) %>%
  summarise_at(vars(ends_with('low')), min)
# A tibble: 2 x 3
# Groups: grp, v1_high [2]
# grp v1_high v2_high
# <fct> <dbl> <dbl>
#1 A 0.184 0.330
#2 B 1.60 0.738

Another option is purrr::map2():

library(purrr)
map2(list(min, max), list('low', 'high'), ~
       dat %>%
         select(grp, ends_with(.y)) %>%
         group_by(grp) %>%
         summarise_all(.x)) %>%
  reduce(inner_join, by = 'grp')
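In dplyr 1.0.0 and later, the _at and _all verbs are superseded by across(), and the whole task fits in a single summarise() call with one across() per suffix. A minimal sketch, using a small hypothetical dat with the same column naming scheme (the values are invented):

```r
library(dplyr)

# Hypothetical data: columns ending in "high" get max(), columns ending in "low" get min()
dat <- data.frame(
  grp = factor(c("A", "A", "B", "B")),
  v1_high = c(0.1, 0.2, 1.0, 1.5),
  v2_high = c(0.3, 0.1, 0.7, 0.2),
  v3_low = c(-0.3, 0.4, 0.5, 0.4)
)

res <- dat %>%
  group_by(grp) %>%
  summarise(
    across(ends_with("high"), max),
    across(ends_with("low"), min)
  )

print(res)
```

Unlike the grouped-by-max approach, this returns an ungrouped result with one row per grp.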

trying to group_by and then summarize max and min - running into error for unambiguous format

Do everything in a single summarise() call. Also, when your dates are not in the ISO 8601 format (YYYY-MM-DD), specify their format in the format argument of as.Date().

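library(dplyr)
dat %>%
  # pass format through a lambda; extra arguments via across()'s ... are deprecated since dplyr 1.1.0
  mutate(across(ends_with("Date"), ~ as.Date(.x, format = "%m/%d/%Y"))) %>%
  group_by(id, street) %>%
  summarise(firstReportedDate = min(firstReportedDate),
            lastReportedDate = max(lastReportedDate))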

output

# A tibble: 10 × 4
# Groups: id [4]
id street firstReportedDate lastReportedDate
<chr> <chr> <date> <date>
1 1000 19703 Highway 59 N 2009-01-01 2011-01-01
2 1000 6714 Dorylee Ln 2004-03-05 2004-03-05
3 1000 Po Box 203 2017-09-22 2022-04-01
4 1431 3511 Forest Row Dr 2009-09-30 2022-04-01
5 1431 Acorn Ln 2009-01-01 2013-01-01
6 357 1040 Marina Dr 2015-01-01 2021-01-01
7 357 2200 Lake Village Dr 2017-09-30 2022-04-01
8 359 1060 Marina Dr 2009-09-30 2009-09-30
9 359 22302 Rustic Bridge Ln 2017-06-15 2022-04-01
10 359 3211 Laurel Point Ct 2002-10-12 2004-03-03
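A self-contained sketch of the same pipeline on a few hypothetical rows (ids, streets and dates are invented; only the month/day/year layout matches the question). Setting .groups = "drop" is an optional choice that returns an ungrouped tibble instead of one grouped by id:

```r
library(dplyr)

# Hypothetical data with dates stored as "month/day/year" strings
dat <- data.frame(
  id = c("1000", "1000", "357"),
  street = c("Po Box 203", "Po Box 203", "1040 Marina Dr"),
  firstReportedDate = c("09/22/2017", "01/15/2018", "01/01/2015"),
  lastReportedDate = c("12/31/2019", "04/01/2022", "01/01/2021")
)

res <- dat %>%
  # by default as.Date() only tries "%Y-%m-%d" and "%Y/%m/%d", so spell out the format
  mutate(across(ends_with("Date"), ~ as.Date(.x, format = "%m/%d/%Y"))) %>%
  group_by(id, street) %>%
  # one summarise() call computes both the earliest and the latest date per group
  summarise(
    firstReportedDate = min(firstReportedDate),
    lastReportedDate = max(lastReportedDate),
    .groups = "drop"
  )

print(res)
```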

How to calculate mean, min, and max across columns when grouping using dplyr?

You can try something like this:

library(dplyr)
df %>%
  group_by(ID) %>%
  summarise(mean_ = mean(c_across(A:C), na.rm = TRUE),
            medi_ = median(c_across(A:C), na.rm = TRUE),
            max_ = max(c_across(A:C), na.rm = TRUE),
            min_ = min(c_across(A:C), na.rm = TRUE))

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 5
ID mean_ medi_ max_ min_
<int> <dbl> <dbl> <int> <int>
1 1 3 3 6 0
2 2 3.5 3 9 0
3 3 2.33 2.5 5 0

For the second part (the same statistics computed for each row), use rowwise():

df %>%
  rowwise() %>%
  summarise(mean_ = mean(c_across(A:C), na.rm = TRUE),
            medi_ = median(c_across(A:C), na.rm = TRUE),
            max_ = max(c_across(A:C), na.rm = TRUE),
            min_ = min(c_across(A:C), na.rm = TRUE))

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 6 x 4
mean_ medi_ max_ min_
<dbl> <int> <int> <int>
1 2 1 5 0
2 2 3 3 0
3 1 1 2 0
4 5 5 9 1
5 3.67 3 5 3
6 4 4 6 2

With data:

df <- structure(list(ID = c(1L, 2L, 3L, 2L, 3L, 1L), A = c(1L, 3L, 
0L, 5L, 3L, 2L), B = c(5L, 0L, 2L, 9L, 5L, 6L), C = c(0L, 3L,
1L, 1L, 3L, 4L)), class = "data.frame", row.names = c(NA, -6L
))
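The rowwise() summarise() above drops ID and the original columns. A minimal sketch that keeps them by using mutate() instead, reusing the df from this answer; the vectorised rowMeans()/pmax() version is an alternative suggestion (not part of the original answer) that avoids rowwise() entirely:

```r
library(dplyr)

df <- structure(list(ID = c(1L, 2L, 3L, 2L, 3L, 1L), A = c(1L, 3L,
0L, 5L, 3L, 2L), B = c(5L, 0L, 2L, 9L, 5L, 6L), C = c(0L, 3L,
1L, 1L, 3L, 4L)), class = "data.frame", row.names = c(NA, -6L))

# Row-wise statistics with rowwise(): mutate() keeps ID, A, B and C
by_row <- df %>%
  rowwise() %>%
  mutate(mean_ = mean(c_across(A:C)), max_ = max(c_across(A:C))) %>%
  ungroup()

# Vectorised alternative: rowMeans() and pmax() handle all rows at once
by_row_vec <- df %>%
  mutate(mean_ = rowMeans(across(A:C)), max_ = pmax(A, B, C))

print(by_row)
```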

R find min and max for each group based on other row

With the tidyverse you can try the following approach. First, put your data into long form, targeting your year columns. Then group_by both group and name (which contains the year), keep only the subgroups that contain a value of "x", and keep the rows with a condition of 1. Finally, group_by just group and summarise to get the min and max years. Note that value is still a character column at this point, so min() and max() compare strings; that gives the right answer for four-digit years, but you may wish to convert your year data to numeric after removing the "x" values by filtering on condition.

library(tidyverse)

df1 %>%
  pivot_longer(cols = -c(group, condition)) %>%
  group_by(group, name) %>%
  filter(any(value == "x"), condition == 1) %>%
  group_by(group) %>%
  summarise(min = min(value),
            max = max(value))

Output

# A tibble: 3 x 3
group min max
<chr> <chr> <chr>
1 a 2010 2013
2 b 2011 2015
3 c 2010 2014
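To illustrate why converting value to numeric can matter, here is a minimal sketch with a hypothetical year column stored as character. Once the years differ in width, the string comparison and the numeric comparison disagree:

```r
library(dplyr)

# Hypothetical long-format data: value holds years stored as character
long <- data.frame(group = c("a", "a", "a"), value = c("2013", "2010", "999"))

# Character comparison is lexicographic: "999" sorts after "2013"
chr_res <- long %>%
  group_by(group) %>%
  summarise(min = min(value), max = max(value))

# Converting first gives the numeric minimum and maximum
num_res <- long %>%
  group_by(group) %>%
  summarise(min = min(as.numeric(value)), max = max(as.numeric(value)))
```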

How to use R dplyr's summarize to count the number of rows that match a criteria?

You can use sum() on logical vectors: it automatically converts them to numbers (TRUE counts as 1 and FALSE as 0), so you only need to do:

test %>%
  group_by(location) %>%
  summarize(total_score = sum(score),
            n_outliers = sum(more_than_300))
#> # A tibble: 2 x 3
#> location total_score n_outliers
#> <chr> <dbl> <int>
#> 1 away 927 2
#> 2 home 552 0

Or, if these are your only three columns, an equivalent (which keeps the original column names, score and more_than_300) would be:

test %>%
group_by(location) %>%
summarize(across(everything(), sum))

In fact, you don't need to make the more_than_300 column - it would suffice to do:

test %>%
  group_by(location) %>%
  summarize(total_score = sum(score),
            n_outliers = sum(score > 300))
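A self-contained sketch of this last variant on hypothetical test data (the scores are invented). It also applies mean() to the same logical vector, which gives the share of matching rows instead of the count:

```r
library(dplyr)

# Hypothetical data: two locations with a few scores each
test <- data.frame(
  location = c("home", "home", "away", "away", "away"),
  score = c(250, 120, 310, 90, 400)
)

res <- test %>%
  group_by(location) %>%
  summarize(
    total_score = sum(score),
    # sum() of a logical vector counts the TRUE values
    n_outliers = sum(score > 300),
    # mean() of a logical vector is the proportion of TRUE values
    share_outliers = mean(score > 300)
  )

print(res)
```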

find min and max values and create columns for these for each unique identifier (grouping element) in R

Firstly, you need to enter missing values as NA, not "NA"; otherwise R reads the whole column as a character vector, and min() then compares strings instead of numbers. This code produces the desired output:

MC <- c(rep("OS000348",8), rep("OS000361",13), rep("OS000375",5))
ASN <- c(rep(2,8), rep(3,5), rep(2,8), rep(3,5))
Dia <- c(870,NA, 867.3, NA, NA, 890.3,NA,NA,871.2,NA,868.7,NA,866.2, NA,
NA,851,NA,NA,842,NA,NA,880,860,851.8,NA,841)

df <- data.frame(MC,ASN,Dia)

library(dplyr)

df <- df %>%
  group_by(MC) %>%
  mutate(minDia = min(Dia, na.rm = TRUE), maxDia = max(Dia, na.rm = TRUE))

Use this instead if you only want to keep one row per MC:

df2 <- df %>%
  group_by(MC) %>%
  mutate(minDia = min(Dia, na.rm = TRUE), maxDia = max(Dia, na.rm = TRUE)) %>%
  ungroup() %>%
  distinct(MC, minDia, maxDia)
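Because summarise() returns one row per group, it can replace the mutate() + distinct() combination. A minimal sketch using the MC, ASN and Dia vectors from this answer; note that if every Dia value in a group were NA, min(na.rm = TRUE) would return Inf with a warning:

```r
library(dplyr)

MC <- c(rep("OS000348", 8), rep("OS000361", 13), rep("OS000375", 5))
ASN <- c(rep(2, 8), rep(3, 5), rep(2, 8), rep(3, 5))
Dia <- c(870, NA, 867.3, NA, NA, 890.3, NA, NA, 871.2, NA, 868.7, NA, 866.2, NA,
         NA, 851, NA, NA, 842, NA, NA, 880, 860, 851.8, NA, 841)

df <- data.frame(MC, ASN, Dia)

# One row per MC with the smallest and largest non-missing Dia
df2 <- df %>%
  group_by(MC) %>%
  summarise(minDia = min(Dia, na.rm = TRUE), maxDia = max(Dia, na.rm = TRUE))

print(df2)
```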

How to find max and min values regardless of being positive or negative using R

You can use slice_min() and slice_max() to get the rows with the lowest or highest values per group (n = 1 by default). Since you want the extremes regardless of sign, order by abs(distance), the absolute value of the distance.

dat %>%
  group_by(Genes, intA) %>%
  slice_max(abs(distance))

# Genes intA Chr_intA Chr_intB direction_1 direction_2 distance
# <chr> <chr> <chr> <chr> <chr> <chr> <int>
#1 GeneA P53 chr19 chr8 - - -3467567
#2 GeneB P53 chr19 chr8 - - -2884

dat %>%
  group_by(Genes, intA) %>%
  slice_min(abs(distance))

# Genes intA Chr_intA Chr_intB direction_1 direction_2 distance
# <chr> <chr> <chr> <chr> <chr> <chr> <int>
#1 GeneA P53 chr19 chr8 - - -423
#2 GeneB P53 chr19 chr8 - - -40
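One detail worth knowing: slice_min() and slice_max() keep every tied row by default, while with_ties = FALSE returns exactly n rows per group. A minimal sketch on hypothetical data with only the Genes and distance columns (some distances are borrowed from the output above, the rest are invented, including a deliberate tie for GeneB):

```r
library(dplyr)

# Hypothetical data: signed distances per gene; GeneB has a tie at |40|
dat <- data.frame(
  Genes = c("GeneA", "GeneA", "GeneA", "GeneB", "GeneB", "GeneB"),
  distance = c(-3467567L, 1200L, -423L, -40L, 40L, -2884L)
)

# Largest magnitude per gene, regardless of sign, exactly one row each
farthest <- dat %>%
  group_by(Genes) %>%
  slice_max(abs(distance), n = 1, with_ties = FALSE) %>%
  ungroup()

# Smallest magnitude per gene; the default with_ties = TRUE keeps both -40 and 40
closest_all <- dat %>%
  group_by(Genes) %>%
  slice_min(abs(distance)) %>%
  ungroup()
```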

