How to use Dplyr's Summarize and which() to lookup min/max values
You can use which.min and which.max, which return the index of the first minimum or maximum, to get the first matching value:
data %>% group_by(Group) %>%
summarize(minAge = min(Age), minAgeName = Name[which.min(Age)],
maxAge = max(Age), maxAgeName = Name[which.max(Age)])
To get all values when several rows tie, use e.g. paste with an appropriate collapse argument:
data %>% group_by(Group) %>%
summarize(minAge = min(Age), minAgeName = paste(Name[which(Age == min(Age))], collapse = ", "),
maxAge = max(Age), maxAgeName = paste(Name[which(Age == max(Age))], collapse = ", "))
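The difference between the two approaches only shows up when a group has ties. The following is a minimal sketch on made-up data (the names and ages are hypothetical, not from the original question), where group "a" has two people sharing the minimum age:

```r
library(dplyr)

# Hypothetical data, invented for illustration: Ann and Bob tie for the minimum in group "a"
data <- data.frame(
  Group = c("a", "a", "a", "b", "b"),
  Name  = c("Ann", "Bob", "Cy", "Dee", "Eve"),
  Age   = c(20, 20, 35, 41, 28),
  stringsAsFactors = FALSE
)

# which.min keeps only the first row that reaches the minimum
first_only <- data %>%
  group_by(Group) %>%
  summarize(minAgeName = Name[which.min(Age)])

# Comparing against min(Age) keeps every tied row, collapsed into one string
all_ties <- data %>%
  group_by(Group) %>%
  summarize(minAgeName = paste(Name[Age == min(Age)], collapse = ", "))
```

Here first_only reports "Ann" for group "a", while all_ties reports "Ann, Bob".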
How to print a min and max values based on a condition - dplyR
Maybe this can help:
library(dplyr)
#Code
new <- df %>% group_by(g) %>%
  mutate(x = ifelse(g == 'A', min(x, na.rm = TRUE),
             ifelse(g == 'B', max(x, na.rm = TRUE), NA_real_))) %>%  # NA for any group other than A or B
  summarise(x = unique(x))
Output:
# A tibble: 2 x 2
g x
<chr> <dbl>
1 A 3
2 B 9
Dplyr group_by summarize keep min/max value for each column within group, depending on column suffix
An option is to group by 'grp', take the max of the columns whose names end with 'high', add those columns to the grouping, and then take the min of the columns that end with 'low':
library(dplyr)
dat %>%
group_by(grp) %>%
mutate_at(vars(ends_with('high')), max) %>%
group_by_at(vars(ends_with('high')), .add = TRUE) %>%
summarise_at(vars(ends_with('low')), min)
# A tibble: 2 x 4
# Groups: grp, v1_high [2]
# grp v1_high v2_high v3_low
# <fct> <dbl> <dbl> <dbl>
#1 A 0.184 0.330 -0.305
#2 B 1.60 0.738 0.390
It also works if there are no 'low' columns (here the fourth column is dropped):
dat[-4] %>%
group_by(grp) %>%
mutate_at(vars(ends_with('high')), max) %>%
group_by_at(vars(ends_with('high')), .add = TRUE) %>%
summarise_at(vars(ends_with('low')), min)
# A tibble: 2 x 3
# Groups: grp, v1_high [2]
# grp v1_high v2_high
# <fct> <dbl> <dbl>
#1 A 0.184 0.330
#2 B 1.60 0.738
Or another option is map2
library(purrr)
map2(list(min, max), list('low', 'high'), ~
dat %>%
select(grp, ends_with(.y)) %>%
group_by(grp) %>%
summarise_all(.x)) %>%
reduce(inner_join, by = 'grp')
trying to group_by and then summarize max and min - running into error for unambiguous format
Put everything in the same summarise call. Also, specify the format of your dates in the format argument of as.Date when your data is not in ISO 8601 form (YYYY-MM-DD), because as.Date cannot parse other layouts unambiguously:
dat %>%
mutate(across(ends_with("Date"), ~ as.Date(.x, format = "%m/%d/%Y"))) %>%
group_by(id, street) %>%
summarise(firstReportedDate = min(firstReportedDate),
lastReportedDate = max(lastReportedDate))
output
# A tibble: 10 × 4
# Groups: id [4]
id street firstReportedDate lastReportedDate
<chr> <chr> <date> <date>
1 1000 19703 Highway 59 N 2009-01-01 2011-01-01
2 1000 6714 Dorylee Ln 2004-03-05 2004-03-05
3 1000 Po Box 203 2017-09-22 2022-04-01
4 1431 3511 Forest Row Dr 2009-09-30 2022-04-01
5 1431 Acorn Ln 2009-01-01 2013-01-01
6 357 1040 Marina Dr 2015-01-01 2021-01-01
7 357 2200 Lake Village Dr 2017-09-30 2022-04-01
8 359 1060 Marina Dr 2009-09-30 2009-09-30
9 359 22302 Rustic Bridge Ln 2017-06-15 2022-04-01
10 359 3211 Laurel Point Ct 2002-10-12 2004-03-03
How to calculate mean , min, and max across when grouping using dplyr?
You can try something like this:
library(dplyr)
df %>%
group_by(ID) %>%
summarise(mean_ = mean(c_across(A:C), na.rm = TRUE),
          medi_ = median(c_across(A:C), na.rm = TRUE),
          max_ = max(c_across(A:C), na.rm = TRUE),
          min_ = min(c_across(A:C), na.rm = TRUE))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 5
ID mean_ medi_ max_ min_
<int> <dbl> <dbl> <int> <int>
1 1 3 3 6 0
2 2 3.5 3 9 0
3 3 2.33 2.5 5 0
For the second part:
df %>%
rowwise() %>%
summarise(mean_ = mean(c_across(A:C), na.rm = TRUE),
          medi_ = median(c_across(A:C), na.rm = TRUE),
          max_ = max(c_across(A:C), na.rm = TRUE),
          min_ = min(c_across(A:C), na.rm = TRUE))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 6 x 4
mean_ medi_ max_ min_
<dbl> <int> <int> <int>
1 2 1 5 0
2 2 3 3 0
3 1 1 2 0
4 5 5 9 1
5 3.67 3 5 3
6 4 4 6 2
With data:
df <- structure(list(ID = c(1L, 2L, 3L, 2L, 3L, 1L), A = c(1L, 3L,
0L, 5L, 3L, 2L), B = c(5L, 0L, 2L, 9L, 5L, 6L), C = c(0L, 3L,
1L, 1L, 3L, 4L)), class = "data.frame", row.names = c(NA, -6L
))
R find min and max for each group based on other row
With tidyverse you can try the following approach. First, put your data into long form, targeting your year columns. Then group_by both group and name (which contains the year), keep only the subgroups that have a value of x, and keep the rows where condition is 1. Then group_by just group and summarise to get the min and max years. Note that you may wish to convert your year data to numeric after removing x by filtering on condition.
library(tidyverse)
df1 %>%
pivot_longer(cols = -c(group, condition)) %>%
group_by(group, name) %>%
filter(any(value == "x"), condition == 1) %>%
group_by(group) %>%
summarise(min = min(value),
max = max(value))
Output
# A tibble: 3 x 3
group min max
<chr> <chr> <chr>
1 a 2010 2013
2 b 2011 2015
3 c 2010 2014
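The output above still holds the years as character strings. A minimal sketch of the numeric conversion mentioned in the note, using a hypothetical long-form data frame (the name `long` and its values are invented for illustration and stand in for the rows left after the filter step):

```r
library(dplyr)

# Hypothetical long-form data as it might look after filtering; years are still text
long <- data.frame(
  group = c("a", "a", "b", "b"),
  value = c("2010", "2013", "2011", "2015"),
  stringsAsFactors = FALSE
)

years <- long %>%
  mutate(value = as.numeric(value)) %>%  # convert once the "x" rows are gone
  group_by(group) %>%
  summarise(min = min(value),
            max = max(value))
```

Comparing character years happens to work when every string has four digits, but numeric values are safer if the strings can differ in length.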
How to use R dplyr's summarize to count the number of rows that match a criteria?
You can use sum on logical vectors - it will automatically convert them into numeric values (TRUE being equal to 1 and FALSE being equal to 0), so you need only do:
test %>%
group_by(location) %>%
summarize(total_score = sum(score),
n_outliers = sum(more_than_300))
#> # A tibble: 2 x 3
#> location total_score n_outliers
#> <chr> <dbl> <int>
#> 1 away 927 2
#> 2 home 552 0
Or, if these are your only 3 columns, an equivalent would be:
test %>%
group_by(location) %>%
summarize(across(everything(), sum))
In fact, you don't need to make the more_than_300 column - it would suffice to do:
test %>%
group_by(location) %>%
summarize(total_score = sum(score),
n_outliers = sum(score > 300))
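The original data is not shown, so here is a self-contained sketch on hypothetical scores (the values are invented for illustration). It also shows that mean() on the same logical vector gives the share of matching rows, which is an addition to the answer rather than part of it:

```r
library(dplyr)

# Hypothetical scores, invented for illustration
test <- data.frame(
  location = c("home", "home", "away", "away", "away"),
  score    = c(250, 302, 310, 120, 497),
  stringsAsFactors = FALSE
)

res <- test %>%
  group_by(location) %>%
  summarize(n_outliers     = sum(score > 300),   # TRUE counts as 1, FALSE as 0
            share_outliers = mean(score > 300))  # proportion of rows above 300
```

With this data, "away" has 2 of 3 scores above 300 and "home" has 1 of 2.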
find min and max values and create columns for these for each unique identifier (grouping element) in R
Firstly, you need to enter missing values as NA, not "NA"; otherwise R reads the column as character (or factor) data rather than numbers, and min() will not give you a numeric minimum. This code produces the desired output:
MC <- c(rep("OS000348",8), rep("OS000361",13), rep("OS000375",5))
ASN <- c(rep(2,8), rep(3,5), rep(2,8), rep(3,5))
Dia <- c(870,NA, 867.3, NA, NA, 890.3,NA,NA,871.2,NA,868.7,NA,866.2, NA,
NA,851,NA,NA,842,NA,NA,880,860,851.8,NA,841)
df <- data.frame(MC,ASN,Dia)
library(dplyr)
df <- df %>%
group_by(MC) %>%
mutate(minDia = min(Dia, na.rm = TRUE), maxDia = max(Dia, na.rm = TRUE))
And use this if you only want to keep one observation of MC:
df2 <- df %>%
group_by(MC) %>%
mutate(minDia = min(Dia, na.rm = TRUE), maxDia = max(Dia, na.rm = TRUE)) %>%
ungroup() %>%
distinct(MC, minDia, maxDia)
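If only one row per MC is needed, a shorter route is to call summarise directly instead of mutate followed by distinct. This is an alternative sketch, not the answer's own method; it rebuilds the same MC and Dia vectors, and the names `df_raw` and `df3` are chosen here for illustration:

```r
library(dplyr)

MC  <- c(rep("OS000348", 8), rep("OS000361", 13), rep("OS000375", 5))
Dia <- c(870, NA, 867.3, NA, NA, 890.3, NA, NA, 871.2, NA, 868.7, NA, 866.2, NA,
         NA, 851, NA, NA, 842, NA, NA, 880, 860, 851.8, NA, 841)
df_raw <- data.frame(MC, Dia, stringsAsFactors = FALSE)

# summarise collapses each group to a single row, so no distinct() is needed
df3 <- df_raw %>%
  group_by(MC) %>%
  summarise(minDia = min(Dia, na.rm = TRUE),
            maxDia = max(Dia, na.rm = TRUE))
```

The mutate version is still the one to use when the per-row observations must be kept alongside the group minimum and maximum.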
How to find max and min values regardless of being positive or negative using R
You can use slice_min and slice_max to get the lowest or highest n (default is 1) rows by group. Since you are looking at distance, you should use abs to get the absolute value of the distance.
dat %>%
group_by(Genes, intA) %>%
slice_max(abs(distance))
# Genes intA Chr_intA Chr_intB direction_1 direction_2 distance
# <chr> <chr> <chr> <chr> <chr> <chr> <int>
#1 GeneA P53 chr19 chr8 - - -3467567
#2 GeneB P53 chr19 chr8 - - -2884
dat %>%
group_by(Genes, intA) %>%
slice_min(abs(distance))
# Genes intA Chr_intA Chr_intB direction_1 direction_2 distance
# <chr> <chr> <chr> <chr> <chr> <chr> <int>
#1 GeneA P53 chr19 chr8 - - -423
#2 GeneB P53 chr19 chr8 - - -40