Extract row corresponding to minimum value of a variable by group
Slightly more elegant:
library(data.table)
DT[ , .SD[which.min(Employees)], by = State]
State Company Employees
1: AK D 24
2: RI E 19
Slightly less elegant than using .SD, but a bit faster (for data with many groups):
DT[DT[ , .I[which.min(Employees)], by = State]$V1]
Also, just replace the expression which.min(Employees) with Employees == min(Employees) if your data set has multiple identical min values and you'd like to subset all of them.
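To make the difference between the two expressions concrete, here is a minimal sketch on made-up data (the table below is hypothetical and not the data from the question). It assumes the data.table package is installed and that state AK has two companies tied for the minimum.

```r
library(data.table)

# Hypothetical data: companies A and B are tied for the minimum in AK
DT <- data.table(
  State     = c("AK", "AK", "AK", "RI", "RI"),
  Company   = c("A", "B", "C", "D", "E"),
  Employees = c(24, 24, 30, 19, 21)
)

# which.min() keeps only the first minimum in each group
first_min <- DT[, .SD[which.min(Employees)], by = State]

# The logical comparison keeps every row tied for the minimum
all_min <- DT[, .SD[Employees == min(Employees)], by = State]

# Same idea with the .I variant: collect row numbers, then subset once
all_min_fast <- DT[DT[, .I[Employees == min(Employees)], by = State]$V1]
```

Here first_min has one row per state, while all_min and all_min_fast both keep companies A and B for AK.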
See also Subset rows corresponding to max value by group using data.table.
Only keep the minimum value of each group
With .SD:
dataz[,.SD[value==min(value)],by=.(group)]
group value
<char> <num>
1: ZAS 0.39590814
2: Car 0.42591138
3: EEE 0.07049145
4: EEff 0.34670793
5: 2133 0.05702904
6: EETTE 0.31071582
Select rows with min value by group
Building on DWin's solution, tapply can be avoided by using ave.
df[ df$v1 == ave(df$v1, df$f, FUN=min), ]
This gives another speed-up, as shown below. Mind you, the size of the speed-up also depends on the number of levels. I mention this because ave is far too often forgotten, although it is one of the more powerful functions in R.
f <- rep(letters[1:20],10000)
v1 <- rnorm(20*10000)
v2 <- 1:(20*10000)
df <- data.frame(f,v1,v2)
> system.time(df[ df$v1 == ave(df$v1, df$f, FUN=min), ])
user system elapsed
0.05 0.00 0.05
> system.time(df[ df$v1 %in% tapply(df$v1, df$f, min), ])
user system elapsed
0.25 0.03 0.29
> system.time(lapply(split(df, df$f), FUN = function(x) {
+ vec <- which(x[3] == min(x[3]))
+ return(x[vec, ])
+ })
+ .... [TRUNCATED]
user system elapsed
0.56 0.00 0.58
> system.time(df[tapply(1:nrow(df),df$f,function(i) i[which.min(df$v1[i])]),]
+ )
user system elapsed
0.17 0.00 0.19
> system.time( ddply(df, .var = "f", .fun = function(x) {
+ return(subset(x, v1 %in% min(v1)))
+ }
+ )
+ )
user system elapsed
0.28 0.00 0.28
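Speed aside, it may help to see what ave actually returns. The sketch below uses a small made-up data frame (hypothetical, not the benchmark data) in which group "b" happens to contain a value equal to the minimum of group "a". Under that assumption, the ave comparison and the %in% tapply variant from the timings select different rows.

```r
# Hypothetical toy data: group "b" contains 1, which is the minimum of group "a"
df <- data.frame(f = c("a", "a", "b", "b"), v1 = c(1, 3, 0, 1))

# ave() returns one value per row: the minimum of that row's own group
group_min <- ave(df$v1, df$f, FUN = min)

# Row-wise comparison against the row's own group minimum
by_ave <- df[df$v1 == group_min, ]

# %in% compares against the minima of ALL groups, so row 4 (b, 1) also matches
by_in <- df[df$v1 %in% tapply(df$v1, df$f, min), ]
```

With this data, by_ave keeps rows 1 and 3, whereas by_in additionally keeps row 4, so the two approaches are only interchangeable when no value in one group equals another group's minimum.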
Extract row corresponding to maximum value by group for multiple variables
max and which.max are two different functions doing different things. max gives the maximum value in a vector, whereas which.max gives the position of the (first) maximum value in the vector.
x <- 4:1
max(x)
#[1] 4
which.max(x)
#[1] 1
Here which.max returns 1 because 4 is at the 1st position in the vector x.
So if you need max values in multiple columns, you should use max and not which.max.
library(data.table)
setDT(dt)
variables = colnames(dt[, 2:10])
dt[, lapply(.SD, max), by = ID, .SDcols = variables]
# ID a b c d e f g h i
# 1: 1 1 1 1 1 1 1 1 1 1
# 2: 2 1 1 1 0 0 1 1 0 1
# 3: 3 1 1 1 0 1 1 1 1 1
# 4: 4 1 1 1 0 0 1 1 0 0
# 5: 5 1 1 1 1 1 1 1 0 0
# 6: 6 1 1 1 1 1 1 1 0 1
# 7: 7 1 1 1 1 1 0 1 0 0
# 8: 8 1 1 1 1 0 1 1 1 1
# 9: 9 1 1 1 0 1 1 1 0 0
#10: 10 1 1 1 1 1 1 1 1 1
Pandas GroupBy and select rows with the minimum value in a specific column
I feel like you're overthinking this. Just use groupby and idxmin:
df.loc[df.groupby('A').B.idxmin()]
A B C
2 1 2 10
4 2 4 4
df.loc[df.groupby('A').B.idxmin()].reset_index(drop=True)
A B C
0 1 2 10
1 2 4 4
How to extract the row with min or max values?
You can include your which.max call as the first argument to your subsetting call:
df[which.max(df$Temp),]
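As a minimal sketch of how this behaves with ties, consider the made-up data frame below (the Day column and the values are hypothetical; only the Temp column name comes from the answer). which.max and which.min return a single position, so a logical comparison is one way to keep all tied rows.

```r
# Hypothetical data with two days tied for the highest Temp
df <- data.frame(Day = 1:4, Temp = c(67, 81, 72, 81))

# which.max() returns only the position of the first maximum
first_max <- df[which.max(df$Temp), ]

# A logical comparison keeps every row tied for the maximum
all_max <- df[df$Temp == max(df$Temp), ]

# The same pattern with which.min() extracts the row with the minimum
row_min <- df[which.min(df$Temp), ]
```

In this example first_max contains only day 2, all_max contains days 2 and 4, and row_min contains day 1.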
In case of duplicated value in variable, keep row with lowest value based on other variable
We can use slice_min after grouping by 'ID':
library(dplyr)
df %>%
group_by(ID) %>%
slice_min(tti) %>%
ungroup
Output:
# A tibble: 3 x 2
# ID tti
# <int> <dbl>
#1 9 2.7
#2 12 1.2
#3 118 1.4
Or with collapse
library(collapse)
df %>%
fgroup_by(ID) %>%
fsummarise(tti = fmin(tti))
# ID tti
#1 9 2.7
#2 12 1.2
#3 118 1.4
Or another option is roworder (which is faster than arrange from dplyr) with funique:
roworder(df, ID, tti) %>%
funique(cols = 1)
# ID tti
#1 9 2.7
#2 12 1.2
#3 118 1.4