Extract Row Corresponding to Minimum Value of a Variable by Group - ITCodar


Extract row corresponding to minimum value of a variable by group

A slightly more elegant `data.table` approach:

```r
library(data.table)
DT[ , .SD[which.min(Employees)], by = State]
#    State Company Employees
# 1:    AK       D        24
# 2:    RI       E        19
```
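The same per-group `which.min` idea can be sketched in plain Python with only the standard library. The `rows` data and the `min_row_by_group` helper are hypothetical stand-ins for `DT`, chosen so that the result matches the output above. Like `which.min`, the sketch keeps the first row when a group has tied minima.

```python
# Hypothetical rows standing in for DT (State, Company, Employees).
rows = [
    {"State": "AK", "Company": "A", "Employees": 82},
    {"State": "AK", "Company": "D", "Employees": 24},
    {"State": "RI", "Company": "E", "Employees": 19},
    {"State": "RI", "Company": "B", "Employees": 23},
]

def min_row_by_group(rows, group_key, value_key):
    """Return one row per group: the first row holding the group's minimum,
    mirroring which.min(), which returns the first minimum on ties."""
    best = {}  # group -> row with the smallest value seen so far
    for row in rows:
        g = row[group_key]
        # strict "<" keeps the earliest row when values tie
        if g not in best or row[value_key] < best[g][value_key]:
            best[g] = row
    return list(best.values())  # dicts keep groups in first-seen order

for row in min_row_by_group(rows, "State", "Employees"):
    print(row["State"], row["Company"], row["Employees"])
# AK D 24
# RI E 19
```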

Slightly less elegant than using `.SD`, but a bit faster for data with many groups:

```r
DT[DT[ , .I[which.min(Employees)], by = State]$V1]
```
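This second form works in two steps. The inner call collects one row number per group (`.I` holds row numbers of the original table, and the unnamed result column is called `V1`), and the outer call subsets `DT` a single time. Below is a plain-Python sketch of that two-step pattern. The column lists and the `min_index_by_group` helper are hypothetical.

```python
# Hypothetical column-oriented data standing in for DT.
state = ["AK", "AK", "RI", "RI"]
employees = [82, 24, 19, 23]

def min_index_by_group(groups, values):
    """Step 1 (like DT[, .I[which.min(Employees)], by = State]$V1):
    return the row index of each group's first minimum."""
    best = {}  # group -> index of the smallest value seen so far
    for i, (g, v) in enumerate(zip(groups, values)):
        if g not in best or v < values[best[g]]:
            best[g] = i
    return list(best.values())

idx = min_index_by_group(state, employees)  # 0-based: [1, 2]

# Step 2 (like DT[idx]): subset every column once with the collected indices.
subset = {
    "State": [state[i] for i in idx],
    "Employees": [employees[i] for i in idx],
}
print(subset)  # {'State': ['AK', 'RI'], 'Employees': [24, 19]}
```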

If several rows are tied for the minimum and you want to keep all of them, replace `which.min(Employees)` with `Employees == min(Employees)`.
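Keeping ties means comparing every row against its group's minimum, rather than picking a single position. Below is a plain-Python sketch of that filter. The data and the `all_min_rows_by_group` helper are hypothetical.

```python
# Hypothetical (State, Company, Employees) rows; AK has a tie at 24.
rows = [
    ("AK", "A", 24),
    ("AK", "B", 30),
    ("AK", "D", 24),
    ("RI", "E", 19),
]

def all_min_rows_by_group(rows, group_idx, value_idx):
    """Keep every row whose value equals its group's minimum, the analogue
    of Employees == min(Employees) rather than which.min(Employees)."""
    group_min = {}
    # First pass: smallest value per group.
    for row in rows:
        g, v = row[group_idx], row[value_idx]
        if g not in group_min or v < group_min[g]:
            group_min[g] = v
    # Second pass: keep all rows that hit their group's minimum (original order).
    return [row for row in rows if row[value_idx] == group_min[row[group_idx]]]

print(all_min_rows_by_group(rows, 0, 2))
# [('AK', 'A', 24), ('AK', 'D', 24), ('RI', 'E', 19)]
```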

See also Subset rows corresponding to max value by group using data.table.

Only keep the minimum value of each group

With `.SD`:

```r
dataz[,.SD[value==min(value)],by=.(group)]
#     group      value
#    <char>      <num>
# 1:    ZAS 0.39590814
# 2:    Car 0.42591138
# 3:    EEE 0.07049145
# 4:   EEff 0.34670793
# 5:   2133 0.05702904
# 6:  EETTE 0.31071582
```

Select rows with min value by group

Building on DWin's solution, `tapply` can be avoided by using `ave`:

```r
df[ df$v1 == ave(df$v1, df$f, FUN=min), ]
```
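This works because `ave` returns a vector as long as its input, with each element replaced by its group's minimum. A single element-wise `==` therefore marks the minimum rows without any reshaping. Below is a plain-Python sketch of that idea. The `ave` function here is a hypothetical, simplified stand-in that only handles functions returning one value per group, such as `min`. It is not a port of R's implementation.

```python
def ave(values, groups, fun):
    """Hypothetical, simplified analogue of R's ave(): apply fun to each
    group's values and return a list aligned with the input, so every
    element is replaced by its group's result. Assumes fun returns a
    single value per group (e.g. min)."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    summary = {g: fun(vs) for g, vs in by_group.items()}
    return [summary[g] for g in groups]

# Hypothetical data standing in for df$v1 and df$f.
v1 = [0.5, -1.2, 0.3, 2.0, -0.7]
f = ["a", "a", "b", "b", "a"]

group_min = ave(v1, f, min)                      # [-1.2, -1.2, 0.3, 0.3, -1.2]
keep = [x == m for x, m in zip(v1, group_min)]   # element-wise comparison
print([i for i, k in enumerate(keep) if k])      # 0-based row positions: [1, 2]
```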

This gives a further speed-up, as the benchmark below shows, although the size of the gain also depends on the number of factor levels. I mention it because `ave` is too often overlooked, even though it is one of the more powerful functions in R.

```r
f <- rep(letters[1:20],10000)
v1 <- rnorm(20*10000)
v2 <- 1:(20*10000)
df <- data.frame(f,v1,v2)

> system.time(df[ df$v1 == ave(df$v1, df$f, FUN=min), ])
   user  system elapsed 
   0.05    0.00    0.05 

> system.time(df[ df$v1 %in% tapply(df$v1, df$f, min), ])
   user  system elapsed 
   0.25    0.03    0.29 

> system.time(lapply(split(df, df$f), FUN = function(x) {
+             vec <- which(x[3] == min(x[3]))
+             return(x[vec, ])
+         })
+  .... [TRUNCATED] 
   user  system elapsed 
   0.56    0.00    0.58 

> system.time(df[tapply(1:nrow(df),df$f,function(i) i[which.min(df$v1[i])]),]
+ )
   user  system elapsed 
   0.17    0.00    0.19 

> system.time( ddply(df, .var = "f", .fun = function(x) {
+     return(subset(x, v1 %in% min(v1)))
+     }
+ )
+ )
   user  system elapsed 
   0.28    0.00    0.28 
```

Extract row corresponding to maximum value by group for multiple variables

`max` and `which.max` do different things. `max` returns the largest value in a vector, whereas `which.max` returns the position of the (first) maximum in the vector.

```r
x <- 4:1
max(x)
#[1] 4
which.max(x)
#[1] 1
```

Here `which.max` returns 1 because 4 is in the first position of the vector `x`.
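The same distinction is shown below in Python for comparison. Python has no built-in `which.max`, so this sketch assumes the common `max(range(len(x)), key=...)` idiom. Like `which.max`, it returns the first position on ties, but its indices start at 0 rather than 1.

```python
x = [4, 3, 2, 1]  # same values as R's 4:1

largest = max(x)                                  # like max(x): the value itself
position = max(range(len(x)), key=x.__getitem__)  # like which.max(x), but 0-based

print(largest)       # 4
print(position + 1)  # 1, after shifting to R's 1-based indexing
```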

So if you need the maximum of each of several columns, use `max`, not `which.max`.

```r
library(data.table)
setDT(dt)
variables = colnames(dt[, 2:10])
dt[, lapply(.SD, max), .SDcols = variables, ID]
#    ID a b c d e f g h i
# 1:  1 1 1 1 1 1 1 1 1 1
# 2:  2 1 1 1 0 0 1 1 0 1
# 3:  3 1 1 1 0 1 1 1 1 1
# 4:  4 1 1 1 0 0 1 1 0 0
# 5:  5 1 1 1 1 1 1 1 0 0
# 6:  6 1 1 1 1 1 1 1 0 1
# 7:  7 1 1 1 1 1 0 1 0 0
# 8:  8 1 1 1 1 0 1 1 1 1
# 9:  9 1 1 1 0 1 1 1 0 0
#10: 10 1 1 1 1 1 1 1 1 1
```
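The call above takes the column-wise `max` within each `ID`, producing one summary row per group. Below is a plain-Python sketch of that aggregation. The indicator data and the `max_by_group` helper are hypothetical and are not taken from the question.

```python
# Hypothetical 0/1 indicator data: an ID column plus three variables.
rows = [
    {"ID": 1, "a": 0, "b": 1, "c": 0},
    {"ID": 1, "a": 1, "b": 0, "c": 0},
    {"ID": 2, "a": 0, "b": 0, "c": 1},
]
variables = ["a", "b", "c"]

def max_by_group(rows, group_key, columns):
    """Column-wise maximum within each group, one output row per group,
    in the spirit of dt[, lapply(.SD, max), .SDcols = variables, ID]."""
    acc = {}  # group -> running max for each column
    for row in rows:
        g = row[group_key]
        if g not in acc:
            acc[g] = {c: row[c] for c in columns}
        else:
            for c in columns:
                acc[g][c] = max(acc[g][c], row[c])
    return [{group_key: g, **vals} for g, vals in acc.items()]

for out in max_by_group(rows, "ID", variables):
    print(out)
# {'ID': 1, 'a': 1, 'b': 1, 'c': 0}
# {'ID': 2, 'a': 0, 'b': 0, 'c': 1}
```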

Pandas GroupBy and select rows with the minimum value in a specific column

I feel like you're overthinking this. Just use `groupby` and `idxmin`:

```python
df.loc[df.groupby('A').B.idxmin()]

   A  B   C
2  1  2  10
4  2  4   4
```

```python
df.loc[df.groupby('A').B.idxmin()].reset_index(drop=True)

   A  B   C
0  1  2  10
1  2  4   4
```
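`idxmin` returns index labels, not positions, which is why its result goes to `.loc` (label-based) rather than `.iloc`. `reset_index(drop=True)` then discards those labels and renumbers from 0. Below is a standard-library sketch of that flow. The `frame` dict and the `idxmin_by_group` helper are hypothetical.

```python
# Hypothetical frame keyed by index label -> (A, B, C); labels are not 0..n-1.
frame = {
    10: (1, 5, 3),
    12: (1, 2, 10),
    15: (2, 4, 4),
    17: (2, 9, 1),
}

def idxmin_by_group(frame, group_pos, value_pos):
    """Label of the first row holding each group's minimum, the role
    played by df.groupby('A').B.idxmin()."""
    best = {}  # group -> label of the smallest value seen so far
    for label, row in frame.items():
        g = row[group_pos]
        if g not in best or row[value_pos] < frame[best[g]][value_pos]:
            best[g] = label
    return best

labels = list(idxmin_by_group(frame, 0, 1).values())  # labels, not positions
selected = [frame[lbl] for lbl in labels]              # label lookup, like .loc
renumbered = dict(enumerate(selected))                 # like .reset_index(drop=True)
print(labels)      # [12, 15]
print(renumbered)  # {0: (1, 2, 10), 1: (2, 4, 4)}
```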

How to extract the row with min or max values?

To extract the whole row, pass the `which.max` result as the row index (the first argument) of the subsetting call. Use `which.min` for the minimum:

```r
df[which.max(df$Temp),]
```

In case of duplicated value in variable, keep row with lowest value based on other variable

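We can use `slice_min` after grouping by `ID`. By default `slice_min` keeps tied rows; add `with_ties = FALSE` if you need exactly one row per group: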

```r
library(dplyr)
df %>%
    group_by(ID) %>%
    slice_min(tti) %>%
    ungroup
```

Output:

```r
# A tibble: 3 x 2
#     ID   tti
#  <int> <dbl>
#1     9   2.7
#2    12   1.2
#3   118   1.4
```

Or with `collapse`:

```r
library(collapse)
df %>%
    fgroup_by(ID) %>%
    fsummarise(tti = fmin(tti))
#   ID tti
#1   9 2.7
#2  12 1.2
#3 118 1.4
```

Another option is `roworder` (which is faster than `arrange` from `dplyr`) followed by `funique`:

```r
roworder(df, ID, tti) %>%
    funique(cols = 1)
#    ID tti
#1   9 2.7
#2  12 1.2
#3 118 1.4
```
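Sorting by `ID` and then `tti` puts each ID's smallest `tti` first, so keeping the first row per `ID` yields the minimum. Below is a plain-Python sketch of this sort-then-deduplicate pattern. The data and the `first_after_sort` helper are hypothetical. The sketch assumes that `funique` keeps the first occurrence of each `ID`, which is what the output above suggests.

```python
# Hypothetical (ID, tti) rows, deliberately unsorted.
rows = [(118, 1.9), (9, 3.1), (12, 1.2), (9, 2.7), (118, 1.4)]

def first_after_sort(rows):
    """Sort by ID, then tti, and keep the first row seen for each ID,
    the idea behind roworder(df, ID, tti) followed by funique(cols = 1)."""
    seen = set()
    out = []
    for row in sorted(rows):    # tuples sort by ID first, then by tti
        if row[0] not in seen:  # first row per ID carries its smallest tti
            seen.add(row[0])
            out.append(row)
    return out

print(first_after_sort(rows))
# [(9, 2.7), (12, 1.2), (118, 1.4)]
```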