Selecting Max Column Values in R

How to find the highest value of a column in a data frame in R?

Similar to colMeans, colSums, etc., you could write a column maximum function, colMax, and a column sort function, colSort.

colMax <- function(data) sapply(data, max, na.rm = TRUE)
colSort <- function(data, ...) sapply(data, sort, ...)

I use ... in the second function in hopes of sparking your intrigue: it passes any extra arguments through to sort.

Get your data:

dat <- read.table(header = TRUE, text = "Ozone Solar.R Wind Temp Month Day
1     41     190  7.4   67     5   1
2     36     118  8.0   72     5   2
3     12     149 12.6   74     5   3
4     18     313 11.5   62     5   4
5     NA      NA 14.3   56     5   5
6     28      NA 14.9   66     5   6
7     23     299  8.6   65     5   7
8     19      99 13.8   59     5   8
9      8      19 20.1   61     5   9")

Use the colMax function on the sample data:

colMax(dat)
#  Ozone Solar.R    Wind    Temp   Month     Day 
#   41.0   313.0    20.1    74.0     5.0     9.0
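
One caveat with colMax as written: a column that is entirely NA gives -Inf with a warning, and a character column is compared as text. Below is a minimal sketch of a more defensive variant. The name colMaxSafe and the toy data frame are hypothetical, made up for illustration and not part of the original answer.

```r
# Hypothetical helper: keep numeric columns only and return NA
# (instead of -Inf plus a warning) when a column has no non-missing values.
colMaxSafe <- function(data) {
  numeric_data <- data[sapply(data, is.numeric)]
  sapply(numeric_data, function(x) {
    if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
  })
}

# Hypothetical toy data: one ordinary column, one all-NA column, one text column
toy <- data.frame(
  x = c(1, NA, 3),
  y = c(NA_real_, NA_real_, NA_real_),
  z = c("a", "b", "c")
)

res <- colMaxSafe(toy)
print(res)
```

Whether silently dropping non-numeric columns is the right choice depends on your data; raising an error instead may be preferable.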

To do the sorting on a single column,

sort(dat$Solar.R, decreasing = TRUE)
# [1] 313 299 190 149 118  99  19

and over all columns use our colSort function,

colSort(dat, decreasing = TRUE) ## compare with '...' above
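
To see what '...' does here, consider the sketch below on a hypothetical toy data frame (not the data above). Because sort drops NA values by default, columns can come back with different lengths, and sapply then returns a list rather than a matrix. Passing na.last = TRUE through '...' keeps the NAs at the end, so every column keeps its length.

```r
colSort <- function(data, ...) sapply(data, sort, ...)

# Hypothetical toy data with one missing value in x
toy <- data.frame(x = c(3, NA, 1), y = c(2, 5, 4))

# NA is dropped from x, so the columns differ in length and a list comes back
as_list <- colSort(toy, decreasing = TRUE)

# na.last = TRUE is forwarded to sort() too; the result simplifies to a matrix
as_matrix <- colSort(toy, decreasing = TRUE, na.last = TRUE)

print(as_list)
print(as_matrix)
```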

How to select the max value of each row (not all columns) and mutate 2 columns which are the max value and name in R?

Method 1

Simply use the pmax and max.col functions to identify the maximum values and their columns.

library(dplyr)

df %>% mutate(max = pmax(a,b), type = colnames(df)[max.col(df[,3:4]) + 2 ])
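
For reference, the same idea can be sketched in base R. The data frame here is rebuilt from the values in the output shown for this question, and value_cols is a name made up for the example. Note that max.col breaks ties at random by default, so ties.method = "first" is set to make the result reproducible.

```r
# Data rebuilt from the output table of this answer
df <- data.frame(
  lon = 102:105,
  lat = 31:34,
  a = c(4, 3, 7, 6),
  b = c(5, 2, 4, 9)
)

value_cols <- c("a", "b")  # columns to compare row by row

# Row-wise maximum across the chosen columns
df$max <- do.call(pmax, df[value_cols])

# Name of the column holding that maximum (first one wins on ties)
df$type <- value_cols[max.col(df[value_cols], ties.method = "first")]

print(df)
```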

Method 2

Or first re-shape your data to a "long" format for easier manipulation. Then use mutate to extract max values and names. Finally change it back to a "wide" format and relocate columns according to your target.

library(tidyr)  # pivot_longer() and pivot_wider()

df %>% 
  pivot_longer(a:b, names_to = "colname") %>% 
  group_by(lon, lat) %>% 
  mutate(max = max(value), 
         type = colname[which.max(value)]) %>% 
  # all remaining columns are used as id columns by default
  pivot_wider(names_from = "colname", values_from = "value") %>% 
  relocate(max, type, .after = b)

Output

# A tibble: 4 × 6
# Groups:   lon, lat [4]
    lon   lat     a     b   max type 
  <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1   102    31     4     5     5 b    
2   103    32     3     2     3 a    
3   104    33     7     4     7 a    
4   105    34     6     9     9 b

Select the row with the maximum value in each group

Here's a data.table solution:

library(data.table) ## 1.9.2
group <- as.data.table(group)

If you want to keep all the entries corresponding to max values of pt within each group:

group[group[, .I[pt == max(pt)], by=Subject]$V1]
#    Subject pt Event
# 1:       1  5     2
# 2:       2 17     2
# 3:       3  5     2

If you'd like just the first max value of pt:

group[group[, .I[which.max(pt)], by=Subject]$V1]
#    Subject pt Event
# 1:       1  5     2
# 2:       2 17     2
# 3:       3  5     2

In this case, it doesn't make a difference, as there aren't multiple maximum values within any group in your data.
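
The two forms only differ when a group reaches its maximum more than once. Here is a small sketch on a hypothetical table (made up for illustration, not the data from the question) in which Subject 1 hits its maximum pt twice:

```r
library(data.table)

# Hypothetical data: Subject 1 has pt == 5 in two rows
toy <- data.table(
  Subject = c(1, 1, 1, 2, 2),
  pt      = c(5, 5, 2, 17, 3),
  Event   = c(1, 2, 3, 1, 2)
)

# Keeps every row that ties for the maximum within its group
all_max <- toy[toy[, .I[pt == max(pt)], by = Subject]$V1]

# Keeps only the first maximum row within each group
first_max <- toy[toy[, .I[which.max(pt)], by = Subject]$V1]

print(all_max)
print(first_max)
```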

R: select only the maximum value from each column

You can use apply(MY_DATA, 2, max).

For example, with mtcars, a default dataset built into R:

> apply(mtcars, 2, max)
    mpg     cyl    disp      hp    drat      wt    qsec      vs      am    gear    carb 
 33.900   8.000 472.000 335.000   4.930   5.424  22.900   1.000   1.000   5.000   8.000

The apply function applies a function row-by-row (1) or column-by-column (2). Here I'm applying a function that returns the maximum on a column-by-column basis.
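
One thing to keep in mind: apply first converts a data frame to a matrix, so a single non-numeric column turns everything into character. This is not a problem for mtcars, which is all numeric. The sketch below uses a hypothetical toy data frame to show the effect, along with one possible workaround that restricts the computation to numeric columns.

```r
# Hypothetical toy data with one text column
toy <- data.frame(name = c("a", "b", "c"), score = c(9, 10, 2))

# as.matrix(toy) is a character matrix, so the maxima are strings
via_apply <- apply(toy, 2, max)

# Restricting to numeric columns keeps the result numeric
numeric_cols <- sapply(toy, is.numeric)
via_sapply <- sapply(toy[numeric_cols], max, na.rm = TRUE)

print(class(via_apply))
print(via_sapply)
```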

How to keep only rows that have the highest value in a certain column in R

An easier approach is with max.col in base R. Select the numeric columns and get, for each row, the index of the column holding the maximum. Then check whether that index equals 1, i.e. the first of the selected columns (we selected from the 2nd column onwards), and subset the rows:

subset(df, max.col(df[-1], 'first') == 1)
# A tibble: 2 x 5
#  Species North South  East  West
#  <chr>   <dbl> <dbl> <dbl> <dbl>
#1 a           4     3     2     3
#2 D           3     2     2     2
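
The original data frame is not shown in full, so here is a sketch on hypothetical data (only the column names follow the output above). It also shows what the 'first' argument does: when North ties with another column for the row maximum, the row is still kept.

```r
# Hypothetical data; row "c" has a tie between North and South
df <- data.frame(
  Species = c("a", "b", "c"),
  North   = c(4, 1, 3),
  South   = c(3, 5, 3),
  East    = c(2, 2, 1),
  West    = c(3, 2, 2)
)

# Index of the column with the row maximum, among the numeric columns only
idx <- max.col(df[-1], ties.method = "first")

# Keep rows whose maximum sits in the first numeric column (North)
kept <- df[idx == 1, ]

print(idx)
print(kept)
```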

If it is based on the row-wise mean:

subset(df, North > rowMeans(df[-1]))

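Or if we prefer to use dplyr (cur_data() is deprecated as of dplyr 1.1.0, so pick(everything()) is used here instead):

library(dplyr)
df %>%
   filter(max.col(pick(everything())[-1], 'first') == 1)

Similarly, if it is based on the row-wise mean:

df %>% 
    filter(North > rowMeans(pick(everything())[-1]))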

Select the row with the maximum value in each group based on multiple columns in R dplyr

We may get the row-wise max of the 'count' columns with pmax, then, grouped by 'col1', filter the rows where 'Max' equals the maximum of 'Max' within the group.

library(dplyr)
df1 %>% 
 mutate(Max = pmax(count_col1, count_col2) ) %>%
 group_by(col1) %>%
 filter(Max == max(Max)) %>%
 ungroup %>%
 select(-Max)

Output

# A tibble: 3 × 4
  col1   col2   count_col1 count_col2
  <chr>  <chr>       <dbl>      <dbl>
1 apple  aple            1          4
2 banana banan           4          1
3 banana bananb          4          1
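
The input df1 is not shown, so here is a base R sketch of the same logic on a hypothetical input. Its rows were made up so that the result agrees with the output above. The names row_max and keep are illustrative.

```r
# Hypothetical input, constructed to be consistent with the output above
df1 <- data.frame(
  col1       = c("apple", "apple", "banana", "banana", "banana"),
  col2       = c("aple", "appl", "banan", "bananb", "banna"),
  count_col1 = c(1, 2, 4, 4, 2),
  count_col2 = c(4, 3, 1, 1, 3)
)

# Row-wise maximum of the two count columns
row_max <- pmax(df1$count_col1, df1$count_col2)

# ave() returns the group-wise maximum aligned with each row
keep <- row_max == ave(row_max, df1$col1, FUN = max)

result <- df1[keep, ]
print(result)
```

As in the dplyr version, ties within a group are all kept, which is why both "banan" and "bananb" appear.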

We may also use slice_max

# purrr::invoke() is deprecated in recent purrr versions; base do.call() does the same job here
df1 %>%
  group_by(col1) %>%
  slice_max(do.call(pmax, across(starts_with("count")))) %>%
  ungroup()
# A tibble: 3 × 4
  col1   col2   count_col1 count_col2
  <chr>  <chr>       <dbl>      <dbl>
1 apple  aple            1          4
2 banana banan           4          1
3 banana bananb          4          1