Selecting Max Column Values in R

How to find the highest value of a column in a data frame in R?

Similar to colMeans, colSums, etc., you can write a column-maximum function, colMax, and a column-sort function, colSort.

colMax <- function(data) sapply(data, max, na.rm = TRUE)
colSort <- function(data, ...) sapply(data, sort, ...)

The ... in the second function forwards any additional arguments to sort, which comes in handy further down.

Get your data:

dat <- read.table(header = TRUE, text = "Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
7 23 299 8.6 65 5 7
8 19 99 13.8 59 5 8
9 8 19 20.1 61 5 9")

Use the colMax function on the sample data:

colMax(dat)
# Ozone Solar.R Wind Temp Month Day
# 41.0 313.0 20.1 74.0 5.0 9.0
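Note that colMax as written assumes every column is numeric; with other column types the result is either coerced or the call fails, depending on the type. A minimal sketch of restricting it to numeric columns first, on a made-up data frame (toy, id, score and n are illustrative names, not from the question):

```r
colMax <- function(data) sapply(data, max, na.rm = TRUE)

# Hypothetical toy data with one non-numeric column
toy <- data.frame(id    = c("a", "b", "c"),
                  score = c(1.5, NA, 3),
                  n     = c(4L, 9L, 2L))

# Keep only the numeric columns before taking the maxima
num_only <- toy[sapply(toy, is.numeric)]
res <- colMax(num_only)
res
```

Here res is a named numeric vector with one entry per numeric column.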

To do the sorting on a single column,

sort(dat$Solar.R, decreasing = TRUE)
# [1] 313 299 190 149 118 99 19

and over all columns use our colSort function,

colSort(dat, decreasing = TRUE) ## compare with '...' above
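One thing to watch for: sort drops NA values by default, so columns with missing values come back shorter than the others and sapply returns a list rather than a matrix. The sketch below, on invented toy data, shows that and how an extra argument passed through ... changes it:

```r
colSort <- function(data, ...) sapply(data, sort, ...)

# Hypothetical toy data; column y contains a missing value
toy <- data.frame(x = c(3, 1, 2), y = c(10, NA, 20))

# NA is dropped from y, the columns differ in length,
# so sapply() cannot simplify and returns a list
res_list <- colSort(toy, decreasing = TRUE)

# na.last = TRUE is forwarded to sort() and keeps the NA at the end,
# so every column keeps its length and the result is a matrix
res_mat <- colSort(toy, decreasing = TRUE, na.last = TRUE)
```

Which form is preferable depends on whether the missing values matter downstream.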

How to select the max value of each row (not all columns) and mutate 2 columns which are the max value and name in R?

Method 1

Use the pmax and max.col functions to find the row-wise maximum values and the columns they come from.

library(dplyr)

df %>% mutate(max = pmax(a,b), type = colnames(df)[max.col(df[,3:4]) + 2 ])
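For anyone who wants to try this without dplyr, here is a base R sketch of the same idea. The data frame is reconstructed from the output shown further down, so treat it as an assumption about what df looked like. Note that max.col breaks ties at random by default; ties.method = "first" makes the result reproducible.

```r
# Assumed input, rebuilt from the printed output
df <- data.frame(lon = 102:105, lat = 31:34,
                 a = c(4, 3, 7, 6), b = c(5, 2, 4, 9))

vals <- df[, c("a", "b")]

# Row-wise maximum of a and b
df$max <- pmax(df$a, df$b)

# Name of the column holding that maximum (first column wins on ties)
df$type <- names(vals)[max.col(vals, ties.method = "first")]
df
```

Selecting the columns by name avoids the index offset (+ 2) needed when selecting by position.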

Method 2

Or first reshape your data to a "long" format for easier manipulation, then use mutate to extract the max values and names. Finally, change it back to a "wide" format and relocate the new columns to where you want them.

library(dplyr)
library(tidyr)   # pivot_longer() and pivot_wider()

df %>%
  pivot_longer(a:b, names_to = "colname") %>%
  group_by(lon, lat) %>%
  mutate(max = max(value),
         type = colname[which.max(value)]) %>%
  # all remaining columns act as id columns by default
  pivot_wider(names_from = "colname", values_from = "value") %>%
  relocate(max, type, .after = b)

Output

# A tibble: 4 × 6
# Groups: lon, lat [4]
lon lat a b max type
<dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 102 31 4 5 5 b
2 103 32 3 2 3 a
3 104 33 7 4 7 a
4 105 34 6 9 9 b

Select the row with the maximum value in each group

Here's a data.table solution:

library(data.table) ## 1.9.2
group <- as.data.table(group)

If you want to keep all the entries corresponding to max values of pt within each group:

group[group[, .I[pt == max(pt)], by=Subject]$V1]
# Subject pt Event
# 1: 1 5 2
# 2: 2 17 2
# 3: 3 5 2

If you'd like just the first max value of pt:

group[group[, .I[which.max(pt)], by=Subject]$V1]
# Subject pt Event
# 1: 1 5 2
# 2: 2 17 2
# 3: 3 5 2

In this case, it doesn't make a difference, as there aren't multiple maximum values within any group in your data.
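To see where the two variants do differ, here is a rough base R sketch (not data.table) on invented data with tied maxima. The values are made up for illustration and are not the data from the question.

```r
# Hypothetical data: both subjects have a tied maximum in pt
group <- data.frame(Subject = c(1, 1, 2, 2, 2),
                    pt      = c(5, 5, 17, 3, 17),
                    Event   = c(1, 2, 1, 2, 2))

# Variant 1: keep every row that ties for the group maximum
all_max <- group[group$pt == ave(group$pt, group$Subject, FUN = max), ]

# Variant 2: keep only the first maximum in each group
first_idx <- tapply(seq_len(nrow(group)), group$Subject,
                    function(i) i[which.max(group$pt[i])])
first_max <- group[as.vector(first_idx), ]
```

With ties present, all_max has more rows than first_max, which is exactly the distinction between pt == max(pt) and which.max(pt).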

Select only the maximum value from each column in R

You can use apply(MY_DATA, 2, max).

For example, with mtcars, a dataset built into R:

> apply(mtcars, 2, max)
mpg cyl disp hp drat wt qsec vs am gear carb
33.900 8.000 472.000 335.000 4.930 5.424 22.900 1.000 1.000 5.000 8.000

The apply function applies a function row-by-row (1) or column-by-column (2). Here I'm applying a function that returns the maximum on a column-by-column basis.
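A small sketch of the two margins side by side, on a made-up matrix (the names m, r1, r2, a, b, c are only for illustration):

```r
# Hypothetical 2 x 3 matrix; matrix() fills column by column
m <- matrix(c(1, 9, 4, 7, 2, 8), nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

row_max <- apply(m, 1, max)  # one value per row
col_max <- apply(m, 2, max)  # one value per column
```

Keep in mind that apply converts a data frame to a matrix first, so a data frame with mixed column types would be turned into character before max is applied.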

How to keep only rows that have the highest value in a certain column in R

A simpler approach is max.col in base R. Drop the first (non-numeric) column, get for each row the index of the column holding the maximum, and keep the rows where that index is 1, i.e. the first of the selected columns (North, since the selection starts at the second column of df).

subset(df, max.col(df[-1], 'first') == 1)
# A tibble: 2 x 5
# Species North South East West
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 a 4 3 2 3
#2 D 3 2 2 2
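The ties.method argument matters here. A sketch on invented data (only rows a and D are taken from the output above; the other rows are assumptions added to show the effect):

```r
# Hypothetical input; row "e" has a tie between North and South
df <- data.frame(Species = c("a", "b", "C", "D", "e"),
                 North   = c(4, 1, 2, 3, 5),
                 South   = c(3, 5, 2, 2, 5),
                 East    = c(2, 2, 6, 2, 1),
                 West    = c(3, 1, 1, 2, 1))

# "first": a tie that includes North counts as North being the maximum
keep_first <- df[max.col(df[-1], ties.method = "first") == 1, ]

# "last": the tied row "e" is attributed to South and dropped
keep_last <- df[max.col(df[-1], ties.method = "last") == 1, ]
```

So 'first' keeps rows where North is at least as large as every other column, while 'last' keeps only rows where North is strictly the largest.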

If the condition is based on the row-wise mean instead:

subset(df, North > rowMeans(df[-1]))

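Or, if we prefer dplyr (note that cur_data() is deprecated as of dplyr 1.1.0, where pick(everything()) serves the same purpose):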

library(dplyr)
df %>%
filter(max.col(cur_data()[-1], 'first') == 1)

Similarly, if it is based on the row-wise mean:

df %>% 
filter(North > rowMeans(cur_data()[-1]))

Select the row with the maximum value in each group based on multiple columns in R dplyr

We can compute the row-wise maximum of the 'count' columns with pmax, group by 'col1', and keep the rows whose 'Max' value equals the largest 'Max' in their group.

library(dplyr)
df1 %>%
mutate(Max = pmax(count_col1, count_col2) ) %>%
group_by(col1) %>%
filter(Max == max(Max)) %>%
ungroup %>%
select(-Max)

Output

# A tibble: 3 × 4
col1 col2 count_col1 count_col2
<chr> <chr> <dbl> <dbl>
1 apple aple 1 4
2 banana banan 4 1
3 banana bananb 4 1
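The same logic can be sketched in base R with pmax and ave. The input below is an assumption: the three rows from the output above plus two invented rows ("appl" and "banaa") that should be filtered out.

```r
# Hypothetical input; rows 2 and 5 are made up for illustration
df1 <- data.frame(col1 = c("apple", "apple", "banana", "banana", "banana"),
                  col2 = c("aple", "appl", "banan", "bananb", "banaa"),
                  count_col1 = c(1, 2, 4, 4, 3),
                  count_col2 = c(4, 3, 1, 1, 2))

# Row-wise maximum across the two count columns
row_max <- pmax(df1$count_col1, df1$count_col2)

# Keep rows whose row-wise maximum equals the largest one in their col1 group
res <- df1[row_max == ave(row_max, df1$col1, FUN = max), ]
res
```

Tied rows within a group are all kept, which is why both banana rows survive.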

We may also use slice_max

library(purrr)
df1 %>%
group_by(col1) %>%
slice_max(invoke(pmax, across(starts_with("count")))) %>%
ungroup
# A tibble: 3 × 4
col1 col2 count_col1 count_col2
<chr> <chr> <dbl> <dbl>
1 apple aple 1 4
2 banana banan 4 1
3 banana bananb 4 1

Selecting Max Column Values in R

You can use ave with a custom function that wraps max, so you can remove NA values:

Data$Y <- ave(Data$Y, Data$X, FUN=function(x) max(x, na.rm=TRUE))
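A quick sketch of what that does, on invented data. Here the result goes into a new column (Ymax, a name chosen for illustration) so the original Y stays visible:

```r
# Hypothetical data with missing values in Y
Data <- data.frame(X = c("a", "a", "b", "b", "b"),
                   Y = c(1, NA, 7, 3, NA))

# Every row receives the maximum of Y within its X group, ignoring NA
Data$Ymax <- ave(Data$Y, Data$X, FUN = function(x) max(x, na.rm = TRUE))
Data
```

If a group contains only NA values, max(x, na.rm = TRUE) returns -Inf with a warning, so such groups may need separate handling.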

