How to find the highest value of a column in a data frame in R?
Similar to colMeans, colSums, etc., you could write a column maximum function, colMax, and a column sort function, colSort.
colMax <- function(data) sapply(data, max, na.rm = TRUE)
colSort <- function(data, ...) sapply(data, sort, ...)
I use ... in the second function so that extra arguments, such as decreasing = TRUE, are passed straight through to sort.
Get your data:
dat <- read.table(header = TRUE, text = "Ozone Solar.R Wind Temp Month Day
1 41 190 7.4 67 5 1
2 36 118 8.0 72 5 2
3 12 149 12.6 74 5 3
4 18 313 11.5 62 5 4
5 NA NA 14.3 56 5 5
6 28 NA 14.9 66 5 6
7 23 299 8.6 65 5 7
8 19 99 13.8 59 5 8
9 8 19 20.1 61 5 9")
Use the colMax function on the sample data:
colMax(dat)
# Ozone Solar.R Wind Temp Month Day
# 41.0 313.0 20.1 74.0 5.0 9.0
To do the sorting on a single column,
sort(dat$Solar.R, decreasing = TRUE)
# [1] 313 299 190 149 118 99 19
and over all columns use our colSort function,
colSort(dat, decreasing = TRUE) ## compare with '...' above
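One thing worth checking about the result of colSort: sort drops NA values by default, so columns containing NA come back shorter than the others and sapply returns a list rather than a matrix. A minimal sketch, using a small made-up data frame (the column names x and y are only for illustration):

```r
# Same definition as above
colSort <- function(data, ...) sapply(data, sort, ...)

# Hypothetical data: x contains an NA, y does not
toy <- data.frame(x = c(3, NA, 1), y = c(2, 5, 4))

# NA is dropped from x, so the sorted columns differ in length -> list
res_list <- colSort(toy, decreasing = TRUE)

# Passing na.last = TRUE through '...' keeps the NA, so lengths match -> matrix
res_mat <- colSort(toy, decreasing = TRUE, na.last = TRUE)
```

If you need a rectangular result, passing na.last = TRUE through ... is one option.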
How to select the max value of each row (not all columns) and mutate 2 columns which are the max value and name in R?
Method 1
Simply use the pmax and max.col functions to identify the maximum values and their columns.
library(dplyr)
df %>% mutate(max = pmax(a,b), type = colnames(df)[max.col(df[,3:4]) + 2 ])
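For reference, here is roughly the same idea in base R only. This is a sketch, not the original poster's code: the data frame is reconstructed from the output shown under Method 2, and ties.method = "first" is my own choice (max.col breaks ties at random by default).

```r
# Data reconstructed from the printed output (lon, lat, a, b)
df <- data.frame(lon = 102:105, lat = 31:34,
                 a = c(4, 3, 7, 6), b = c(5, 2, 4, 9))

vals <- df[, c("a", "b")]

# Row-wise maximum across the value columns
df$max <- do.call(pmax, vals)

# Name of the column holding that maximum; ties resolved to the first column
df$type <- names(vals)[max.col(vals, ties.method = "first")]
```

Selecting the value columns by name avoids the "+ 2" offset needed when indexing by position.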
Method 2
Or first reshape your data to a "long" format for easier manipulation. Then use mutate to extract the max values and names. Finally, change it back to a "wide" format and relocate the columns according to your target.
library(tidyr)
df %>%
  pivot_longer(a:b, names_to = "colname") %>%
  group_by(lon, lat) %>%
  mutate(max = max(value),
         type = colname[which.max(value)]) %>%
  # id_cols defaults to every column not used in names_from/values_from
  pivot_wider(names_from = "colname", values_from = "value") %>%
  relocate(max, type, .after = b)
Output
# A tibble: 4 × 6
# Groups: lon, lat [4]
lon lat a b max type
<dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 102 31 4 5 5 b
2 103 32 3 2 3 a
3 104 33 7 4 7 a
4 105 34 6 9 9 b
Select the row with the maximum value in each group
Here's a data.table solution:
require(data.table) ## 1.9.2
group <- as.data.table(group)
If you want to keep all the entries corresponding to max values of pt within each group:
group[group[, .I[pt == max(pt)], by=Subject]$V1]
# Subject pt Event
# 1: 1 5 2
# 2: 2 17 2
# 3: 3 5 2
If you'd like just the first max value of pt:
group[group[, .I[which.max(pt)], by=Subject]$V1]
# Subject pt Event
# 1: 1 5 2
# 2: 2 17 2
# 3: 3 5 2
In this case, it doesn't make a difference, as there aren't multiple maximum values within any group in your data.
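To see where the two variants do differ, here is a small base R sketch (not data.table) on made-up data in which Subject 1 has a tied maximum. The values are hypothetical; only the column names follow the example above.

```r
# Hypothetical data: Subject 1 has two rows sharing the maximum pt
group <- data.frame(Subject = c(1, 1, 2, 2),
                    pt      = c(5, 5, 3, 7),
                    Event   = c(1, 2, 1, 2))

# Keep every row equal to its group's maximum (ties are all kept)
all_max <- group[group$pt == ave(group$pt, group$Subject, FUN = max), ]

# Keep only the first maximum per group (which.max returns the first match)
first_max <- do.call(rbind, lapply(split(group, group$Subject),
                                   function(g) g[which.max(g$pt), ]))
```

Here all_max has three rows and first_max has two, which mirrors the difference between pt == max(pt) and which.max(pt).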
R : select only the maximum value from each columns based on row in R
You can use apply(MY_DATA, 2, max), for example with mtcars, a default dataset built into R:
> apply(mtcars,2, max)
mpg cyl disp hp drat wt qsec vs am gear carb
33.900 8.000 472.000 335.000 4.930 5.424 22.900 1.000 1.000 5.000 8.000
The apply function applies a function row-by-row (1) or column-by-column (2). Here I'm applying a function that returns the maximum on a column-by-column basis.
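A tiny sketch of the two margins side by side, on a made-up 2 x 3 matrix, may make the difference concrete:

```r
# matrix() fills column by column, so the rows are (1, 3, 2) and (5, 4, 6)
m <- matrix(c(1, 5, 3, 4, 2, 6), nrow = 2)

row_max <- apply(m, 1, max)  # one value per row
col_max <- apply(m, 2, max)  # one value per column

# The same call on the built-in mtcars data frame
mt_max <- apply(mtcars, 2, max)
```

Note that apply converts a data frame to a matrix first, so this is best suited to data whose columns are all numeric, as in mtcars.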
how to keep only rows that have highest value in certain column in R
An easier approach is with max.col in base R. Select the columns that are numeric. Get the column index of each row where the value is max. Check if that is equal to 1, i.e. the first column (as we selected only from the 2nd column onwards), and subset the rows:
subset(df, max.col(df[-1], 'first') == 1)
# A tibble: 2 x 5
# Species North South East West
# <chr> <dbl> <dbl> <dbl> <dbl>
#1 a 4 3 2 3
#2 D 3 2 2 2
If it is based on the rowwise mean:
subset(df, North > rowMeans(df[-1]))
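Since the input data is not shown, here is a self-contained sketch of both base R filters. Rows "a" and "D" match the output above; rows "b" and "c" are invented so that there is something to filter out.

```r
# Rows "a" and "D" come from the printed output; "b" and "c" are hypothetical
df <- data.frame(Species = c("a", "b", "c", "D"),
                 North   = c(4, 1, 2, 3),
                 South   = c(3, 5, 2, 2),
                 East    = c(2, 2, 6, 2),
                 West    = c(3, 1, 1, 2))

# Keep rows whose (first) row-wise maximum sits in North, the first numeric column
by_max <- subset(df, max.col(df[-1], "first") == 1)

# Keep rows where North exceeds the row-wise mean of the numeric columns
by_mean <- subset(df, North > rowMeans(df[-1]))
```

With this data both filters keep the same two rows, but in general they need not agree.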
Or, if we prefer to use dplyr (note that cur_data() is deprecated as of dplyr 1.1.0 in favour of pick()):
library(dplyr)
df %>%
filter(max.col(cur_data()[-1], 'first') == 1)
Similarly, if it is based on the rowwise mean:
df %>%
filter(North > rowMeans(cur_data()[-1]))
Select the row with the maximum value in each group based on multiple columns in R dplyr
We may get the rowwise max of the 'count' columns with pmax, then, grouped by 'col1', filter the rows where the 'Max' column equals its maximum within the group.
library(dplyr)
df1 %>%
mutate(Max = pmax(count_col1, count_col2) ) %>%
group_by(col1) %>%
filter(Max == max(Max)) %>%
ungroup %>%
select(-Max)
Output
# A tibble: 3 × 4
col1 col2 count_col1 count_col2
<chr> <chr> <dbl> <dbl>
1 apple aple 1 4
2 banana banan 4 1
3 banana bananb 4 1
We may also use slice_max, with invoke from purrr to apply pmax across the 'count' columns:
library(purrr)
df1 %>%
group_by(col1) %>%
slice_max(invoke(pmax, across(starts_with("count")))) %>%
ungroup
# A tibble: 3 × 4
col1 col2 count_col1 count_col2
<chr> <chr> <dbl> <dbl>
1 apple aple 1 4
2 banana banan 4 1
3 banana bananb 4 1
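The same logic can be sketched without any packages, using pmax and ave. The three rows in the output above are kept as they are; the other two rows ("appl" and "banana") are made up so that the filter has something to drop.

```r
# Rows 1, 3 and 4 match the printed output; rows 2 and 5 are hypothetical
df1 <- data.frame(col1 = c("apple", "apple", "banana", "banana", "banana"),
                  col2 = c("aple", "appl", "banan", "bananb", "banana"),
                  count_col1 = c(1, 2, 4, 4, 3),
                  count_col2 = c(4, 3, 1, 1, 2))

# Row-wise maximum of the two count columns
row_max <- pmax(df1$count_col1, df1$count_col2)

# Largest row-wise maximum within each col1 group, repeated per row
group_max <- ave(row_max, df1$col1, FUN = max)

# Keep the rows that reach their group's maximum (ties are kept)
res <- df1[row_max == group_max, ]
```

As with filter(Max == max(Max)), tied rows within a group are all retained.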
Selecting Max Column Values in R
You can use ave with a custom function that wraps max, so you can remove NA values:
Data$Y <- ave(Data$Y, Data$X, FUN=function(x) max(x, na.rm=TRUE))
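A quick sketch of what that line does, on a made-up Data frame (the values are hypothetical; only the names X and Y come from the line above):

```r
# Hypothetical data: group "a" contains an NA
Data <- data.frame(X = c("a", "a", "b", "b"),
                   Y = c(1, NA, 7, 3))

# Replace each Y with the maximum of its X group, ignoring NA values
Data$Y <- ave(Data$Y, Data$X, FUN = function(x) max(x, na.rm = TRUE))
```

Every row now carries its group's maximum, including the row that was NA. Be aware that if a group consists only of NA values, max(x, na.rm = TRUE) returns -Inf with a warning.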