max.col with NA removal
We replace the NA values in 'a' with -Inf and apply max.col to the result.
v1 <- max.col(replace(a, is.na(a), -Inf), ties.method="first")
However, this returns 1 for the last row, which is all NA. To return NA for such rows, multiply the result by NA^!rowSums(!is.na(a)). The negated (!) rowSums of the logical matrix !is.na(a) is TRUE only for all-NA rows, and NA^TRUE is NA while NA^FALSE is 1.
v1 * NA^!rowSums(!is.na(a))
#[1] 2 2 3 1 NA
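To make the two steps above reproducible, here is a minimal sketch. The matrix a is a made-up example (the original input is not shown in the answer), chosen so that the last row is entirely NA; the name mask is also just for illustration.

```r
# Hypothetical input: 5 x 3 matrix whose last row is entirely NA
a <- matrix(c( 1,  5, NA,
              NA,  3,  2,
               2,  2,  7,
               4, NA, NA,
              NA, NA, NA), ncol = 3, byrow = TRUE)

# Step 1: turn NA into -Inf, then take the column index of each row maximum
v1 <- max.col(replace(a, is.na(a), -Inf), ties.method = "first")

# Step 2: build a mask that is NA for all-NA rows and 1 for every other row
# (NA^TRUE is NA, NA^FALSE is 1)
mask <- NA^(!rowSums(!is.na(a)))

res <- v1 * mask
print(res)
```

With this input, v1 on its own reports column 1 for the all-NA row, and the mask turns that entry into NA.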
EDIT: Changed the replacement value from 0 to -Inf based on @Frank's comment.
As the OP was using apply, which.max can return the column index for each row. Indexing its result with [1] yields NA for an all-NA row:
apply(a, 1, function(x) which.max(x)[1])
#[1] 2 2 3 1 NA
Or
sapply(apply(a, 1, which.max), `length<-`, 1)
#[1] 2 2 3 1 NA
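Why both variants give NA for an all-NA row can be checked on plain vectors. This is a small sketch; x_ok and x_bad are made-up inputs.

```r
# which.max() ignores NAs; on an all-NA vector it returns a zero-length result
x_ok  <- c(NA, 3, 2)
x_bad <- c(NA_real_, NA_real_, NA_real_)

pos_ok  <- which.max(x_ok)           # position of the maximum among non-NA values
len_bad <- length(which.max(x_bad))  # zero-length result for the all-NA vector

# Both idioms pad the empty result to a single NA
idx1 <- which.max(x_bad)[1]
idx2 <- `length<-`(which.max(x_bad), 1)
print(c(pos_ok, len_bad, idx1, idx2))
```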
How to keep only max value of row and convert other value to NA?
We can use apply to loop over the rows (MARGIN = 1), replace the values that are not equal to the row max with NA, and assign the transpose back to the original object:
df[] <- t(apply(df, 1, function(x) replace(x, x != max(x, na.rm = TRUE), NA)))
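A small worked example of the apply version follows. The data frame and the helper name keep_row_max are assumptions for illustration. Rows that are entirely NA are left out here, since max(x, na.rm = TRUE) warns and returns -Inf for them.

```r
# Hypothetical data: every row has at least one non-NA value
df <- data.frame(col1 = c(1, 5, NA),
                 col2 = c(3, 2, 4),
                 col3 = c(2, NA, 4))

# Keep only the entries equal to the row maximum; everything else becomes NA
keep_row_max <- function(x) replace(x, x != max(x, na.rm = TRUE), NA)

# apply() returns one column per input row, so transpose before assigning back
df[] <- t(apply(df, 1, keep_row_max))
print(df)
```

Ties are kept: in the third row both 4s survive.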
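Or with rowMaxs from matrixStats
library(matrixStats)
# rows that contain at least one non-NA value
i1 <- !!rowSums(!is.na(df))
# row() maps every cell to its row index, so each cell is compared with its own row maximum
df[i1,] <- replace(df[i1,], df[i1,] != rowMaxs(as.matrix(df[i1,]),
    na.rm = TRUE)[row(df[i1,])], NA)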
Or using dplyr and purrr
library(dplyr)
library(purrr)
df %>%
  mutate(new = reduce(., pmax, na.rm = TRUE)) %>%
  transmute_at(vars(starts_with('col')), ~ replace(., . != new, NA))
Min and Max across multiple columns with NAs
You can use hablar's min_ and max_ functions, which return NA if all values are NA.
library(dplyr)
library(hablar)
dat %>%
  rowwise() %>%
  mutate(min = min_(c_across(-ID)),
         max = max_(c_across(-ID)))
You can also use these with apply:
cbind(dat, t(apply(dat[-1], 1, function(x) c(min = min_(x), max = max_(x)))))
#  ID PM TP2 Sigma min max
#1  1  1   2     3   1   3
#2  2  0  NA     1   0   1
#3  3  2   1    NA   1   2
#4  4  1   0     2   0   2
#5 NA NA  NA    NA  NA  NA
#6  5  2   0     7   0   7
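If installing hablar is not an option, the same "NA when everything is NA" behaviour can be sketched in base R. The helpers safe_min and safe_max below are hypothetical stand-ins written for this example, not hablar's implementation, and dat is rebuilt from the printed output above.

```r
# Data reconstructed from the printed result above
dat <- data.frame(ID    = c(1, 2, 3, 4, NA, 5),
                  PM    = c(1, 0, 2, 1, NA, 2),
                  TP2   = c(2, NA, 1, 0, NA, 0),
                  Sigma = c(3, 1, NA, 2, NA, 7))

# Hypothetical helpers: return NA instead of Inf/-Inf (plus a warning) for all-NA input
safe_min <- function(x) if (all(is.na(x))) NA_real_ else min(x, na.rm = TRUE)
safe_max <- function(x) if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)

# Row-wise min and max over every column except ID
res <- cbind(dat, t(apply(dat[-1], 1, function(x) c(min = safe_min(x), max = safe_max(x)))))
print(res)
```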
Combine column to remove NA's yet prioritize specific replacements
Use max.col and some matrix indexing (specifying which row/col combination to take):
cbind(1:nrow(data), max.col(!is.na(data[-1]), "last"))
#     [,1] [,2]
#[1,]    1    3
#[2,]    2    2
#[3,]    3    3
#[4,]    4    1
#[5,]    5    3
#[6,]    6    3
data[-1][cbind(1:nrow(data), max.col(!is.na(data[-1]), "last"))]
#[1] 99 2 4 3 4 5
cbind(data[1], result=data[-1][cbind(1:nrow(data), max.col(!is.na(data[-1]), "last"))])
#  a result
#1 A     99
#2 B      2
#3 C      4
#4 D      3
#5 E      4
#6 F      5
If you need a particular column to always be given precedence, make a temporary object with the columns in a particular order, and then process it:
tmp <- data[-1][c("z", setdiff(names(data[-1]), "z"))]
tmp[cbind(1:nrow(tmp), max.col(!is.na(tmp), "first"))]
#[1] 99 2 4 3 4 5
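The original data is not shown, so here is a self-contained sketch with a made-up data frame (columns x, y, z are assumptions) that contrasts the two strategies: "last non-NA column wins" versus "z wins, otherwise the first non-NA column".

```r
# Hypothetical data; only the column name z is taken from the text above
data <- data.frame(a = c("A", "B", "C", "D"),
                   x = c(1, 1, NA, 3),
                   y = c(NA, 2, 4, NA),
                   z = c(99, NA, NA, NA))
vals <- data[-1]

# Strategy 1: take the last non-NA column in each row
idx_last <- cbind(seq_len(nrow(vals)), max.col(!is.na(vals), "last"))
picked_last <- vals[idx_last]

# Strategy 2: give z precedence, then fall back to the first non-NA column
tmp <- vals[c("z", setdiff(names(vals), "z"))]
idx_first <- cbind(seq_len(nrow(tmp)), max.col(!is.na(tmp), "first"))
picked_z_first <- tmp[idx_first]

print(picked_last)
print(picked_z_first)
```

The two results differ only in the second row, where z is missing and both x and y are available.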
Finding the max of a R dataframe column ignoring -Inf and NA
One solution would be the following:
data <- data.frame(column1 = c(-Inf, 4, NA, 7, 10), column2 = c(2, 8, 5, 4, 4))
# drop NA values first
column1b <- data$column1[!is.na(data$column1)]
# keep only finite values, so that -Inf (and Inf) are ignored as well
column1c <- column1b[is.finite(column1b)]
max(column1c)
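One edge case is worth a sketch: if a column holds nothing but NA and infinite values, the filtered vector is empty and max() returns -Inf with a warning. The helper finite_max below is a hypothetical wrapper that returns NA in that case.

```r
# Hypothetical helper: maximum over finite values only, NA when none remain
finite_max <- function(x) {
  x <- x[is.finite(x)]  # is.finite() is FALSE for NA, NaN, Inf and -Inf
  if (length(x) == 0) NA_real_ else max(x)
}

m1 <- finite_max(c(-Inf, 4, NA, 7, 10))  # finite values are 4, 7, 10
m2 <- finite_max(c(-Inf, NA))            # nothing finite left
print(c(m1, m2))
```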
Remove NA values from a vector
Trying ?max, you'll see that it actually has a na.rm = argument, set by default to FALSE. (That's the common default for many other R functions, including sum(), mean(), etc.)
Setting na.rm=TRUE does just what you're asking for:
d <- c(1, 100, NA, 10)
max(d, na.rm=TRUE)
If you do want to remove all of the NAs, use this idiom instead:
d <- d[!is.na(d)]
A final note: Other functions (e.g. table(), lm(), and sort()) have NA-related arguments that use different names (and offer different options). So if NAs cause you problems in a function call, it's worth checking for a built-in solution among the function's arguments. I've found there's usually one already there.
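As a quick illustration of those differently named arguments, here is a short sketch using small made-up vectors; the variable names are arbitrary.

```r
x <- c(3, NA, 1)

# sort(): na.last = NA (the default) drops NAs, na.last = TRUE keeps them at the end
s_drop <- sort(x)
s_keep <- sort(x, na.last = TRUE)

# table(): useNA controls whether NA gets its own count
tab <- table(x, useNA = "ifany")

# lm(): na.action decides what happens to incomplete rows
d <- data.frame(y = c(1, 2, NA, 4), z = c(1, 2, 3, 4))
fit <- lm(y ~ z, data = d, na.action = na.omit)
n_used <- nobs(fit)  # number of rows actually used in the fit

print(s_drop); print(s_keep); print(tab); print(n_used)
```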
Remove rows containing NA from the column with the least number of NAs
We could first find the name of the column with the minimum number of NAs and then remove the rows where that column is NA.
col <- names(which.min(colSums(is.na(df[-1]))))
df[!is.na(df[col]), ]
#         Date grpA grpB
#3  2007-11-09 1.66   NA
#4  2007-11-12 1.64   NA
#5  2007-11-13 1.61 1.28
#6  2007-11-14 1.60 1.30
#7  2007-11-15 1.57 1.27
#8  2007-11-16 1.56 1.25
#9  2007-11-19 1.55 1.25
#10 2007-11-20 1.55 1.25
#11 2007-11-21 1.52 1.22
#12 2007-11-22 1.50 1.21
#13 2007-11-23 1.51 1.21
#14 2007-11-26 1.52 1.25
#15 2007-11-27 1.50 1.25
#16 2007-11-28 1.50 1.23
#17 2007-11-29 1.52 1.24
#18 2007-11-30 1.56 1.25
#19 2007-12-03 1.56 1.22
#20 2007-12-04 1.56 1.23
This can also be done as a one-liner, without creating an additional variable:
df[!is.na(df[names(which.min(colSums(is.na(df[-1]))))]), ]
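For a self-contained check of the base R approach, here is a sketch on a tiny made-up data frame (four rows with the same column names as above). It uses df[[col]] to pull the column out as a plain vector, which is a small variation on the code above.

```r
# Hypothetical data: grpA has one NA, grpB has three
df <- data.frame(Date = as.Date("2007-11-08") + 0:3,
                 grpA = c(NA, 1.66, 1.64, 1.61),
                 grpB = c(NA, NA, NA, 1.28))

# Count NAs per column, skipping the Date column
na_counts <- colSums(is.na(df[-1]))

# Name of the column with the fewest NAs
col <- names(which.min(na_counts))

# Keep only the rows where that column is not NA
res <- df[!is.na(df[[col]]), ]
print(res)
```

Note that which.min returns the first minimum, so with tied NA counts the leftmost column is chosen.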
Using the same logic, a dplyr approach could use filter_at
library(dplyr)
df %>%
  filter_at(df %>%
              summarise_at(-1, ~sum(is.na(.))) %>%
              which.min %>% names, ~!is.na(.))
Or using it with tidyr::drop_na
tidyr::drop_na(df, df %>%
                 summarise_at(-1, ~sum(is.na(.))) %>%
                 which.min %>% names)
How do I remove a row containing NA if NAs are allowed before a person enters a sample?
An option with apply using MARGIN = 1 (row-wise):
#Select the weight columns by matching the pattern in their names
cols <- grep("^W", names(df))
#Select rows only if there is no NA after the first non-NA is encountered.
df[!apply(df[cols], 1, function(x) any(which(is.na(x)) > which.max(!is.na(x)))), ]
#   data W_Y1 W_Y2 W_Y3 W_Y4 W_Y5 W_Y6 W_Y7 W_8 W_9
#2 Ind_2   NA   NA   NA   82   81   83   84  65  86
Using similar logic but with mapply and max.col
df[mapply(function(x, y) !any(which(is.na(df[x, cols])) > y), 1:nrow(df),
          max.col(!is.na(df[cols]), ties.method = "first")), ]
Using max.col, we find the index of the first non-NA value in cols for each row and then check whether any value in that row is NA after that index.
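The row test can be tried in isolation on a small matrix. This is only a sketch: the helper name has_na_after_start and the matrix m are made up for illustration.

```r
# Hypothetical helper: TRUE if an NA occurs after the first non-NA value
has_na_after_start <- function(x) any(which(is.na(x)) > which.max(!is.na(x)))

# Made-up rows: leading NAs only, an NA in the middle, and no NA at all
m <- rbind(c(NA, NA, 82, 81),
           c(NA, 82, NA, 81),
           c(80, 82, 83, 81))

# Position of the first non-NA value per row, as in the max.col variant
start <- max.col(!is.na(m), ties.method = "first")

# Rows to keep: those without an NA after their starting position
keep <- !apply(m, 1, has_na_after_start)
print(start)
print(keep)
```

Only the second row is dropped, because its NA comes after the first recorded value.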
data
I added some rows to make a better example
df <- structure(list(data = structure(1:4, .Label = c("Ind_1", "Ind_2",
"Ind_3", "Ind_4"), class = "factor"), W_Y1 = c(NA, NA, NA, NA
), W_Y2 = c(NA, NA, NA, 23L), W_Y3 = c(NA, NA, NA, NA), W_Y4 = c(82L,
82L, 82L, 82L), W_Y5 = c(81L, 81L, 81L, 81L), W_Y6 = c(83L, 83L,
83L, 83L), W_Y7 = c(84L, 84L, NA, 84L), W_8 = c(NA, 65L, NA,
12L), W_9 = c(86L, 86L, 86L, 86L)), class = "data.frame", row.names = c(NA,
-4L))
df
#   data W_Y1 W_Y2 W_Y3 W_Y4 W_Y5 W_Y6 W_Y7 W_8 W_9
#1 Ind_1   NA   NA   NA   82   81   83   84  NA  86
#2 Ind_2   NA   NA   NA   82   81   83   84  65  86
#3 Ind_3   NA   NA   NA   82   81   83   NA  NA  86
#4 Ind_4   NA   23   NA   82   81   83   84  12  86