Max.Col with Na Removal

max.col with NA removal

We replace the NA values in 'a' with -Inf and apply max.col to the result.

v1 <- max.col(replace(a, is.na(a), -Inf), ties.method="first")

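However, this returns 1 for the last row, which has all NAs. To return NA instead, we can multiply by NA^!rowSums(!is.na(a)). Here rowSums(!is.na(a)) counts the non-NA values in each row, the negation (!) is TRUE only where that count is 0, and NA^TRUE is NA while NA^FALSE is 1.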

v1 * NA^!rowSums(!is.na(a))
#[1]  2  2  3  1 NA
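
To see the masking step in isolation, here is a minimal sketch on a made-up matrix. The values of a below are an assumption (the original 'a' is not shown), chosen so that the last row is all NA:

```r
# Hypothetical 5 x 3 matrix (not the OP's data); the last row is all NA.
a <- matrix(c( 1,  5,  2,
               3,  9, NA,
              NA,  2,  7,
               4, NA,  1,
              NA, NA, NA), ncol = 3, byrow = TRUE)

# Column index of each row maximum, with NA treated as -Inf.
v1 <- max.col(replace(a, is.na(a), -Inf), ties.method = "first")

# rowSums(!is.na(a)) counts the non-NA values per row; ! is TRUE only for a count of 0.
# NA^TRUE is NA and NA^FALSE is 1, so the mask is NA only for all-NA rows.
mask <- NA^!rowSums(!is.na(a))

res <- v1 * mask
print(res)
```

Rows that contain at least one value keep their index, because multiplying by 1 changes nothing.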

EDIT: Changed the replacement from 0 to -Inf based on @Frank's comment

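Since the OP was using apply, which.max can also return the column index. Taking the first element (or setting the length to 1) turns the empty result for an all-NA row into NA.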

apply(a, 1, function(x) which.max(x)[1])
#[1]  2  2  3  1 NA

sapply(apply(a, 1, which.max), `length<-`, 1)
#[1]  2  2  3  1 NA
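
The reason both variants give NA for an all-NA row can be checked on a single vector. This is a small sketch; the vector name x_all_na is made up for illustration:

```r
# A vector that is entirely NA, standing in for an all-NA row.
x_all_na <- c(NA_real_, NA_real_, NA_real_)

idx <- which.max(x_all_na)      # zero-length integer vector
first_elem <- idx[1]            # indexing past the end gives NA
padded <- `length<-`(idx, 1)    # extending the length pads with NA

print(first_elem)
print(padded)
```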

How to keep only max value of row and convert other value to NA?

We can use apply to loop over the rows (MARGIN = 1), replace the values that are not equal to the row max with NA, and assign the transpose back to the original object.

df[] <- t(apply(df, 1, function(x) replace(x, x != max(x, na.rm = TRUE), NA)))
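
A minimal sketch of the same idea on a small data frame. The data here is hypothetical, and the result is written to a copy (out) so the input stays available for comparison:

```r
# Hypothetical input; the column names col1..col3 are assumptions.
df <- data.frame(col1 = c(1, 7, NA),
                 col2 = c(5, 3, 2),
                 col3 = c(2, 4, 9))

out <- df
# apply() returns one column per input row, so transpose before assigning back.
out[] <- t(apply(df, 1, function(x) replace(x, x != max(x, na.rm = TRUE), NA)))
print(out)
```

Note that a row consisting only of NA would make max() warn and return -Inf; such rows stay all NA.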

Or with rowMaxs

library(matrixStats)
i1 <- !!rowSums(!is.na(df))
# index the row maxima by row(), so each cell is compared with its own row's max
df[i1,] <-  replace(df[i1,], df[i1,] != rowMaxs(as.matrix(df[i1,]), 
                na.rm = TRUE)[row(df[i1,])], NA)

Or using dplyr

library(dplyr)
library(purrr)
df %>% 
  mutate(new = reduce(., pmax, na.rm = TRUE)) %>% 
  transmute_at(vars(starts_with('col')), ~ replace(., .!= new, NA))

Min and Max across multiple columns with NAs

You can use hablar's min_ and max_ functions, which return NA if all values are NA.

library(dplyr)
library(hablar)

dat %>%
  rowwise() %>%
  mutate(min = min_(c_across(-ID)), 
         max = max_(c_across(-ID)))

You can also use these functions with apply:

cbind(dat, t(apply(dat[-1], 1, function(x) c(min = min_(x), max = max_(x)))))

#  ID PM TP2 Sigma min max
#1  1  1   2     3   1   3
#2  2  0  NA     1   0   1
#3  3  2   1    NA   1   2
#4  4  1   0     2   0   2
#5 NA NA  NA    NA  NA  NA
#6  5  2   0     7   0   7
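
If adding a package is not an option, the same behaviour can be approximated in base R. This is only a sketch: safe_min and safe_max are hypothetical helper names, not part of hablar, and dat is rebuilt from the printed output above.

```r
# Data reconstructed from the printed result above.
dat <- data.frame(ID    = c(1, 2, 3, 4, NA, 5),
                  PM    = c(1, 0, 2, 1, NA, 2),
                  TP2   = c(2, NA, 1, 0, NA, 0),
                  Sigma = c(3, 1, NA, 2, NA, 7))

# Hypothetical helpers: return NA instead of Inf/-Inf when every value is NA.
safe_min <- function(x) if (all(is.na(x))) NA_real_ else min(x, na.rm = TRUE)
safe_max <- function(x) if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)

res <- cbind(dat, t(apply(dat[-1], 1, function(x) c(min = safe_min(x), max = safe_max(x)))))
print(res)
```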

Combine column to remove NA's yet prioritize specific replacements

Use max.col and some matrix indexing (specifying which row/col combination to take):

cbind(1:nrow(data), max.col(!is.na(data[-1]), "last"))
#     [,1] [,2]
#[1,]    1    3
#[2,]    2    2
#[3,]    3    3
#[4,]    4    1
#[5,]    5    3
#[6,]    6    3

data[-1][cbind(1:nrow(data), max.col(!is.na(data[-1]), "last"))]
#[1] 99  2  4  3  4  5

cbind(data[1], result=data[-1][cbind(1:nrow(data), max.col(!is.na(data[-1]), "last"))])
#  a result
#1 A     99
#2 B      2
#3 C      4
#4 D      3
#5 E      4
#6 F      5

If you need a particular column to always be given precedence, make a temporary object with the columns in a particular order, and then process it:

tmp <- data[-1][c("z", setdiff(names(data[-1]), "z"))]
tmp[cbind(1:nrow(tmp), max.col(!is.na(tmp), "first"))]
#[1] 99  2  4  3  4  5
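
The two steps (pick a column per row, then pull the values with a two-column index matrix) can be tried on a small made-up data frame. The columns x and y and all values below are assumptions; only the column name z comes from the answer:

```r
# Hypothetical data: 'a' is an ID column, x/y/z hold the candidate values.
data <- data.frame(a = c("A", "B", "C", "D"),
                   x = c(1, NA, 3, NA),
                   y = c(NA, 2, NA, NA),
                   z = c(99, NA, 4, 7))
vals <- data[-1]

# Last non-NA column in each row, then one (row, col) pair per row.
last_idx <- max.col(!is.na(vals), "last")
picked_last <- vals[cbind(seq_len(nrow(vals)), last_idx)]

# Give z precedence by moving it to the front and taking the first non-NA column.
tmp <- vals[c("z", setdiff(names(vals), "z"))]
picked_z <- tmp[cbind(seq_len(nrow(tmp)), max.col(!is.na(tmp), "first"))]

print(picked_last)
print(picked_z)
```

Indexing a data frame with a two-column numeric matrix treats each row of that matrix as one (row, column) pair, which is what makes the lookup vectorised.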

Finding the max of a R dataframe column ignoring -Inf and NA

One solution would be the following:

data <- data.frame(column1 = c(-Inf, 4, NA, 7, 10), column2 = c(2, 8, 5, 4, 4))
# drop NA values first
column1b <- data$column1[which(!is.na(data$column1))]
# then drop -Inf (the original comparison "< Inf" would have kept it)
column1c <- column1b[which(column1b > -Inf)]
max(column1c)
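
A shorter variant is possible with is.finite, which is FALSE for NA, NaN, Inf and -Inf alike. This is a sketch under the assumption that dropping +Inf as well is acceptable; finite_max is a made-up name:

```r
# Same column as in the example above.
column1 <- c(-Inf, 4, NA, 7, 10)

# Keep only finite values, then take the maximum.
finite_max <- max(column1[is.finite(column1)])
print(finite_max)
```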

Remove NA values from a vector

Trying ?max, you'll see that it actually has a na.rm = argument, set by default to FALSE. (That's the common default for many other R functions, including sum(), mean(), etc.)

Setting na.rm=TRUE does just what you're asking for:

d <- c(1, 100, NA, 10)
max(d, na.rm=TRUE)

If you do want to remove all of the NAs, use this idiom instead:

d <- d[!is.na(d)]

A final note: Other functions (e.g. table(), lm(), and sort()) have NA-related arguments that use different names (and offer different options). So if NA's cause you problems in a function call, it's worth checking for a built-in solution among the function's arguments. I've found there's usually one already there.
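
As a quick illustration of those differently named arguments, here is a sketch using sort() and table() on the same vector (lm() uses na.action, which is not shown here):

```r
d <- c(1, 100, NA, 10)

sorted_default <- sort(d)                  # NA is dropped by default
sorted_keep    <- sort(d, na.last = TRUE)  # NA is kept and placed last
counts         <- table(d, useNA = "ifany")  # NA gets its own cell

print(sorted_default)
print(sorted_keep)
print(counts)
```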

Remove rows containing NA from the column with the least number of NAs

We could first find the name of the column with the fewest NAs and then remove the rows where that column is NA.

col <- names(which.min(colSums(is.na(df[-1]))))
df[!is.na(df[col]), ]

#         Date grpA grpB
#3  2007-11-09 1.66   NA
#4  2007-11-12 1.64   NA
#5  2007-11-13 1.61 1.28
#6  2007-11-14 1.60 1.30
#7  2007-11-15 1.57 1.27
#8  2007-11-16 1.56 1.25
#9  2007-11-19 1.55 1.25
#10 2007-11-20 1.55 1.25
#11 2007-11-21 1.52 1.22
#12 2007-11-22 1.50 1.21
#13 2007-11-23 1.51 1.21
#14 2007-11-26 1.52 1.25
#15 2007-11-27 1.50 1.25
#16 2007-11-28 1.50 1.23
#17 2007-11-29 1.52 1.24
#18 2007-11-30 1.56 1.25
#19 2007-12-03 1.56 1.22
#20 2007-12-04 1.56 1.23

This can also be done as a one-liner, without creating an additional variable:

df[!is.na(df[names(which.min(colSums(is.na(df[-1]))))]), ]
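
The intermediate values are easier to follow on a tiny example. The data frame below is hypothetical (five made-up rows in the same Date/grpA/grpB layout), and target and kept are illustrative names:

```r
# Hypothetical data in the same layout as the output above.
df <- data.frame(Date = as.Date("2007-11-07") + 0:4,
                 grpA = c(NA, NA, 1.66, 1.64, 1.61),
                 grpB = c(NA, NA, NA, NA, 1.28))

na_counts <- colSums(is.na(df[-1]))    # NA count per column, Date excluded
target <- names(which.min(na_counts))  # column with the fewest NAs
kept <- df[!is.na(df[[target]]), ]     # drop rows where that column is NA

print(na_counts)
print(kept)
```

If two columns tie for the fewest NAs, which.min picks the first of them.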

Using the same logic, a dplyr approach could use filter_at:

library(dplyr)

df %>%
   filter_at(df %>%
   summarise_at(-1, ~sum(is.na(.))) %>%
   which.min %>% names, ~!is.na(.))

Or using it with tidyr::drop_na

tidyr::drop_na(df, df %>%
                  summarise_at(-1, ~sum(is.na(.))) %>%
                  which.min %>% names)

How do I remove a row containing NA if NAs are allowed before a person enters a sample?

One option is apply with MARGIN = 1 (row-wise):

#Select the weight columns by name pattern (names starting with W)
cols <- grep("^W", names(df))

#Select rows only if there is no NA after the first non-NA is encountered.
df[!apply(df[cols], 1, function(x) any(which(is.na(x)) > which.max(!is.na(x)))), ]

#   data W_Y1 W_Y2 W_Y3 W_Y4 W_Y5 W_Y6 W_Y7 W_8 W_9
#2 Ind_2   NA   NA   NA   82   81   83   84  65  86

Using similar logic but with mapply and max.col

df[mapply(function(x, y) !any(which(is.na(df[x, cols])) > y),1:nrow(df),
       max.col(!is.na(df[cols]), ties.method = "first")), ]

Using max.col, we find the index of the first non-NA value among the selected columns in each row, and then check whether that row has any NA after that index.
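
The row test itself can be checked on single vectors. This is a sketch; keep_row is a hypothetical helper name wrapping the same condition used in the apply call:

```r
# TRUE when no NA occurs after the first non-NA value.
keep_row <- function(x) !any(which(is.na(x)) > which.max(!is.na(x)))

leading_only <- keep_row(c(NA, NA, 82, 81, 86))  # NAs only before the first value
gap_later    <- keep_row(c(NA, NA, 82, NA, 86))  # an NA after the first value
no_na        <- keep_row(c(80, 82, 81, 83, 86))  # no NA at all

print(c(leading_only, gap_later, no_na))
```

For an all-NA vector, which.max(!is.na(x)) returns 1, so any NA beyond the first position causes the row to be dropped.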

data

I added some rows to make a better example

df <- structure(list(data = structure(1:4, .Label = c("Ind_1", "Ind_2", 
"Ind_3", "Ind_4"), class = "factor"), W_Y1 = c(NA, NA, NA, NA
), W_Y2 = c(NA, NA, NA, 23L), W_Y3 = c(NA, NA, NA, NA), W_Y4 = c(82L, 
82L, 82L, 82L), W_Y5 = c(81L, 81L, 81L, 81L), W_Y6 = c(83L, 83L, 
83L, 83L), W_Y7 = c(84L, 84L, NA, 84L), W_8 = c(NA, 65L, NA, 
12L), W_9 = c(86L, 86L, 86L, 86L)), class = "data.frame", row.names = c(NA, 
-4L))

df
#   data W_Y1 W_Y2 W_Y3 W_Y4 W_Y5 W_Y6 W_Y7 W_8 W_9
#1 Ind_1   NA   NA   NA   82   81   83   84  NA  86
#2 Ind_2   NA   NA   NA   82   81   83   84  65  86
#3 Ind_3   NA   NA   NA   82   81   83   NA  NA  86
#4 Ind_4   NA   23   NA   82   81   83   84  12  86
