Remove Rows Which Have All NAs in Certain Columns

Remove rows which have all NAs in certain columns

This is a one-liner to remove the rows that have NA in all of columns 5 to 9. Combining rowSums() with is.na() makes it easy to check whether all entries in these 5 columns are NA:

x <- x[rowSums(is.na(x[,5:9]))!=5,]
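A small worked example may make the one-liner more concrete. This is only a sketch: the name x comes from the answer above, but the contents of the data frame are made up for illustration. It also replaces the hard-coded 5 with the number of columns being checked, so the count cannot drift out of sync with the column range.

```r
# Hypothetical 9-column data frame; only columns 5 to 9 are checked for NA
x <- data.frame(matrix(1:27, nrow = 3, ncol = 9))
x[2, 5:9] <- NA   # row 2: NA in all of columns 5 to 9 (should be dropped)
x[3, 5:6] <- NA   # row 3: NA in only some of them (should be kept)

cols <- 5:9
# Count NAs per row in the checked columns; keep rows where not all are NA
kept <- x[rowSums(is.na(x[, cols])) != length(cols), ]

rownames(kept)    # "1" "3"
```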

Remove rows where all columns except one have NA values?

We may use if_all in filter: select the columns a to b in if_all and apply is.na to check for NA. The result is TRUE for a row only if both a and b are NA, so negate it (!) to keep the rows where at least one of the two columns is not NA.

library(dplyr)
df %>%
filter(!if_all(a:b, is.na))

-output

  ID    a    b
1  1   ab <NA>
2  1 <NA>   ab

Or instead of negating (!), we may use complete.cases with if_any

df %>% 
filter(if_any(a:b, complete.cases))

-output

  ID    a    b
1  1   ab <NA>
2  1 <NA>   ab

Regarding the issue in OP's code: the logic checks whether there is at least one NA (> 0), which is true for all the rows. Instead, it should check whether all values are NA and then negate the result:

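# TRUE for rows where every column except ID is NA
na_rows <- df %>%
  select(-"ID") %>%
  is.na() %>%
  {rowSums(.) == ncol(.)}

# negate to keep the remaining rows
df[!na_rows, ]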

data

df <- structure(list(ID = c(1L, 1L, 1L), a = c("ab", NA, NA), b = c(NA, 
"ab", NA)), class = "data.frame", row.names = c(NA, -3L))
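The difference between the two conditions can also be checked in base R. The following is a minimal sketch on the same data (rebuilt here with data.frame() so the block is self-contained); the names na_count, any_na, all_na and kept are chosen only for illustration.

```r
df <- data.frame(ID = c(1L, 1L, 1L),
                 a  = c("ab", NA, NA),
                 b  = c(NA, "ab", NA))

# Number of NAs per row in the two columns of interest
na_count <- rowSums(is.na(df[c("a", "b")]))

any_na <- na_count > 0    # at least one NA: TRUE for every row here
all_na <- na_count == 2   # both columns NA: TRUE only for row 3

# Keep the rows that are not entirely NA in a and b
kept <- df[!all_na, ]
```

Filtering on !any_na would drop every row of this data, which is why the all-NA test is the one to negate.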

How to remove row if it has a NA value in one certain column

The easiest solution is to use is.na():

df[!is.na(df$B), ]

which gives you:

   A B  C
1 NA 2 NA
2 1 2 3
4 1 2 3
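Since the input data is not shown above, here is a self-contained sketch. The data frame is hypothetical, chosen so that only row 3 has NA in column B; subset() is shown as a base R alternative that selects the same rows.

```r
# Hypothetical data frame: row 3 is the only one with NA in column B
df <- data.frame(A = c(NA, 1, 1, 1),
                 B = c(2, 2, NA, 2),
                 C = c(NA, 3, 3, 3))

# Logical indexing: keep rows where B is not NA
by_index <- df[!is.na(df$B), ]

# subset() evaluates the condition inside the data frame
by_subset <- subset(df, !is.na(B))

rownames(by_index)   # "1" "2" "4"
```

NAs in other columns (A and C in row 1) do not affect the result, because only B is tested.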

Omit rows containing specific column of NA

You could wrap the complete.cases function in a small helper function like this:

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))

completeFun <- function(data, desiredCols) {
  # TRUE for rows with no NA in the desired columns
  completeVec <- complete.cases(data[, desiredCols])
  return(data[completeVec, ])
}

completeFun(DF, "y")
#   x  y  z
# 1 1  0 NA
# 2 2 10 33

completeFun(DF, c("y", "z"))
#   x  y  z
# 2 2 10 33

EDIT: Only return rows with no NAs

If you want to eliminate all rows with at least one NA in any column, just use the complete.cases function straight up:

DF[complete.cases(DF), ]
#   x  y  z
# 2 2 10 33

Or if completeFun is already ingrained in your workflow ;)

completeFun(DF, names(DF))

Remove rows with all or some NAs (missing values) in data.frame

Also check complete.cases:

> final[complete.cases(final), ]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
6 ENSG00000221312 0 1 2 3 2

na.omit is nicer for just removing all NA's. complete.cases allows partial selection by including only certain columns of the dataframe:

> final[complete.cases(final[ , 5:6]),]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
4 ENSG00000207604 0 NA NA 1 2
6 ENSG00000221312 0 1 2 3 2

Your solution can't work. If you insist on using is.na, then you have to do something like:

> final[rowSums(is.na(final[ , 5:6])) == 0, ]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
4 ENSG00000207604 0 NA NA 1 2
6 ENSG00000221312 0 1 2 3 2

but using complete.cases is considerably clearer, and faster.
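The final data frame is not reproduced here, so the contrast between the two functions is sketched below on a small made-up data frame (the names DF, all_cols and some_cols are only illustrative): na.omit always looks at every column, while complete.cases can be restricted to a subset.

```r
DF <- data.frame(x = c(1, 2, 3),
                 y = c(0, 10, NA),
                 z = c(NA, 33, 22))

# na.omit() drops every row that has an NA in any column
all_cols <- na.omit(DF)

# complete.cases() on selected columns ignores NAs elsewhere (here: z)
some_cols <- DF[complete.cases(DF[, c("x", "y")]), ]

rownames(all_cols)    # "2"
rownames(some_cols)   # "1" "2"
```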

Remove rows with a complete set of NA

We can use dplyr. With the example by @lovalery:

library(dplyr)

df %>% filter(!if_all(V2:V3, is.na))

#> V1 V2 V3
#> 1 3 3 NA
#> 2 NA 1 NA
#> 3 3 5 NA

We can use many different selection statements inside if_all. Check the documentation for more examples.

Is there R syntax to delete rows with specific, multiple NAs in columns?

Test for NA and delete rows with a number of NA's equal to the number of columns tested using rowSums.

dat[rowSums(is.na(dat[c('Col2', 'Col3', 'Col4')])) != 3, ]
# ID Col1 Col2 Col3 Col4
# 1 Per1 1 2 3 4
# 3 Per3 NA NA 5 NA
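The same idea can be wrapped in a small function so that the column count is derived from the column names rather than typed by hand. This is a sketch, not part of the original answer: drop_all_na_rows is a hypothetical helper name, and the Per2 row of dat is an assumption, since the original input is not shown.

```r
# Hypothetical helper: drop rows in which every one of `cols` is NA
drop_all_na_rows <- function(data, cols) {
  all_na <- rowSums(is.na(data[cols])) == length(cols)
  data[!all_na, , drop = FALSE]
}

# Made-up input; row 2 is NA in Col2, Col3 and Col4
dat <- data.frame(
  ID   = c("Per1", "Per2", "Per3"),
  Col1 = c(1, 2, NA),
  Col2 = c(2, NA, NA),
  Col3 = c(3, NA, 5),
  Col4 = c(4, NA, NA)
)

res <- drop_all_na_rows(dat, c("Col2", "Col3", "Col4"))
rownames(res)   # "1" "3"
```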

Subsetting rows that contain NAs in certain columns

The issue is that == returns NA wherever an element is NA. Also, the missing value NA is not the same as the quoted string "NA":

v1 <- c(NA, 3, 5, NA)
v1 == "NA"
#[1] NA FALSE FALSE NA

Or without quotes

v1 == NA
#[1] NA NA NA NA

The correct way is is.na or complete.cases

complete.cases(v1) # returns TRUE where there are no NA
#[1] FALSE TRUE TRUE FALSE

is.na(v1) # returns TRUE where there are NAs
#[1] TRUE FALSE FALSE TRUE

If we check ?Comparison (the help page for ==):

Missing values (NA) and NaN values are regarded as non-comparable even to themselves, so comparisons involving them will always result in NA. Missing values can also result when character strings are compared and one is not valid in the current collation locale.
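This matters in practice because an NA in a logical index produces an NA in the result. A short sketch with the same vector v1 (the names direct, safe and no_na are only illustrative):

```r
v1 <- c(NA, 3, 5, NA)

# v1 == 3 is NA TRUE FALSE NA, and indexing with NA yields NA entries
direct <- v1[v1 == 3]          # NA 3 NA

# which() drops the NAs from the condition, keeping only real matches
safe <- v1[which(v1 == 3)]     # 3

# is.na() is the reliable test for missing values
no_na <- v1[!is.na(v1)]        # 3 5
```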

Delete rows that contain NAs in certain columns in R

You can still use complete.cases(). Just apply it to the desired columns (columns 1:4 in the example below) and then use the Boolean vector it returns to select valid rows from the entire data.frame.

set.seed(4)
x <- as.data.frame(replicate(6, sample(c(1:10,NA))))
x[complete.cases(x[1:4]),]
# V1 V2 V3 V4 V5 V6
# 1 7 4 6 8 10 5
# 2 1 2 5 5 1 2
# 5 6 8 4 10 6 6
# 6 2 6 9 3 4 4
# 7 4 3 3 1 2 1
# 9 8 5 2 7 7 3
# 10 10 10 1 2 5 NA

