How to Remove Na Data in Only One Columns

Omit rows containing specific column of NA

You could use the complete.cases function and put it into a function thusly:

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))

completeFun <- function(data, desiredCols) {
completeVec <- complete.cases(data[, desiredCols])
return(data[completeVec, ])
}

completeFun(DF, "y")
# x y z
# 1 1 0 NA
# 2 2 10 33

completeFun(DF, c("y", "z"))
# x y z
# 2 2 10 33

EDIT: Only return rows with no NAs

If you want to eliminate all rows with at least one NA in any column, just use the complete.cases function straight up:

DF[complete.cases(DF), ]
# x y z
# 2 2 10 33

Or if completeFun is already ingrained in your workflow ;)

completeFun(DF, names(DF))

How to remove row if it has a NA value in one certain column

The easiest solution is to use is.na():

df[!is.na(df$B), ]

which gives you:

   A B  C
1 NA 2 NA
2 1 2 3
4 1 2 3

Remove rows where all columns except one have NA values?

We may use if_all in filter- select the columns a to b in if_all, apply the is.na (check for NA), the output will be TRUE for a row if both a and b have NA, negate (!) to convert TRUE-> FALSE and FALSE->TRUE

library(dplyr)
df %>%
filter(!if_all(a:b, is.na))

-output

ID    a    b
1 1 ab <NA>
2 1 <NA> ab

Or instead of negating (!), we may use complete.cases with if_any

df %>% 
filter(if_any(a:b, complete.cases))
ID a b
1 1 ab <NA>
2 1 <NA> ab

Regarding the issue in OP's code, the logic is created by looking whether there is atleast one NA (> 0) which is true for all the rows. Instead, it should be all NA and then negate

na_rows <- df %>% 
select(-"ID") %>%
is.na() %>%
{rowSums(.) == ncol(.)}

data

df <- structure(list(ID = c(1L, 1L, 1L), a = c("ab", NA, NA), b = c(NA, 
"ab", NA)), class = "data.frame", row.names = c(NA, -3L))

How to remove NA data in only one columns?

Use is.na() on the relevant vector of data you wish to look for and index using the negated result. For exmaple:

R> data[!is.na(data$A), ]
date A B
1 2014-01-01 2 3
2 2014-01-02 5 NA
4 2014-01-04 7 11
R> data[!is.na(data$B), ]
date A B
1 2014-01-01 2 3
4 2014-01-04 7 11

is.na() returns TRUE for every element that is NA and FALSE otherwise. To index the rows of the data frame, we can use this logical vector, but we want its converse. Hence we use ! to imply the opposite (TRUE becomes FALSE and vice versa).

You can restrict which columns you return by adding an index for the columns after the , in [ , ], e.g.

R> data[!is.na(data$A), 1:2]
date A
1 2014-01-01 2
2 2014-01-02 5
4 2014-01-04 7

R remove NA values from 3 columns only when all 3 have NA

The complete.cases code can be with | condition as complete.cases returns TRUE for a non-NA value and FALSE for NA. Thus, by using the OR, we are subsetting a row having at least one non-NA

data[complete.cases(data$A) | complete.cases(data$B) | complete.cases(data$C),]

Or more easily with rowSums

data[rowSums(is.na(data[, c("A", "B", "C")])) < 3,]

Or with dplyr with if_all or if_any

library(dplyr)
data %>%
filter(!if_all(c(A, B, C), is.na))


Related Topics



Leave a reply



Submit