How to Delete Columns That Contain Only NAs

How to delete columns that contain ONLY NAs?

One way of doing it:

df[, colSums(is.na(df)) != nrow(df), drop = FALSE]

If the count of NAs in a column is equal to the number of rows, it must be entirely NA.

Or, similarly:

df[colSums(!is.na(df)) > 0]
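
To see both one-liners at work, here is a minimal sketch on a made-up data frame (the column names a, b and c are hypothetical, not from the question). It passes drop = FALSE in the comma form, on the assumption that you want a data frame back even when only one column survives; without it, R returns a bare vector in that case.

```r
# Toy data: column b is entirely NA, column c is only partly NA
df <- data.frame(a = 1:3, b = NA, c = c(NA, 2, 3))

# Keep columns whose NA count differs from the number of rows
kept1 <- df[, colSums(is.na(df)) != nrow(df), drop = FALSE]

# Equivalent: keep columns with at least one non-NA value
kept2 <- df[colSums(!is.na(df)) > 0]

names(kept1)
names(kept2)
```

Both calls keep a and c: c has an NA, but it is not entirely NA, so it stays.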

Remove columns from dataframe where ALL values are NA

Try this:

df <- df[, colSums(is.na(df)) < nrow(df), drop = FALSE]

Remove columns from dataframe where some of the values are NA

The data:

Itun <- data.frame(v1 = c(1,1,2,1,2,1), v2 = c(NA, 1, 2, 1, 2, NA)) 

This will remove all columns containing at least one NA:

Itun[ , colSums(is.na(Itun)) == 0, drop = FALSE]

An alternative way is to use apply:

Itun[ , apply(Itun, 2, function(x) !any(is.na(x))), drop = FALSE]
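
Note that apply first converts the data frame to a matrix, which can coerce columns of mixed types. As a sketch of the same test without that conversion, vapply can visit each column as-is (this is a substitute for the apply call, not the original answer's code):

```r
Itun <- data.frame(v1 = c(1, 1, 2, 1, 2, 1), v2 = c(NA, 1, 2, 1, 2, NA))

# TRUE for columns that contain no NA at all
no_na <- vapply(Itun, function(x) !anyNA(x), logical(1))

# drop = FALSE keeps the result a data frame even with one column left
clean <- Itun[, no_na, drop = FALSE]
names(clean)
```

Only v1 survives here, so without drop = FALSE the result would collapse to a plain numeric vector.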

Remove columns with NA's and/or Zeros Only

One option is to use colSums to build a logical vector that flags the columns in which every element is NA or 0, and then drop those columns:

d[colSums(is.na(d) | d == 0) != nrow(d)]
#    a  c
#1   1 98
#2   5 67
#3  56 NA
#4   4  3
#5   9  7

Another option is to replace all the 0s with NA and then apply is.na:

d[colSums(!is.na(replace(d, d == 0, NA))) > 0]

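Or more compactly with na_if from dplyr (passing a whole data frame like this worked in older dplyr releases; newer versions are stricter about types and may require applying na_if column by column):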

d[colSums(!is.na(na_if(d, 0))) > 0]
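
The answers above never show d itself. A minimal base R sketch, assuming a data frame whose column b holds only zeros and NAs (the values of a and c are taken from the printed output; b is invented for illustration):

```r
d <- data.frame(
  a = c(1, 5, 56, 4, 9),
  b = c(0, NA, 0, NA, 0),   # hypothetical column: only zeros and NAs
  c = c(98, 67, NA, 3, 7)
)

# TRUE where a cell is NA or zero
empty_cell <- is.na(d) | d == 0

# Keep columns that are not "empty" in every row
res <- d[colSums(empty_cell) != nrow(d)]
names(res)
```

Column c keeps its single NA because the test only removes columns that are empty in every row.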

Removing columns with NA values only

The tidyverse approach would look like this (also using @Rich Scriven's data):

d %>% select_if(~any(!is.na(.)))
#    x
# 1 NA
# 2  3
# 3 NA
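
select_if is superseded in current dplyr in favour of select with where. A sketch of the same filter in that style, assuming dplyr 1.0.0 or later and a made-up d (the original data is not shown here, so the all-NA column y is hypothetical):

```r
library(dplyr)

d <- data.frame(x = c(NA, 3, NA), y = c(NA, NA, NA))

# Keep columns that have at least one non-NA value
res <- d %>% select(where(~ any(!is.na(.x))))
names(res)
```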

Remove rows with all or some NAs (missing values) in data.frame

Also check complete.cases:

> final[complete.cases(final), ]
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
6 ENSG00000221312    0    1    2    3    2

na.omit is nicer if you just want to remove every row that contains an NA. complete.cases allows partial selection, by checking only certain columns of the data frame:

> final[complete.cases(final[ , 5:6]),]
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
4 ENSG00000207604    0   NA   NA    1    2
6 ENSG00000221312    0    1    2    3    2

Your solution can't work. If you insist on using is.na, then you have to do something like:

> final[rowSums(is.na(final[ , 5:6])) == 0, ]
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
4 ENSG00000207604    0   NA   NA    1    2
6 ENSG00000221312    0    1    2    3    2

but using complete.cases is considerably clearer, and faster.
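
A small sketch of the difference between na.omit and complete.cases restricted to selected columns. The data frame below is made up for illustration (it is not the final data from the question):

```r
scores <- data.frame(
  id = c("g1", "g2", "g3", "g4"),
  x  = c(NA, 2, NA, 1),
  y  = c(1, 2, NA, NA),
  z  = c(2, 2, 3, 2)
)

# Drop every row that has an NA anywhere
all_complete <- na.omit(scores)

# Drop only rows with an NA in y or z; NAs in x are tolerated
yz_complete <- scores[complete.cases(scores[, c("y", "z")]), ]

# The same selection written with is.na and rowSums
yz_rowsums <- scores[rowSums(is.na(scores[, c("y", "z")])) == 0, ]

all_complete$id
yz_complete$id
```

na.omit keeps only g2, while the column-restricted complete.cases call also keeps g1, whose only NA is in x.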

Removing all columns with all NAs in a data.frame without a loop in R

Similarly to rowSums for rows, we can use colSums for columns

r[, colSums(is.na(r)) != nrow(r)]

#  AA CC
#1  1  3
#2 NA NA
#3  3  5

R remove NA values from 3 columns only when all 3 have NA

complete.cases can be combined with the | (OR) operator, because it returns TRUE for a non-NA value and FALSE for NA. By OR-ing the result for each column, we keep every row that has at least one non-NA value:

data[complete.cases(data$A) | complete.cases(data$B) | complete.cases(data$C),]

Or more easily with rowSums

data[rowSums(is.na(data[, c("A", "B", "C")])) < 3,]

Or with dplyr with if_all or if_any

library(dplyr)
data %>%
  filter(!if_all(c(A, B, C), is.na))
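
The text mentions if_any without showing it. A sketch of the equivalent filter, assuming dplyr 1.0.4 or later and a made-up data frame with columns A, B and C:

```r
library(dplyr)

data <- data.frame(
  A = c(1, NA, NA),
  B = c(NA, NA, 2),
  C = c(3, NA, NA)
)

# Keep rows where at least one of A, B, C is not NA
kept <- data %>%
  filter(if_any(c(A, B, C), ~ !is.na(.x)))

nrow(kept)
```

"Not all NA" and "any non-NA" are the same condition, so this drops only the middle row, where all three columns are NA.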

