How to delete columns that contain ONLY NAs?
One way of doing it:
df[, colSums(is.na(df)) != nrow(df)]
If the count of NAs in a column is equal to the number of rows, it must be entirely NA.
Or similarly
df[colSums(!is.na(df)) > 0]
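To see both forms act on concrete data, here is a minimal sketch. The data frame and the names keep, result and result_alt are made up for illustration and are not from the original question.

```r
# Hypothetical data: column b is entirely NA, column c is only partly NA.
df <- data.frame(a = c(1, 2, 3),
                 b = c(NA, NA, NA),
                 c = c(NA, 5, 6))

# First form: count NAs per column and keep columns whose count differs from nrow(df).
keep <- colSums(is.na(df)) != nrow(df)
result <- df[, keep, drop = FALSE]  # drop = FALSE guards against collapsing to a vector

# Second form: keep columns that hold at least one non-NA value.
result_alt <- df[colSums(!is.na(df)) > 0]

print(names(result))
print(names(result_alt))
```

Both forms keep the partly-NA column c, since only columns with no usable value at all are dropped.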
Remove columns from dataframe where ALL values are NA
Try this:
df <- df[, colSums(is.na(df)) < nrow(df)]
Remove columns from dataframe where some of values are NA
The data:
Itun <- data.frame(v1 = c(1,1,2,1,2,1), v2 = c(NA, 1, 2, 1, 2, NA))
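This removes every column containing at least one NA (drop = FALSE keeps the result a data frame when only one column survives, as happens here):
Itun[ , colSums(is.na(Itun)) == 0, drop = FALSE]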
An alternative way is to use apply:
Itun[ , apply(Itun, 2, function(x) !any(is.na(x))), drop = FALSE]
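With the Itun data only v1 is free of NAs, so the shape of the result depends on the drop argument. The sketch below checks this. The names as_vector, as_frame and no_na_alt are illustrative, and vapply with anyNA is swapped in as a type-stable alternative to apply, not something the original answer used.

```r
Itun <- data.frame(v1 = c(1, 1, 2, 1, 2, 1), v2 = c(NA, 1, 2, 1, 2, NA))

# TRUE for columns that contain no NA at all.
no_na <- colSums(is.na(Itun)) == 0

as_vector <- Itun[, no_na]                # single surviving column collapses to a numeric vector
as_frame  <- Itun[, no_na, drop = FALSE]  # stays a one-column data frame

# Alternative column test: vapply always returns a logical vector, one entry per column.
no_na_alt <- vapply(Itun, function(x) !anyNA(x), logical(1))

print(class(as_vector))
print(class(as_frame))
```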
Remove columns with NA's and/or Zeros Only
One option would be to create a logical vector with colSums based on the number of NA or 0 elements in each column:
d[colSums(is.na(d) | d == 0) != nrow(d)]
# a c
#1 1 98
#2 5 67
#3 56 NA
#4 4 3
#5 9 7
Or another option is to replace all the 0s with NA and then apply is.na:
d[colSums(!is.na(replace(d, d == 0, NA))) > 0]
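Or more compactly with na_if from dplyr (this relies on na_if accepting a whole data frame, which may not hold in recent dplyr versions):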
d[colSums(!is.na(na_if(d, 0))) > 0]
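The two base R variants can be compared on a small example. The data frame below is an assumption: columns a and c match the printed output, while column b (only NAs and zeros) is invented so that there is something to drop.

```r
# Hypothetical data: b holds nothing but NA and 0, so it should be removed.
d <- data.frame(a = c(1, 5, 56, 4, 9),
                b = c(0, 0, NA, 0, NA),
                c = c(98, 67, NA, 3, 7))

# Variant 1: a cell counts as "empty" if it is NA or 0; drop columns where every cell is empty.
# is.na(d) | d == 0 is TRUE for NA cells because TRUE | NA evaluates to TRUE.
empty <- colSums(is.na(d) | d == 0) == nrow(d)
result <- d[!empty]

# Variant 2: turn zeros into NA first, then keep columns with at least one non-NA value.
result2 <- d[colSums(!is.na(replace(d, d == 0, NA))) > 0]

print(names(result))
print(names(result2))
```

Column c survives even though it contains an NA, because it also holds real values.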
removing columns with NA values only
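The tidyverse approach would look like this (also using @Rich Scriven's data). Note that select_if is superseded in current dplyr, where the same selection is written select(where(~ any(!is.na(.x)))):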
d %>% select_if(~any(!is.na(.)))
# x
# 1 NA
# 2 3
# 3 NA
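For readers without dplyr, the same predicate-based selection can be sketched in base R with Filter, which is not part of the original answer. The data frame is an assumption reconstructed from the printed output: x is taken from it, and the all-NA column y is hypothetical.

```r
# Assumed data: x matches the output shown above, y is an invented all-NA column.
d <- data.frame(x = c(NA, 3, NA), y = c(NA, NA, NA))

# Filter applies the predicate to each column and keeps those returning TRUE.
result <- Filter(function(col) any(!is.na(col)), d)

print(result)
```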
Remove rows with all or some NAs (missing values) in data.frame
Also check complete.cases:
> final[complete.cases(final), ]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
6 ENSG00000221312 0 1 2 3 2
na.omit is nicer when you just want to drop every row containing an NA. complete.cases allows partial selection by checking only certain columns of the data frame:
> final[complete.cases(final[ , 5:6]),]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
4 ENSG00000207604 0 NA NA 1 2
6 ENSG00000221312 0 1 2 3 2
Your solution can't work. If you insist on using is.na, then you have to do something like:
> final[rowSums(is.na(final[ , 5:6])) == 0, ]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
4 ENSG00000207604 0 NA NA 1 2
6 ENSG00000221312 0 1 2 3 2
but using complete.cases is a lot clearer, and faster.
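The row-wise variants above can be tried side by side on a toy table. This is a sketch on invented data: genes and its columns id, s1, s2 and s3 are hypothetical stand-ins and do not reproduce the final data frame from the question.

```r
# Hypothetical data with NAs scattered across three value columns.
genes <- data.frame(id = c("g1", "g2", "g3", "g4"),
                    s1 = c(0, 0, NA, 1),
                    s2 = c(NA, 2, 2, NA),
                    s3 = c(2, 2, NA, 3))

# Rows with no NA in any column; na.omit selects the same rows.
all_complete <- genes[complete.cases(genes), ]
omitted <- na.omit(genes)

# Partial selection: only s1 and s3 have to be non-NA.
partial <- genes[complete.cases(genes[, c("s1", "s3")]), ]

# The same partial selection written with is.na and rowSums.
by_is_na <- genes[rowSums(is.na(genes[, c("s1", "s3")])) == 0, ]

print(all_complete$id)
print(partial$id)
```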
Removing all columns with all NAs in a data.frame without a loop in R
Similarly to rowSums for rows, we can use colSums for columns:
r[, colSums(is.na(r)) != nrow(r)]
# AA CC
#1 1 3
#2 NA NA
#3 3 5
R remove NA values from 3 columns only when all 3 have NA
The complete.cases calls can be combined with |, because complete.cases returns TRUE for a non-NA value and FALSE for NA. Thus, by using OR, we keep every row that has at least one non-NA value among the three columns:
data[complete.cases(data$A) | complete.cases(data$B) | complete.cases(data$C),]
Or more easily with rowSums
data[rowSums(is.na(data[, c("A", "B", "C")])) < 3,]
Or with dplyr, using if_all or if_any:
library(dplyr)
data %>%
filter(!if_all(c(A, B, C), is.na))
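As a quick check that the two base R conditions agree, here is a sketch on invented data. The values and the extra column D are hypothetical. Only row 2 has NA in all of A, B and C.

```r
# Hypothetical data: row 2 is NA in A, B and C, the other rows are only partly NA.
data <- data.frame(A = c(1, NA, NA, 4),
                   B = c(NA, NA, 2, 5),
                   C = c(NA, NA, NA, 6),
                   D = c("w", "x", "y", "z"))

# OR of per-column completeness: TRUE when at least one of A, B, C is non-NA.
keep_or <- complete.cases(data$A) | complete.cases(data$B) | complete.cases(data$C)

# Same condition via counting: fewer than 3 NAs among the three columns.
keep_sum <- rowSums(is.na(data[, c("A", "B", "C")])) < 3

result <- data[keep_sum, ]
print(result$D)
```

The rowSums form scales more comfortably to many columns, since only the column vector and the threshold change.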