Remove rows which have all NAs in certain columns
This is a one-liner to remove the rows with NA in all columns between 5 and 9. By combining rowSums() with is.na(), it is easy to check whether all entries in these 5 columns are NA:
x <- x[rowSums(is.na(x[,5:9]))!=5,]
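As a minimal sketch of the one-liner above, the example below builds a small hypothetical data frame (the name x matches the answer; the contents are invented for illustration) and replaces the hard-coded 5 with the number of columns being checked:

```r
# Hypothetical data: 4 rows, 9 columns (V1..V9); values are invented.
x <- as.data.frame(matrix(1:36, nrow = 4))
x[2, 5:9] <- NA   # row 2: all of columns 5..9 are NA, so it should be dropped
x[3, 5:7] <- NA   # row 3: only some of columns 5..9 are NA, so it should stay

cols <- 5:9

# Count the NAs per row within the chosen columns and keep the rows where
# that count differs from the number of columns checked.
keep <- rowSums(is.na(x[, cols])) != length(cols)
x_clean <- x[keep, ]
```

Using length(cols) instead of a literal avoids having to update the comparison value when the column range changes.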
Remove rows where all columns except one have NA values?
We may use if_all in filter: select the columns a to b in if_all and apply is.na (check for NA). The output will be TRUE for a row if both a and b are NA; negate (!) to convert TRUE to FALSE and FALSE to TRUE.
library(dplyr)
df %>%
  filter(!if_all(a:b, is.na))
-output
ID a b
1 1 ab <NA>
2 1 <NA> ab
Or instead of negating (!), we may use complete.cases with if_any:
df %>%
  filter(if_any(a:b, complete.cases))
ID a b
1 1 ab <NA>
2 1 <NA> ab
Regarding the issue in the OP's code: the condition checks whether there is at least one NA (> 0), which is true for every row. Instead, it should check whether all values are NA, and then negate the result.
na_rows <- df %>%
  select(-"ID") %>%
  is.na() %>%
  {rowSums(.) == ncol(.)}
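The snippet above only computes the logical vector na_rows. A minimal base R sketch of the remaining "negate" step, using the same example data as the answer (the names na_mat and result are introduced here for illustration):

```r
# Same example data as in the answer.
df <- data.frame(ID = c(1L, 1L, 1L),
                 a  = c("ab", NA, NA),
                 b  = c(NA, "ab", NA))

# Logical matrix of NA flags for every column except ID.
na_mat <- is.na(df[, setdiff(names(df), "ID"), drop = FALSE])

# TRUE for rows where all of those columns are NA.
na_rows <- rowSums(na_mat) == ncol(na_mat)

# Negate to keep the rows that have at least one non-NA value.
result <- df[!na_rows, ]
```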
data
df <- structure(list(ID = c(1L, 1L, 1L), a = c("ab", NA, NA), b = c(NA,
"ab", NA)), class = "data.frame", row.names = c(NA, -3L))
How to remove row if it has a NA value in one certain column
The easiest solution is to use is.na():
df[!is.na(df$B), ]
which gives you:
A B C
1 NA 2 NA
2 1 2 3
4 1 2 3
Omit rows containing specific column of NA
You could use the complete.cases function and wrap it in a function like this:
DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))
completeFun <- function(data, desiredCols) {
  # TRUE for rows with no NA in the selected columns
  completeVec <- complete.cases(data[, desiredCols])
  return(data[completeVec, ])
}
completeFun(DF, "y")
# x y z
# 1 1 0 NA
# 2 2 10 33
completeFun(DF, c("y", "z"))
# x y z
# 2 2 10 33
EDIT: Only return rows with no NAs
If you want to eliminate all rows with at least one NA in any column, just use the complete.cases function directly:
DF[complete.cases(DF), ]
# x y z
# 2 2 10 33
Or if completeFun is already ingrained in your workflow ;)
completeFun(DF, names(DF))
Remove rows with all or some NAs (missing values) in data.frame
Also check complete.cases:
> final[complete.cases(final), ]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
6 ENSG00000221312 0 1 2 3 2
na.omit is nicer for just removing all NAs. complete.cases allows partial selection by including only certain columns of the data frame:
> final[complete.cases(final[ , 5:6]),]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
4 ENSG00000207604 0 NA NA 1 2
6 ENSG00000221312 0 1 2 3 2
Your solution can't work. If you insist on using is.na, then you have to do something like:
> final[rowSums(is.na(final[ , 5:6])) == 0, ]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
4 ENSG00000207604 0 NA NA 1 2
6 ENSG00000221312 0 1 2 3 2
but using complete.cases is much clearer, and faster.
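The answer mentions na.omit without showing it. A minimal sketch comparing it with complete.cases, using a hypothetical data frame d (invented for illustration; it is not the final data from the question):

```r
# Hypothetical data, invented for illustration.
d <- data.frame(id = 1:4,
                p  = c(1, NA, 3, 4),
                q  = c(NA, NA, 6, 7),
                r  = c(8, 9, NA, 11))

# Drop every row that has an NA in any column.
all_complete <- na.omit(d)

# The same rows selected via complete.cases on the whole data frame.
same_rows <- d[complete.cases(d), ]

# Partial selection: only columns p and q are required to be non-NA.
pq_complete <- d[complete.cases(d[, c("p", "q")]), ]
```

Note that na.omit also attaches an na.action attribute recording which rows were dropped, so its result is not strictly identical to the complete.cases subset even though the rows match.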
remove Rows with complete set of NA
We can use dplyr. With the example by @lovalery:
library(dplyr)
df %>% filter(!if_all(V2:V3, is.na))
#> V1 V2 V3
#> 1 3 3 NA
#> 2 NA 1 NA
#> 3 3 5 NA
We can use many different selection statements inside if_all. Check the documentation for more examples.
Is there R syntax to delete rows with specific, multiple NAs in columns?
Test for NA with is.na, then use rowSums to delete the rows whose number of NAs equals the number of columns tested.
dat[!rowSums(is.na(dat[c('Col2', 'Col3', 'Col4')])) == 3, ]
# ID Col1 Col2 Col3 Col4
# 1 Per1 1 2 3 4
# 3 Per3 NA NA 5 NA
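A sketch of the same idea with the column names held in a variable, so the comparison value does not need to be hard-coded. The data frame below is hypothetical, shaped to resemble the output above (the values in the dropped row are invented):

```r
# Hypothetical data resembling the example; row 2 is invented.
dat <- data.frame(ID   = c("Per1", "Per2", "Per3"),
                  Col1 = c(1, 2, NA),
                  Col2 = c(2, NA, NA),
                  Col3 = c(3, NA, 5),
                  Col4 = c(4, NA, NA))

cols <- c("Col2", "Col3", "Col4")

# TRUE for rows where every one of the tested columns is NA.
all_na <- rowSums(is.na(dat[cols])) == length(cols)

# Keep the remaining rows.
res <- dat[!all_na, ]
```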
subsetting rows contain NAs for certain columns
The issue is also that == returns NA where there are NA elements. Also, NA is not the same as the quoted string "NA":
v1 <- c(NA, 3, 5, NA)
v1 == "NA"
#[1] NA FALSE FALSE NA
Or without quotes
v1 == NA
#[1] NA NA NA NA
The correct way is to use is.na or complete.cases:
complete.cases(v1) # returns TRUE where there are no NA
#[1] FALSE TRUE TRUE FALSE
is.na(v1) # returns TRUE where there are NAs
#[1] TRUE FALSE FALSE TRUE
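To illustrate why this matters when subsetting rows, here is a minimal sketch on a hypothetical data frame d (invented for illustration). It shows what an NA in a logical row index does, and how is.na avoids it:

```r
# Hypothetical data frame, invented for illustration.
d <- data.frame(v1 = c(NA, 3, 5, NA), v2 = c("a", "b", "c", "d"))

# == yields NA for the NA elements, and an NA in a logical row index
# produces a row filled with NA instead of dropping that row.
by_equals <- d[d$v1 == 3, ]

# Combining the comparison with !is.na() gives a clean logical index.
by_isna <- d[!is.na(d$v1) & d$v1 == 3, ]

# Rows where v1 is missing, and rows where it is present.
missing_rows <- d[is.na(d$v1), ]
present_rows <- d[!is.na(d$v1), ]
```

Here by_equals ends up with three rows (two of them all NA), while by_isna contains only the single matching row.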
If we check the documentation for the comparison operators:
Missing values (NA) and NaN values are regarded as non-comparable even to themselves, so comparisons involving them will always result in NA. Missing values can also result when character strings are compared and one is not valid in the current collation locale.
delete rows that contain NAs in certain columns R
You can still use complete.cases(). Just apply it to the desired columns (columns 1:4 in the example below) and then use the Boolean vector it returns to select valid rows from the entire data.frame.
set.seed(4)
x <- as.data.frame(replicate(6, sample(c(1:10,NA))))
x[complete.cases(x[1:4]),]
# V1 V2 V3 V4 V5 V6
# 1 7 4 6 8 10 5
# 2 1 2 5 5 1 2
# 5 6 8 4 10 6 6
# 6 2 6 9 3 4 4
# 7 4 3 3 1 2 1
# 9 8 5 2 7 7 3
# 10 10 10 1 2 5 NA