Remove rows in R matrix where all data is NA
Solutions using rowSums() generally outperform apply() ones:
m <- structure(c( 1, NA, 3, 4, 5,
6, NA, 8, NA, 10,
11, NA, 13, NA, NA),
.Dim = c(5L, 3L))
m[rowSums(is.na(m)) != ncol(m), ]
     [,1] [,2] [,3]
[1,]    1    6   11
[2,]    3    8   13
[3,]    4   NA   NA
[4,]    5   10   NA
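For comparison, here is a minimal sketch of the apply() formulation that the rowSums() approach replaces. It rebuilds the same matrix with matrix() and checks that both row masks agree. The names keep_rowsums and keep_apply are illustrative only, and no timing is measured here.

```r
# Same 5 x 3 matrix as above, filled column by column.
m <- matrix(c( 1, NA,  3,  4,  5,
               6, NA,  8, NA, 10,
              11, NA, 13, NA, NA), nrow = 5, ncol = 3)

# Vectorised mask: TRUE for rows that are not entirely NA.
keep_rowsums <- rowSums(is.na(m)) != ncol(m)

# apply() mask: calls all() once per row, which is usually slower on large inputs.
keep_apply <- !apply(is.na(m), 1, all)

# Both masks select the same rows.
stopifnot(identical(keep_rowsums, keep_apply))
result <- m[keep_rowsums, , drop = FALSE]
```

Adding drop = FALSE keeps the result a matrix even if only one row survives.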
Remove rows with complete set of NA
We can use dplyr. With the example by @lovalery:
library(dplyr)
df %>% filter(!if_all(V2:V3, is.na))
#> V1 V2 V3
#> 1 3 3 NA
#> 2 NA 1 NA
#> 3 3 5 NA
We can use many different selection statements inside if_all(). Check the documentation for more examples.
Remove rows where all variables are NA using dplyr
Since dplyr 0.7.0, new scoped filtering verbs exist. Using filter_all() together with any_vars(), you can easily keep the rows that have at least one non-missing column:
# dplyr 0.7.0
dat %>% filter_all(any_vars(!is.na(.)))
Using @hejseb's benchmarking algorithm, it appears that this solution is as efficient as f4.
UPDATE:
Since dplyr 1.0.0 the above scoped verbs are superseded. The across() function family was introduced instead, which applies a function to multiple (or all) columns. Its filtering companions if_any() and if_all() were added in dplyr 1.0.4. Keeping the rows where at least one column is not NA now looks like this:
# dplyr >= 1.0.4
dat %>% filter(if_any(everything(), ~ !is.na(.)))
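As a small self-contained check of the if_any() form, the sketch below uses a hypothetical toy data frame (dat and its columns a and b are made up for illustration) and assumes dplyr 1.0.4 or later is installed. It also shows that negating if_all() expresses the same filter.

```r
library(dplyr)

# Hypothetical toy data: row 2 is entirely NA.
dat <- data.frame(a = c(1, NA, 3),
                  b = c(NA, NA, 6))

# Keep rows with at least one non-missing value.
kept_any <- dat %>% filter(if_any(everything(), ~ !is.na(.)))

# Equivalent phrasing: drop rows where every column is NA.
kept_all <- dat %>% filter(!if_all(everything(), is.na))

stopifnot(isTRUE(all.equal(kept_any, kept_all)))
```

Which phrasing to prefer is mostly a matter of readability; both remove only the all-NA row here.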
Removing rows of a matrix with at least one NA in R
Try using the na.omit function:
x <- matrix(c(32, 54, 34, NA, 10, NA, 17, 93, NA), nrow = 3, ncol = 3, byrow = TRUE)
na.omit(x)
Output:
     [,1] [,2] [,3]
[1,]   32   54   34
attr(,"na.action")
[1] 2 3
attr(,"class")
[1] "omit"
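The extra attributes in that output record which rows were dropped. A minimal sketch of reading them back, and of getting a plain matrix without the attributes via complete.cases(), might look like this (the names cleaned, dropped and plain are illustrative):

```r
x <- matrix(c(32, 54, 34, NA, 10, NA, 17, 93, NA),
            nrow = 3, ncol = 3, byrow = TRUE)

cleaned <- na.omit(x)

# Indices of the rows that na.omit() removed.
dropped <- as.integer(attr(cleaned, "na.action"))

# Same rows kept, but without the "na.action" attribute attached.
plain <- x[complete.cases(x), , drop = FALSE]
```

Using drop = FALSE matters here because only one row survives; without it the result would collapse to a vector.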
Remove rows with all or some NAs (missing values) in data.frame
Also check complete.cases:
> final[complete.cases(final), ]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
6 ENSG00000221312 0 1 2 3 2
na.omit is nicer for simply removing every row that contains an NA. complete.cases allows partial selection by including only certain columns of the dataframe:
> final[complete.cases(final[ , 5:6]),]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
4 ENSG00000207604 0 NA NA 1 2
6 ENSG00000221312 0 1 2 3 2
Your solution can't work. If you insist on using is.na, then you have to do something like:
> final[rowSums(is.na(final[ , 5:6])) == 0, ]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
4 ENSG00000207604 0 NA NA 1 2
6 ENSG00000221312 0 1 2 3 2
but using complete.cases is quite a lot clearer, and faster.
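Since the final data frame itself is not shown, here is a hedged sketch on a hypothetical stand-in (toy, with made-up gene names and values) that checks the two approaches select the same rows when restricted to a column subset.

```r
# Hypothetical stand-in for the data frame used above.
toy <- data.frame(gene = c("g1", "g2", "g3"),
                  rnor = c(NA, 1, 3),
                  cfam = c(2, NA, 2))

cols <- c("rnor", "cfam")

# Rows complete in the chosen columns only.
by_cc <- toy[complete.cases(toy[, cols]), ]

# Same selection spelled out with is.na() and rowSums().
by_rs <- toy[rowSums(is.na(toy[, cols])) == 0, ]

stopifnot(identical(rownames(by_cc), rownames(by_rs)))
```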
How to remove only rows that have all NA in R?
Here is one option; first you need to define the NA pattern.
df[df == ""] <- NA # define NA pattern
df[rowSums(is.na(df)) != ncol(df), ] # result
# Try with x1 <- c("Bob", "Mary","","","")
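To make the suggestion in the last comment concrete, here is a small sketch with a hypothetical second column x2 added next to x1, so that some rows are only partly empty. Only rows where every column is empty should disappear.

```r
# x1 is taken from the comment above; x2 is a made-up companion column.
df <- data.frame(x1 = c("Bob", "Mary", "", "", ""),
                 x2 = c("a", "", "", "c", ""),
                 stringsAsFactors = FALSE)

df[df == ""] <- NA                             # treat empty strings as missing
res <- df[rowSums(is.na(df)) != ncol(df), ]    # drop rows that are all NA
```

Rows 3 and 5 are removed because both of their cells were empty; rows 2 and 4 stay, since each still has one value.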
Remove rows with NAs for matrices in a list
If it is only for the first column, we loop through the list with lapply and use an anonymous function that gets the first column (x[,1]), checks for NA with is.na, negates the result with ! so that it is TRUE for the non-NA elements, and subsets the rows of the dataset based on that:
list2 <- lapply(list, function(x) x[!is.na(x[,1]),, drop = FALSE])
For the entire dataset in the list, we can make use of rowSums on the logical matrix created with is.na on the entire dataset.
lapply(list, function(x) x[rowSums(is.na(x)) == 0,, drop = FALSE])
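Both variants can be tried on a small hypothetical list of matrices (mats and its elements a and b are made up for illustration). The sketch also shows why drop = FALSE is worth keeping: one element ends up with a single row and another with none, yet both remain matrices.

```r
# Hypothetical input: a named list of two small matrices.
mats <- list(a = matrix(c(1, NA, 3, 4, 5, 6), nrow = 3),
             b = matrix(c(NA, 2, 3, NA), nrow = 2))

# Drop rows whose first column is NA.
first_col_ok <- lapply(mats, function(x) x[!is.na(x[, 1]), , drop = FALSE])

# Drop rows with an NA in any column.
all_cols_ok <- lapply(mats, function(x) x[rowSums(is.na(x)) == 0, , drop = FALSE])

# Dimensions of each result, one column per list element.
sapply(first_col_ok, dim)
sapply(all_cols_ok, dim)
```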
How to remove row if it has a NA value in one certain column
The easiest solution is to use is.na():
df[!is.na(df$B), ]
which gives you:
A B C
1 NA 2 NA
2 1 2 3
4 1 2 3
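The input data frame is not shown, so the sketch below uses a hypothetical df that is consistent with the output above; the values of A and C in the removed third row are assumptions. It also shows subset() as an alternative spelling of the same filter.

```r
# Hypothetical input; row 3 has NA in column B (its A and C values are assumed).
df <- data.frame(A = c(NA, 1, 1, 1),
                 B = c(2, 2, NA, 2),
                 C = c(NA, 3, 3, 3))

# Keep rows where B is present, regardless of NAs in other columns.
kept <- df[!is.na(df$B), ]

# Same filter with subset(), which evaluates B inside the data frame.
kept_subset <- subset(df, !is.na(B))

stopifnot(identical(rownames(kept), rownames(kept_subset)))
```

Row 1 is kept even though A and C are NA, because only column B is checked.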