Remove rows where all variables are NA using dplyr
Since dplyr 0.7.0 new, scoped filtering verbs exists. Using filter_any you can easily filter rows with at least one non-missing column:
# dplyr 0.7.0
dat %>% filter_all(any_vars(!is.na(.)))
Using @hejseb benchmarking algorithm it appears that this solution is as efficient as f4.
UPDATE:
Since dplyr 1.0.0 the above scoped verbs are superseded. Instead the across function family was introduced, which allows to perform a function on multiple (or all) columns. Filtering rows with at least one column being not NA looks now like this:
# dplyr 1.0.0
dat %>% filter(if_any(everything(), ~ !is.na(.)))
Remove rows where all columns except one have NA values?
We may use if_all
in filter
- select the columns a to b in if_all
, apply the is.na
(check for NA), the output will be TRUE for a row if both a and b have NA, negate (!
) to convert TRUE-> FALSE and FALSE->TRUE
library(dplyr)
df %>%
filter(!if_all(a:b, is.na))
-output
ID a b
1 1 ab <NA>
2 1 <NA> ab
Or instead of negating (!
), we may use complete.cases
with if_any
df %>%
filter(if_any(a:b, complete.cases))
ID a b
1 1 ab <NA>
2 1 <NA> ab
Regarding the issue in OP's code, the logic is created by looking whether there is atleast one NA (> 0
) which is true for all the rows. Instead, it should be all NA and then negate
na_rows <- df %>%
select(-"ID") %>%
is.na() %>%
{rowSums(.) == ncol(.)}
data
df <- structure(list(ID = c(1L, 1L, 1L), a = c("ab", NA, NA), b = c(NA,
"ab", NA)), class = "data.frame", row.names = c(NA, -3L))
Remove rows with all or some NAs (missing values) in data.frame
Also check complete.cases
:
> final[complete.cases(final), ]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
6 ENSG00000221312 0 1 2 3 2
na.omit
is nicer for just removing all NA
's. complete.cases
allows partial selection by including only certain columns of the dataframe:
> final[complete.cases(final[ , 5:6]),]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
4 ENSG00000207604 0 NA NA 1 2
6 ENSG00000221312 0 1 2 3 2
Your solution can't work. If you insist on using is.na
, then you have to do something like:
> final[rowSums(is.na(final[ , 5:6])) == 0, ]
gene hsap mmul mmus rnor cfam
2 ENSG00000199674 0 2 2 2 2
4 ENSG00000207604 0 NA NA 1 2
6 ENSG00000221312 0 1 2 3 2
but using complete.cases
is quite a lot more clear, and faster.
Remove NA rows in R only if meeting a requirement from another column?
Use %in%
to compare multiple values and I think the condition needs some correction.
library(dplyr)
df %>% filter(!(Category %in% c("IV", "IVA", "IVB") & is.na(Site)))
# Category Site
#1 I Hip
#2 III <NA>
#3 IVB Leg
#4 II Arm
Or in base R -
subset(df, !(Category %in% c("IV", "IVA", "IVB") & is.na(Site)))
How can I filter rows that are all NA using dplyr `across()` syntax?
In the previous version of dplyr
you could use filter_at
in this way :
library(dplyr)
df %>% filter_at(vars(-one_of(c("a", "d"))), any_vars(!is.na(.)))
across
do not have direct replacement of any_vars
so one way would be to use Reduce
here :
df %>% filter(Reduce(`|`, across(-one_of(c("a", "d")),~!is.na(.))))
# A tibble: 2 x 4
# a b c d
# <chr> <dbl> <dbl> <chr>
#1 b NA 2 b
#2 c 3 3 c
How to remove rows where all columns are zero using dplyr pipe
Here's a dplyr option:
library(dplyr)
filter_all(dat, any_vars(. != 0))
# A-XXX fBM-XXX P-XXX vBM-XXX
#1 1.51653276 2.228752 1.733567 3.003979
#2 0.07703724 0.000000 0.000000 0.000000
Here we make use of the logic that if any variable is not equal to zero, we will keep it. It's the same as removing rows where all variables are equal to zero.
Regarding row.names:
library(tidyverse)
dat %>% rownames_to_column() %>% filter_at(vars(-rowname), any_vars(. != 0))
# rowname A-XXX fBM-XXX P-XXX vBM-XXX
#1 BATF::JUN_AHR 1.51653276 2.228752 1.733567 3.003979
#2 BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000
How to remove row if it has a NA value in one certain column
The easiest solution is to use is.na()
:
df[!is.na(df$B), ]
which gives you:
A B C
1 NA 2 NA
2 1 2 3
4 1 2 3
Filter data.frame with all colums NA but keep when some are NA
We can use base R
teste[rowSums(!is.na(teste)) >0,]
# a b c
#1 1 NA 1
#3 3 3 3
#4 NA 4 4
Or using apply
and any
teste[apply(!is.na(teste), 1, any),]
which can be also used within filter
teste %>%
filter(rowSums(!is.na(.)) >0)
Or using c_across
from dplyr
, we can directly remove the rows with all
NA
library(dplyr)
teste %>%
rowwise %>%
filter(!all(is.na(c_across(everything()))))
# A tibble: 3 x 3
# Rowwise:
# a b c
# <dbl> <dbl> <dbl>
#1 1 NA 1
#2 3 3 3
#3 NA 4 4
NOTE: filter_all
is getting deprecated
r retain rows only if non missing in specific column
We can create a condition in filter
to return all
the rows if there are only missing values in 'X1' or just remove the missing rows
library(dplyr)
df %>%
group_by(Id, Date) %>%
filter(if(all(is.na(X1))) TRUE else complete.cases(X1)) %>%
ungroup
-output
# A tibble: 3 × 6
Id Date Col1 Col2 Col3 X1
<int> <chr> <int> <int> <int> <int>
1 1 1/1/22 0 0 1 6
2 2 5/7/21 0 1 0 NA
3 2 5/7/21 0 2 0 NA
Or without the if/else
, use |
with &
condition
df %>%
group_by(Id, Date) %>%
filter(any(complete.cases(X1)) & complete.cases(X1) |
all(is.na(X1))) %>%
ungroup
data
df <- structure(list(Id = c(1L, 1L, 2L, 2L), Date = c("1/1/22", "1/1/22",
"5/7/21", "5/7/21"), Col1 = c(NA, 0L, 0L, 0L), Col2 = c(1L, 0L,
1L, 2L), Col3 = c(0L, 1L, 0L, 0L), X1 = c(NA, 6L, NA, NA)),
class = "data.frame", row.names = c(NA,
-4L))
Related Topics
Can't Change Fonts in Ggplot/Geom_Text
How to Have Conditional Markdown Chunk Execution in Rmarkdown
How Can Put Multiple Plots Side-By-Side in Shiny R
Predicted Values for Logistic Regression from Glm and Stat_Smooth in Ggplot2 Are Different
Collapsing/Hiding Figures in R Markdown
Creating Legend with Circles Leaflet R
How to Set Na.Rm to True Globally
How to Conditionally Replace Values in R Data Frame Using If/Then Statement
Regular Analysis Over Irregular Time Series
Display Y-Axis for Each Subplot When Faceting
Easier Way to Plot the Cumulative Frequency Distribution in Ggplot
How to Transpose a Dataframe in Tidyverse
Shiny - Can Dynamically Generated Buttons Act as Trigger for an Event
Faster Way to Subset on Rows of a Data Frame in R