Why does dplyr's filter drop NA values from a factor variable?
You could use this:
filter(dat, var1 != 1 | is.na(var1))
var1
1 <NA>
2 3
3 3
4 <NA>
5 2
6 2
7 <NA>
With the added is.na() check, filter no longer drops the NA rows.
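The question's `dat` is not shown, so here is a minimal sketch with a hypothetical stand-in whose factor column `var1` mixes the levels 1, 2, 3 with missing values. It compares the plain condition with the one that adds `is.na()`:

```r
library(dplyr)

# Hypothetical data: a stand-in for the question's `dat`, which is not shown.
dat <- data.frame(var1 = factor(c(1, NA, 3, 3, NA, 1, 2, 2, NA)))

# Plain comparison: rows where var1 is NA evaluate to NA and are dropped.
dropped <- filter(dat, var1 != 1)

# Adding is.na() turns those NA results into TRUE, so the rows are kept.
kept <- filter(dat, var1 != 1 | is.na(var1))

nrow(dropped)          # 4
nrow(kept)             # 7
sum(is.na(kept$var1))  # 3
```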
Also, for completeness: dropping NAs is the intended behavior of filter,
as the following test shows:
test_that("filter discards NA", {
  temp <- data.frame(
    i = 1:5,
    x = c(NA, 1L, 1L, 0L, 0L)
  )
  res <- filter(temp, x == 1)
  expect_equal(nrow(res), 2L)
})
The test above was taken from dplyr's tests for filter on GitHub.
Removing NA observations with dplyr::filter()
From @Ben Bolker:
[T]his has nothing specifically to do with dplyr::filter()
From @Marat Talipov:
[A]ny comparison with NA, including NA==NA, will return NA
From a related answer by @farnsy:
The == operator does not treat NA's as you would expect it to.
Think of NA as meaning "I don't know what's there". The correct answer
to 3 > NA is obviously NA because we don't know if the missing value
is larger than 3 or not. Well, it's the same for NA == NA. They are
both missing values but the true values could be quite different, so
the correct answer is "I don't know."
R doesn't know what you are doing in your analysis, so instead of
potentially introducing bugs that would later end up being published
and embarrassing you, it doesn't allow comparison operators to think NA
is a value.
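The points quoted above can be checked in base R without dplyr. This is a small sketch that reuses the `temp` data frame from the filter test; it also contrasts base `subset()`, which drops rows whose condition is NA, with `[` indexing, which returns an all-NA row for each of them:

```r
# Comparisons involving NA yield NA, not TRUE or FALSE.
NA == NA    # NA
3 > NA      # NA
is.na(NA)   # TRUE: is.na() is the dedicated test for missingness

temp <- data.frame(i = 1:5, x = c(NA, 1L, 1L, 0L, 0L))
cond <- temp$x == 1   # NA TRUE TRUE FALSE FALSE

# subset() treats an NA condition as FALSE, just like dplyr::filter().
nrow(subset(temp, cond))   # 2

# `[` indexing instead returns an all-NA row for each NA in the condition.
nrow(temp[cond, ])         # 3
```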
Designing a function so filter does not drop NAs
Try coalesce
df %>% filter(coalesce(A != B, TRUE))
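To see what `coalesce` does here, consider a minimal sketch with a hypothetical `df` (the original data is not shown; only the column names `A` and `B` come from the answer). `coalesce(cond, TRUE)` replaces each NA in the condition with TRUE, so rows where the comparison is undecidable are kept:

```r
library(dplyr)

# Hypothetical data frame; only the column names A and B come from the answer.
df <- data.frame(A = c(1, 2, NA, 4), B = c(1, 3, 3, NA))

cond <- df$A != df$B           # FALSE TRUE NA NA
filled <- coalesce(cond, TRUE) # FALSE TRUE TRUE TRUE

# Rows 2, 3 and 4 are kept; only the row where A equals B is dropped.
res <- df %>% filter(coalesce(A != B, TRUE))
nrow(res)                      # 3
```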
Why I lose my NA's after count and filter (dplyr)
From the Help file of filter()
...Only rows where the condition evaluates to TRUE are kept...
NA != -1
[1] NA
Since your condition returns NA (hence not TRUE), you need a second condition joined with OR:
df %>%
  filter(Procedure != -1 | is.na(Procedure))
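The OR works because `NA | TRUE` evaluates to TRUE. The sketch below shows this on a hypothetical vector standing in for the `Procedure` column. It also shows an alternative that is not part of the original answer: `%in%` never returns NA, so negating it keeps the missing values as well.

```r
# Hypothetical values standing in for the Procedure column.
Procedure <- c(-1, 2, NA, 5)

plain   <- Procedure != -1                     # FALSE TRUE NA   TRUE
with_or <- Procedure != -1 | is.na(Procedure)  # FALSE TRUE TRUE TRUE

# Alternative (not from the answer above): %in% returns FALSE for NA.
with_in <- !(Procedure %in% -1)                # FALSE TRUE TRUE TRUE
```

Either logical vector can be passed to filter() as the condition.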
When filtering with dplyr in R, why do filtered out levels of a variable remain in filtered data?
Factors in R do not automatically drop levels when filtered. You may think this is a silly default (I do), but it's easy to deal with -- just use the droplevels function on the result.
new_data <- data %>%
  filter(y == "yes") %>%
  droplevels
levels(new_data$y)
## [1] "yes"
If you do this all the time, you could define a new function:
dfilter <- function(...) droplevels(filter(...))
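The same behavior can be seen on a bare factor in base R. This sketch uses a hypothetical factor `y`; subsetting keeps every original level until `droplevels()` is applied:

```r
# Hypothetical factor with three levels.
y <- factor(c("yes", "no", "yes", "maybe"))

# Subsetting keeps all original levels, even those no longer present.
kept <- y[y == "yes"]
levels(kept)               # "maybe" "no" "yes"

# droplevels() removes the unused ones.
levels(droplevels(kept))   # "yes"
```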
How to filter data.frame by a factor that includes NA as level
Check whether the level corresponding to each element of df$a is NA:
df[is.na(levels(df$a)[df$a]),]
a b
6 <NA> 0.1649003
8 <NA> 0.6556045
As Frank pointed out, this also includes observations where the value of df$a, not just its level, is NA. I guess the original poster wanted to include these cases. If not, one can do something like
x <- factor(c("A","B", NA), levels=c("A", NA), exclude = NULL)
i <- which(is.na(levels(x)[x]))
i[!is.na(x[i])]
which gives you 3: only the element with the NA level, leaving out the element with an unknown level (B).
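To make the distinction concrete, the intermediate values for the same `x` can be inspected step by step. This is only a walkthrough of the code above, not a different method:

```r
x <- factor(c("A", "B", NA), levels = c("A", NA), exclude = NULL)

# Integer codes: "B" matches no level (code NA); NA matches the NA level (code 2).
codes <- as.integer(x)               # 1 NA 2

# is.na() on the factor flags only the missing value ("B").
value_na <- is.na(x)                 # FALSE TRUE FALSE

# Looking up the level flags both the missing value and the NA level.
level_na <- is.na(levels(x)[x])      # FALSE TRUE TRUE

# Elements that carry the NA level but are not missing values.
which(level_na & !value_na)          # 3
```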