Removing NA in dplyr pipe
I don't think desc takes an na.rm argument... I'm actually surprised it doesn't throw an error when you give it one. If you just want to remove NAs, use na.omit (base) or tidyr::drop_na:
outcome.df %>%
na.omit() %>%
group_by(Hospital, State) %>%
arrange(desc(HeartAttackDeath)) %>%
head()
# the same with tidyr::drop_na
library(tidyr)
outcome.df %>%
drop_na() %>%
group_by(Hospital, State) %>%
arrange(desc(HeartAttackDeath)) %>%
head()
If you only want to remove NAs from the HeartAttackDeath column, filter with is.na, or use tidyr::drop_na:
outcome.df %>%
filter(!is.na(HeartAttackDeath)) %>%
group_by(Hospital, State) %>%
arrange(desc(HeartAttackDeath)) %>%
head()
# the same with tidyr::drop_na
outcome.df %>%
drop_na(HeartAttackDeath) %>%
group_by(Hospital, State) %>%
arrange(desc(HeartAttackDeath)) %>%
head()
As pointed out at the dupe, complete.cases can also be used, but it's a bit trickier to put in a chain because it takes a data frame as an argument but returns a logical vector (one value per row) rather than a data frame. So you could use it like this:
outcome.df %>%
filter(complete.cases(.)) %>%
group_by(Hospital, State) %>%
arrange(desc(HeartAttackDeath)) %>%
head()
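To make the difference between the variants above concrete, here is a minimal sketch. The data frame is a hypothetical stand-in for outcome.df (the real data is not shown), so only the row counts and ordering matter, not the values.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for outcome.df; values are made up for illustration.
outcome.df <- data.frame(
  Hospital = c("A", "B", "C", "D"),
  State = c("TX", NA, "NY", "CA"),
  HeartAttackDeath = c(14.2, 16.1, NA, 12.9)
)

# na.omit() drops every row that has an NA in any column.
all_cols <- outcome.df %>% na.omit()

# drop_na(HeartAttackDeath) only looks at the named column.
one_col <- outcome.df %>% drop_na(HeartAttackDeath)

nrow(all_cols)  # rows A and D remain
nrow(one_col)   # rows A, B and D remain

# arrange() sorts missing values to the end, even inside desc().
sorted <- outcome.df %>% arrange(desc(HeartAttackDeath))
sorted$Hospital
```

If the NAs only need to be out of the way at the top of the sorted output, arrange(desc(...)) alone may already be enough, since the missing values end up last.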
Removing NA observations with dplyr::filter()
From @Ben Bolker:
[T]his has nothing specifically to do with dplyr::filter()
From @Marat Talipov:
[A]ny comparison with NA, including NA==NA, will return NA
From a related answer by @farnsy:
The == operator does not treat NA's as you would expect it to.
Think of NA as meaning "I don't know what's there". The correct answer
to 3 > NA is obviously NA because we don't know if the missing value
is larger than 3 or not. Well, it's the same for NA == NA. They are
both missing values but the true values could be quite different, so
the correct answer is "I don't know."
R doesn't know what you are doing in your analysis, so instead of
potentially introducing bugs that would later end up being published
and embarrassing you, it doesn't allow comparison operators to think NA
is a value.
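The behaviour described in the quotes above can be checked in a plain R session. This is a small base R sketch; the vector x is made up for illustration.

```r
# Made-up vector with one missing value.
x <- c(3, NA, 7)

x == NA        # NA NA NA: every comparison with NA is unknown
NA == NA       # NA
x > 3          # FALSE NA TRUE

is.na(x)       # FALSE TRUE FALSE: the test that actually detects NA
x[!is.na(x)]   # 3 7
```

This is why filter(x == NA) returns no rows, while filter(!is.na(x)) does what was intended.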
Remove NA values with tidyverse mutate
You need to use the column name in which you want to detect "n/a" values.
library(dplyr)
library(tidyr)
data %>%
mutate(value = replace(job_industry_category,
job_industry_category == "n/a", NA)) %>%
drop_na()
You can also do this without converting those values to actual NA:
data %>% filter(job_industry_category != "n/a")
#Base R :
subset(data, job_industry_category != "n/a")
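One thing worth knowing about the shortcut above: filter() drops rows where the condition evaluates to NA, so genuine NA values disappear together with the "n/a" strings. A minimal sketch, using a hypothetical data frame (the id column and the category values are assumptions):

```r
library(dplyr)

# Hypothetical data: one "n/a" string and one genuine NA.
data <- data.frame(
  id = 1:4,
  job_industry_category = c("Health", "n/a", NA, "IT")
)

# The comparison is NA for row 3, so filter() drops it along with row 2.
kept <- data %>% filter(job_industry_category != "n/a")
kept$id  # 1 4

# To keep the genuine NA, test for it explicitly.
kept_na <- data %>%
  filter(is.na(job_industry_category) | job_industry_category != "n/a")
kept_na$id  # 1 3 4
```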
Remove NA row from a single dataframe within list
If you specifically want to act on the list member named "b", you could use map_if (from purrr):
l %>%
map_if(names(.) == "b", na.omit)
lapply(l, na.omit) will remove NA rows from any element of the list.
lapply(l, na.omit)
$a
[1] "X" "Y" "Z"
$b
a b
1 A R
2 B G
3 C B
$c
header value
1 1 0
2 2 10
3 3 15
If you really want to use map and pipes for any element:
l %>%
map(., na.omit)
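For a self-contained check of the map_if variant, here is a sketch with a hypothetical list l shaped like the printed output, with an NA row added to "b" and to "c" so the effect is visible:

```r
library(purrr)

# Hypothetical list; the NA rows are added for illustration.
l <- list(
  a = c("X", "Y", "Z"),
  b = data.frame(a = c("A", "B", NA, "C"), b = c("R", "G", NA, "B")),
  c = data.frame(header = c(1, 2, NA), value = c(0, 10, 15))
)

# Apply na.omit only to the element whose name is "b".
only_b <- l %>% map_if(names(.) == "b", na.omit)

nrow(only_b$b)   # the NA row is gone from "b"
anyNA(only_b$c)  # "c" is left untouched, so its NA is still there
```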
Using dplyr pipe to remove empty columns in a list of dataframes
You could use two select functions:
library(dplyr)
library(purrr)
LIST %>% map(~ .x %>% select(contains("1")) %>% select_if(!all(is.na(.))))
#[[1]]
# col_a1
#1 a
#2 b
#[[2]]
#data frame with 0 columns and 2 rows
Using only one select function we can do (str_detect comes from stringr):
library(stringr)
LIST %>% map(~ .x %>% select_if(str_detect(names(.x), '1') &
colSums(!is.na(.x)) > 0))
And similarly in base R :
lapply(LIST, function(x) x[colSums(!is.na(x)) > 0 & grepl('1', names(x))])
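The original LIST is not shown, so the following sketch uses a hypothetical one whose column names are chosen to reproduce the printed result (one surviving column in the first element, none in the second). It runs the base R version only.

```r
# Hypothetical input; column names and values are assumptions.
LIST <- list(
  data.frame(col_a1 = c("a", "b"), col_b1 = c(NA, NA), col_a2 = c("c", "d")),
  data.frame(col_a1 = c(NA, NA), col_b2 = c("x", "y"))
)

# Keep columns that contain "1" in their name and have at least one non-NA value.
res <- lapply(LIST, function(x) x[colSums(!is.na(x)) > 0 & grepl("1", names(x))])

names(res[[1]])  # only col_a1 passes both tests
dim(res[[2]])    # 2 rows, 0 columns
```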
Removing NA's using filter function on few columns of the data frame
If there is more than one column, use filter_at:
library(dplyr)
df %>%
filter_at(vars(KeyPress, KPIndex, X, Y), any_vars(!is.na(.)))
Or with rowSums from base R:
nm1 <- c("KeyPress", "KPIndex", "X", "Y")
df[rowSums(!is.na(df[nm1]))!= 0,]
data
df <- structure(list(S.No = 1:3, MediaName = c("Dat", "New", "Dat"),
KeyPress = c(NA, NA, NA), KPIndex = c(1L, NA, 2L), Type = c("Fixation",
"Saccade", "Fixation"), Secs = c(18L, 33L, 23L), X = c(117L,
NA, 117L), Y = c(89L, NA, NA)), class = "data.frame", row.names = c(NA,
-3L))
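filter_at and any_vars still work but are superseded in current dplyr. A possible equivalent with if_any (assuming dplyr 1.0.4 or later, where if_any is available) is sketched below on a trimmed copy of the data, keeping only the columns that matter for the filter:

```r
library(dplyr)

# Trimmed copy of the example data (MediaName, Type and Secs left out).
df <- data.frame(
  S.No = 1:3,
  KeyPress = c(NA, NA, NA),
  KPIndex = c(1L, NA, 2L),
  X = c(117L, NA, 117L),
  Y = c(89L, NA, NA)
)

# Keep rows where at least one of the four columns is not NA.
kept <- df %>%
  filter(if_any(c(KeyPress, KPIndex, X, Y), ~ !is.na(.x)))

kept$S.No  # row 2 is dropped because all four columns are NA there
```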
Piping the removal of empty columns using dplyr
We can use a version of select_if
library(dplyr)
df %>%
select_if(function(x) !(all(is.na(x)) | all(x=="")))
# id Q2 Q3 Q4
#1 1 1 NA
#2 2 2
#3 3 4 3 2
#4 4 5 4 2
Or without using an anonymous function call
df %>% select_if(~!(all(is.na(.)) | all(. == "")))
You can also modify your apply statement as
df[!apply(df, 2, function(x) all(is.na(x)) | all(x==""))]
Or using colSums
df[colSums(is.na(df) | df == "") != nrow(df)]
and inverse
df[colSums(!(is.na(df) | df == "")) > 0]
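select_if is superseded in dplyr 1.0.0 and later; a sketch of the same idea with select(where(...)) follows. The data frame and the helper is_empty_col are hypothetical. The helper tests each value for "NA or empty string" before calling all(), which differs slightly from the predicate above: for a column that mixes NA and "" and contains nothing else, all(is.na(x)) | all(x == "") evaluates to NA, whereas the element-wise test returns TRUE.

```r
library(dplyr)

# Hypothetical data: Q1 is all NA, Q5 is all "", Q3 mixes values with NA and "".
df <- data.frame(
  id = 1:4,
  Q1 = c(NA, NA, NA, NA),
  Q2 = c("1", "2", "4", "5"),
  Q3 = c("", NA, "3", "4"),
  Q5 = c("", "", "", "")
)

# Hypothetical helper: a column is empty if every value is NA or "".
is_empty_col <- function(x) all(is.na(x) | x == "")

kept <- df %>% select(where(~ !is_empty_col(.x)))
names(kept)  # Q1 and Q5 are removed
```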
Fuse multiple data.frame date fields by removing NA using piping
We can use coalesce to create the new column,
library(dplyr)
dd %>%
transmute(newcol = coalesce(f1, f2, f3)) #%>%
#then `filter` the rows to remove the NA elements
#and `pull` as a `vector` (if needed)
#filter(!is.na(newcol)) %>%
#pull(newcol)
# newcol
#1 2010-01-24
#2 2012-03-24
#3 2014-11-22
#4 <NA>
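The full chain, with the commented-out steps enabled, might look like the sketch below. The data frame dd is a hypothetical reconstruction chosen to match the printed output; the original is not shown.

```r
library(dplyr)

# Hypothetical dd: each row has at most one non-missing date.
dd <- data.frame(
  f1 = as.Date(c("2010-01-24", NA, NA, NA)),
  f2 = as.Date(c(NA, "2012-03-24", NA, NA)),
  f3 = as.Date(c(NA, NA, "2014-11-22", NA))
)

dates <- dd %>%
  transmute(newcol = coalesce(f1, f2, f3)) %>%  # first non-NA value per row
  filter(!is.na(newcol)) %>%                    # drop rows where all were NA
  pull(newcol)                                  # return a plain Date vector

dates
```

coalesce takes the first non-missing value from left to right, so if a row had dates in both f1 and f2, the f1 value would win.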
Remove rows where all variables are NA using dplyr
Since dplyr 0.7.0, new scoped filtering verbs exist. Using filter_all with any_vars you can easily keep rows with at least one non-missing column:
# dplyr 0.7.0
dat %>% filter_all(any_vars(!is.na(.)))
Using @hejseb's benchmarking algorithm, it appears that this solution is as efficient as f4.
UPDATE:
Since dplyr 1.0.0 the above scoped verbs are superseded. Instead the across function family was introduced, which lets you apply a function to multiple (or all) columns. Filtering rows with at least one column that is not NA now looks like this:
# dplyr >= 1.0.4 (if_any was added in 1.0.4)
dat %>% filter(if_any(everything(), ~ !is.na(.)))
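To see how if_any differs from its sibling if_all, here is a minimal sketch on a hypothetical three-row data frame (one complete row, one partial row, one all-NA row). It assumes dplyr 1.0.4 or later.

```r
library(dplyr)

# Hypothetical data: row 1 complete, row 2 partial, row 3 entirely NA.
dat <- data.frame(
  a = c(1, NA, NA),
  b = c("x", "y", NA)
)

# Keep rows with at least one non-missing value (drops only the all-NA row).
some_data <- dat %>% filter(if_any(everything(), ~ !is.na(.x)))

# Keep rows with no missing values at all (same rows as na.omit()).
complete <- dat %>% filter(if_all(everything(), ~ !is.na(.x)))

nrow(some_data)  # 2
nrow(complete)   # 1
```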
How to remove rows with NAs in all columns using dplyr?
This can be done using filter_all:
df %>% filter_all(any_vars(!is.na(.)))