Removing NA in dplyr pipe

I don't think desc takes an na.rm argument... I'm actually surprised it doesn't throw an error when you give it one. If you just want to remove NAs, use na.omit (base) or tidyr::drop_na:

library(dplyr)

outcome.df %>%
  na.omit() %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath)) %>%
  head()

library(tidyr)

outcome.df %>%
  drop_na() %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath)) %>%
  head()

If you only want to remove NAs from the HeartAttackDeath column, filter with is.na, or use tidyr::drop_na:

outcome.df %>%
  filter(!is.na(HeartAttackDeath)) %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath)) %>%
  head()

outcome.df %>%
  drop_na(HeartAttackDeath) %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath)) %>%
  head()

As pointed out in the duplicate question, complete.cases can also be used, but it's a bit trickier to put in a chain because it takes a data frame as an argument and returns a logical vector (one TRUE/FALSE per row) rather than a data frame. So you could use it like this:

outcome.df %>%
  filter(complete.cases(.)) %>%
  group_by(Hospital, State) %>%
  arrange(desc(HeartAttackDeath)) %>%
  head()
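To make the return value of complete.cases concrete, here is a minimal base R sketch. The data frame toy and its values are made up for illustration and are not the asker's outcome.df.

```r
# Hypothetical toy data; names and values are illustrative only
toy <- data.frame(
  Hospital = c("A", "B", "C"),
  HeartAttackDeath = c(12.1, NA, 9.8)
)

keep <- complete.cases(toy)   # logical vector, one element per row
print(keep)                   # TRUE FALSE TRUE

complete_rows <- toy[keep, ]  # base R counterpart of filter(complete.cases(.))
print(nrow(complete_rows))    # 2
```

Because the result is a row selector rather than a data frame, it has to be wrapped in filter() (or used with `[`) instead of being piped into directly.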

Removing NA observations with dplyr::filter()

From @Ben Bolker:

[T]his has nothing specifically to do with dplyr::filter()

From @Marat Talipov:

[A]ny comparison with NA, including NA==NA, will return NA

From a related answer by @farnsy:

The == operator does not treat NA's as you would expect it to.

Think of NA as meaning "I don't know what's there". The correct answer
to 3 > NA is obviously NA because we don't know if the missing value
is larger than 3 or not. Well, it's the same for NA == NA. They are
both missing values but the true values could be quite different, so
the correct answer is "I don't know."

R doesn't know what you are doing in your analysis, so instead of
potentially introducing bugs that would later end up being published
and embarrassing you, it doesn't allow comparison operators to think NA
is a value.
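The behaviour described in these quotes can be checked directly in base R. The vector x below is a made-up example; the point is only that == never finds missing values, while is.na() does. dplyr::filter() keeps only rows where the condition is TRUE, so a condition that evaluates to NA drops the row.

```r
# Comparisons involving NA propagate the "unknown" value
print(NA == NA)     # NA
print(3 > NA)       # NA

x <- c(1, NA, 3)    # hypothetical example vector

# x == NA is NA everywhere, so it cannot be used to find missing values
print(x == NA)      # NA NA NA

# is.na() is the test that answers "is this value missing?"
print(is.na(x))     # FALSE  TRUE FALSE

# Keep the non-missing elements
non_missing <- x[!is.na(x)]
print(non_missing)  # 1 3
```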

Remove NA values with tidyverse mutate

You need to use the column name in which you want to detect "n/a" values.

library(dplyr)
library(tidyr)

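# `value` is a copy of job_industry_category with "n/a" replaced by NA;
# drop_na() then removes every row that has an NA in any column
data %>%
  mutate(value = replace(job_industry_category,
                         job_industry_category == "n/a", NA)) %>%
  drop_na()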

You can also do this without converting those values to actual NA.

data %>% filter(job_industry_category != "n/a")

# Base R:

subset(data, job_industry_category != "n/a")
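One thing to keep in mind with the != approach: if the column also contains genuine NA values, those rows are dropped too, because the comparison yields NA for them. A small base R sketch, using a made-up data frame (only the column name is taken from the question):

```r
# Hypothetical data: one "n/a" placeholder string and one genuine NA
data <- data.frame(
  id = 1:4,
  job_industry_category = c("Health", "n/a", NA, "Retail"),
  stringsAsFactors = FALSE
)

# The comparison is NA for the genuinely missing row ...
print(data$job_industry_category != "n/a")  # TRUE FALSE NA TRUE

# ... and subset() treats NA as FALSE, so that row is dropped as well
kept <- subset(data, job_industry_category != "n/a")
print(kept$id)                              # 1 4

# To keep genuine NAs while dropping only the placeholder, test explicitly
kept_with_na <- subset(
  data,
  is.na(job_industry_category) | job_industry_category != "n/a"
)
print(kept_with_na$id)                      # 1 3 4
```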

Remove NA row from a single dataframe within list

If you specifically want to act on the list member named "b" you could use map_if:

library(purrr)

l %>%
  map_if(names(.) == "b", na.omit)

lapply(l, na.omit) will remove NA rows from any element of the list.

lapply(l, na.omit)

$a
[1] "X" "Y" "Z"

$b
  a b
1 A R
2 B G
3 C B

$c
  header value
1      1     0
2      2    10
3      3    15

If you really want to use map and pipes for every element:

l %>%
  map(na.omit)
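For comparison, the same two cases (one named element versus every element) can be sketched in base R without purrr. The list below is a made-up stand-in for l, with NA rows added so the effect is visible.

```r
# Hypothetical list: a character vector and two data frames containing NAs
l <- list(
  a = c("X", "Y", "Z"),
  b = data.frame(a = c("A", "B", NA, "C"), b = c("R", "G", "Y", "B")),
  c = data.frame(header = c(1, 2, 3), value = c(0, 10, NA))
)

# Clean only the element named "b"; everything else is left untouched
l_b_only <- l
l_b_only$b <- na.omit(l_b_only$b)

print(nrow(l_b_only$b))  # 3
print(nrow(l_b_only$c))  # 3 (the NA row is still there)

# Clean every element
l_all <- lapply(l, na.omit)
print(nrow(l_all$c))     # 2
```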

using dplyr pipe to remove empty columns in a list of dataframes

You could use two select calls:

library(dplyr)
library(purrr)

# The ~ makes select_if test each column separately
LIST %>% map(~ .x %>% select(contains("1")) %>% select_if(~ !all(is.na(.))))

#[[1]]
#  col_a1
#1      a
#2      b

#[[2]]
#data frame with 0 columns and 2 rows

Using only one select call (str_detect comes from stringr, so that package must be loaded too):

LIST %>% map(~ .x %>% select_if(str_detect(names(.x), '1') &
                                  colSums(!is.na(.x)) > 0))

And similarly in base R :

lapply(LIST, function(x) x[colSums(!is.na(x)) > 0 & grepl('1', names(x))])

Removing NA's using filter function on few columns of the data frame

If there is more than one column, use filter_at. This keeps the rows in which at least one of the listed columns is not NA:

library(dplyr)
df %>%
  filter_at(vars(KeyPress, KPIndex, X, Y), any_vars(!is.na(.)))

Or with rowSums from base R:

nm1 <- c("KeyPress", "KPIndex", "X", "Y")
df[rowSums(!is.na(df[nm1])) != 0, ]

data

df <- structure(list(S.No = 1:3, MediaName = c("Dat", "New", "Dat"), 
KeyPress = c(NA, NA, NA), KPIndex = c(1L, NA, 2L), Type = c("Fixation",
"Saccade", "Fixation"), Secs = c(18L, 33L, 23L), X = c(117L,
NA, 117L), Y = c(89L, NA, NA)), class = "data.frame", row.names = c(NA,
-3L))
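Since filter_at is superseded in current dplyr, the same filter can be sketched with if_any, and the stricter "no NA in any of these columns" variant with if_all. This assumes dplyr 1.0.4 or later; the data frame repeats only the relevant columns of the data above.

```r
library(dplyr)

# Same columns of interest as in the data above (other columns omitted)
df <- data.frame(
  S.No     = 1:3,
  KeyPress = c(NA, NA, NA),
  KPIndex  = c(1L, NA, 2L),
  X        = c(117L, NA, 117L),
  Y        = c(89L, NA, NA)
)

# Keep rows where at least one of the listed columns is not NA
kept <- df %>%
  filter(if_any(c(KeyPress, KPIndex, X, Y), ~ !is.na(.x)))

print(kept$S.No)    # 1 3

# Stricter variant: keep rows where none of the listed columns is NA
strict <- df %>%
  filter(if_all(c(KPIndex, X, Y), ~ !is.na(.x)))

print(strict$S.No)  # 1
```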

Piping the removal of empty columns using dplyr

We can use a version of select_if

library(dplyr)
df %>%
  select_if(function(x) !(all(is.na(x)) | all(x == "")))

# id Q2 Q3 Q4
#1 1 1 NA
#2 2 2
#3 3 4 3 2
#4 4 5 4 2

Or without using an anonymous function call

df %>% select_if(~!(all(is.na(.)) | all(. == "")))

You can also modify your apply statement as

df[!apply(df, 2, function(x) all(is.na(x)) | all(x==""))]

Or using colSums

df[colSums(is.na(df) | df == "") != nrow(df)]

and inverse

df[colSums(!(is.na(df) | df == "")) > 0]
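The colSums variants treat a cell as empty when it is NA or "", so they also catch a column that mixes the two. The per-column test all(is.na(x)) | all(x == "") evaluates to NA for such a column. A minimal base R sketch on made-up data (column names are hypothetical):

```r
# Hypothetical data: Q1 is entirely "", Q5 mixes NA and ""
df <- data.frame(
  id = 1:3,
  Q1 = c("", "", ""),
  Q2 = c("1", "", "4"),
  Q5 = c(NA, "", NA),
  stringsAsFactors = FALSE
)

# TRUE where a cell is either NA or an empty string
empty_cell <- is.na(df) | df == ""

# Keep columns that have at least one non-empty cell
cleaned <- df[colSums(!empty_cell) > 0]
print(names(cleaned))  # "id" "Q2"

# The per-column test is undecided for the mixed column
x <- df$Q5
mixed_result <- all(is.na(x)) | all(x == "")
print(mixed_result)    # NA
```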

Fuse multiple data.frame date fields by removing NA using piping

We can use coalesce to create the new column:

library(dplyr)
dd %>%
  transmute(newcol = coalesce(f1, f2, f3)) #%>%
  # then `filter` the rows to remove the NA elements
  # and `pull` as a `vector` (if needed)
  # filter(!is.na(newcol)) %>%
  # pull(newcol)
#      newcol
#1 2010-01-24
#2 2012-03-24
#3 2014-11-22
#4       <NA>
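A self-contained sketch of the full chain, including the optional filter and pull steps, might look like this. The data frame is a made-up stand-in for dd with three partially filled Date columns.

```r
library(dplyr)

# Hypothetical data shaped like `dd`: each row has at most one date filled in
dd <- data.frame(
  f1 = as.Date(c("2010-01-24", NA, NA, NA)),
  f2 = as.Date(c(NA, "2012-03-24", NA, NA)),
  f3 = as.Date(c(NA, NA, "2014-11-22", NA))
)

# coalesce() returns, per row, the first non-missing value among its arguments
fused <- dd %>%
  transmute(newcol = coalesce(f1, f2, f3)) %>%
  filter(!is.na(newcol)) %>%   # drop the row where all three are NA
  pull(newcol)                 # extract the column as a plain Date vector

print(fused)
```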

Remove rows where all variables are NA using dplyr

Since dplyr 0.7.0, new scoped filtering verbs exist. Using filter_all with any_vars you can easily keep the rows that have at least one non-missing column:

# dplyr 0.7.0
dat %>% filter_all(any_vars(!is.na(.)))

Using @hejseb's benchmark, it appears that this solution is as efficient as f4.

UPDATE:

Since dplyr 1.0.0 the above scoped verbs are superseded. Instead, the across function family was introduced, which lets you apply a function to multiple (or all) columns; its filtering helpers if_any and if_all were added in dplyr 1.0.4. Keeping the rows with at least one non-NA column now looks like this:

# dplyr >= 1.0.4
dat %>% filter(if_any(everything(), ~ !is.na(.)))
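It is easy to confuse this with drop_na() or na.omit(), which remove a row as soon as any column is missing. A small sketch on made-up data shows the difference; the data frame dat here is hypothetical, and dplyr 1.0.4 or later is assumed.

```r
library(dplyr)

# Hypothetical data: row 2 is missing in every column, row 3 in only one
dat <- data.frame(
  a = c(1, NA, 3),
  b = c("x", NA, NA),
  stringsAsFactors = FALSE
)

# Remove only rows where ALL columns are NA
kept_any <- dat %>% filter(if_any(everything(), ~ !is.na(.x)))
print(kept_any$a)      # 1 3

# Base R counterpart of the same rule
kept_base <- dat[rowSums(!is.na(dat)) > 0, ]
print(kept_base$a)     # 1 3

# na.omit() is stricter: it removes rows where ANY column is NA
kept_strict <- na.omit(dat)
print(kept_strict$a)   # 1
```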

How to remove rows with NAs in all columns using dplyr?

This can be done using filter_all:

df %>% filter_all(any_vars(!is.na(.)))

