Filtering a Data Frame Based on NA in Multiple Columns

Filtering a data frame based on NA in multiple columns

We can build a logical index for each column, combine the two with &, and use the result to subset the rows.

df1[!is.na(df1$type) & !is.na(df1$company),]
# id type company
#3 3 North Alex
#5 NA North BDA

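Or use rowSums on the logical matrix is.na(df1[-1]), which counts the NAs per row in every column except the first; negating the count keeps only the rows where it is 0.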

df1[!rowSums(is.na(df1[-1])),]
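The two snippets above rely on a data frame named df1 that is not defined in this answer. As a minimal sketch, assuming df1 holds the same values as the df listed under "data" further down, the two approaches can be compared side by side. complete.cases is a base R function added here only as a cross-check; it is not part of the original answer.

```r
# Assumption: df1 has the same contents as the `df` shown later under "data".
df1 <- data.frame(
  id      = c(1L, 2L, 3L, 4L, NA, 6L),
  type    = c(NA, NA, "North", "South", "North", NA),
  company = c(NA, "ADM", "Alex", NA, "BDA", "CA")
)

# Approach 1: combine the per-column logical vectors with &
keep_and <- !is.na(df1$type) & !is.na(df1$company)

# Approach 2: count NAs per row across every column except the first (id);
# a count of 0 is coerced to FALSE, so ! turns it into TRUE
keep_sum <- !rowSums(is.na(df1[-1]))

# Cross-check: complete.cases() flags rows with no NA in the given columns
keep_cc <- complete.cases(df1[c("type", "company")])

res <- df1[keep_and, ]
print(res)
```

Note that the rowSums version tests every column after the first, so it matches the & version only as long as type and company are the only other columns.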

How to use dplyr across to filter NA in multiple columns

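We can use across to loop over the columns 'type' and 'company' and return the rows that don't have any NA in the specified columns. Note that recent dplyr versions deprecate across inside filter in favor of if_all/if_any, which are shown further down.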

library(dplyr)
df %>%
  filter(across(c(type, company), ~ !is.na(.)))
# id type company
#1 3 North Alex
#2 NA North BDA

With filter, if_all and if_any are the two options that correspond to all_vars/any_vars used with filter_at/filter_all. if_any keeps a row when at least one of the selected columns passes the test:

df %>%
  filter(if_any(c(company, type), ~ !is.na(.)))
# id type company
#1 2 <NA> ADM
#2 3 North Alex
#3 4 South <NA>
#4 NA North BDA
#5 6 <NA> CA

Or using if_all, which keeps a row only when every selected column passes the test:

df %>%
  filter(if_all(c(company, type), ~ !is.na(.)))
# id type company
#1 3 North Alex
#2 NA North BDA

data

df <- structure(list(id = c(1L, 2L, 3L, 4L, NA, 6L), type = c(NA, NA, 
"North", "South", "North", NA), company = c(NA, "ADM", "Alex",
NA, "BDA", "CA")), class = "data.frame", row.names = c(NA, -6L
))

How to filter rows with NA based on multiple conditions

tidyverse


library(tidyverse)

a_subset %>%
  filter(
    rowSums(!is.na(across(starts_with("group1_")))) >= 2 |
      rowSums(!is.na(across(starts_with("group2_")))) >= 2
  )

#>    group1_1 group1_2 group1_3 group2_1 group2_2 group2_3
#> b1       NA      0.4      0.5     -0.5       NA     -0.5
#> b3      0.5      0.3       NA     -0.2     -0.4     -0.4
#> b4      1.0       NA      2.0       NA       NA       NA

data

a_subset <- data.frame(
row.names = c("b1", "b2", "b3", "b4"),
group1_1 = c(NA, 1.5, 0.5, 1),
group1_2 = c(0.4, NA, 0.3, NA),
group1_3 = c(0.5, NA, NA, 2),
group2_1 = c(-0.5, -2.5, -0.2, NA),
group2_2 = c(NA, NA, -0.4, NA),
group2_3 = c(-0.5, NA, -0.4, NA)
)
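For comparison, the same row filter can be sketched in base R without across. The helper name count_present is hypothetical, introduced only for this illustration; it counts the non-NA values per row among the columns whose names start with a given prefix.

```r
a_subset <- data.frame(
  row.names = c("b1", "b2", "b3", "b4"),
  group1_1 = c(NA, 1.5, 0.5, 1),
  group1_2 = c(0.4, NA, 0.3, NA),
  group1_3 = c(0.5, NA, NA, 2),
  group2_1 = c(-0.5, -2.5, -0.2, NA),
  group2_2 = c(NA, NA, -0.4, NA),
  group2_3 = c(-0.5, NA, -0.4, NA)
)

# Hypothetical helper: number of non-NA values per row among the columns
# whose names start with `prefix`
count_present <- function(data, prefix) {
  cols <- startsWith(names(data), prefix)
  rowSums(!is.na(data[, cols, drop = FALSE]))
}

# Keep a row when either group has at least two non-NA values
keep <- count_present(a_subset, "group1_") >= 2 |
  count_present(a_subset, "group2_") >= 2

result <- a_subset[keep, ]
print(result)
```

Row b2 is dropped because each of its groups contains only one non-NA value.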

NA values introduced when I filter on multiple columns

The condition on nest.stat fails because comparing NA with "F" returns NA rather than FALSE, so the rows where nest.stat is missing come back as rows of NA.

Here's a messy, base-R way of doing this:

df[!(df$locname == "CARACO CREEK" &
       ifelse(!is.na(df$nest.stat), df$nest.stat == "F", FALSE) &
       df$yr == 1994), ]

Output:

       locname mo dy   yr nest.stat daynight
1 CARACO CREEK  3  9 1994         U        D
2 CARACO CREEK  4  4 1994      <NA>        D
3 CARACO CREEK  4 14 1994      <NA>        N
4 CARACO CREEK  5  5 1994      <NA>        D
5 CARACO CREEK  5 17 1994      <NA>        N
6 CARACO CREEK  6 29 1994      <NA>        N
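To see where the NA values come from, it may help to look at the comparison in isolation. The sketch below uses a small hypothetical data frame (the original data is not shown in full, so the values and the "OTHER SITE" location are made up) and %in%, which returns FALSE instead of NA for a missing value, as one possible alternative to the ifelse wrapper.

```r
# Hypothetical sample; only the column names are taken from the output above
df <- data.frame(
  locname   = c("CARACO CREEK", "CARACO CREEK", "CARACO CREEK", "OTHER SITE"),
  yr        = c(1994, 1994, 1994, 1994),
  nest.stat = c("U", NA, "F", "F")
)

# Comparing with NA propagates NA ...
print(NA == "F")

# ... whereas %in% treats NA as "no match" and returns FALSE
print(NA %in% "F")

# Naive condition: contains NA wherever nest.stat is missing
naive_cond <- df$locname == "CARACO CREEK" & df$nest.stat == "F" & df$yr == 1994

# NA-safe condition: rows to drop are CARACO CREEK, nest.stat "F", year 1994
to_drop <- df$locname == "CARACO CREEK" & df$nest.stat %in% "F" & df$yr == 1994

kept <- df[!to_drop, ]
print(kept)
```

Because to_drop never contains NA, the subset keeps the rows with a missing nest.stat unchanged instead of turning them into NA rows.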

Filtering a data frame based on multiple columns sharing a name

You don't need to loop or apply anything. Continuing from your grep method,

i1 <- grep("type", names(a))
which(rowSums(is.na(a[i1])) == length(i1))
#[1] 2

NOTE: I renamed your data frame to a, since data is already the name of a function in R.
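Since the original data frame is not shown, here is a minimal sketch on hypothetical data (the column names and values are made up) of how the grep and rowSums steps fit together.

```r
# Hypothetical data; the original data frame is not shown
a <- data.frame(
  id     = 1:3,
  type_1 = c("x", NA, "y"),
  type_2 = c(NA, NA, "z"),
  value  = c(10, 20, 30)
)

# Positions of the columns whose names contain "type"
i1 <- grep("type", names(a))

# Per row, count the NAs in those columns; a row qualifies when every
# one of them is NA, i.e. the count equals the number of matched columns
all_na_rows <- which(rowSums(is.na(a[i1])) == length(i1))
print(all_na_rows)
```

Row 1 has an NA in only one of the two type columns, so it is not selected; only row 2 has NA in both.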

Filter data frame based off two columns in other data frame

Using %in%

dfZero <- df[df$Username %in% key[key$training == 0, "username"],]
dfOne <- df[df$Username %in% key[key$training == 1, "username"],]

Using merge()

dfZero <- merge(df, key[key$training == 0,], by.x = "Username", by.y = "username")
dfOne <- merge(df, key[key$training == 1,], by.x = "Username", by.y = "username")
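The two approaches do not return identical data frames. A minimal sketch on hypothetical data (only the column names Username, username and training come from the snippets above; the score column and all values are made up) shows the difference.

```r
# Hypothetical data for illustration
df <- data.frame(
  Username = c("cat", "ann", "bob", "ann"),
  score    = c(3, 5, 7, 9)
)
key <- data.frame(
  username = c("ann", "bob", "cat"),
  training = c(0, 1, 0)
)

# %in% keeps the columns and the row order of df
dfZero_in <- df[df$Username %in% key[key$training == 0, "username"], ]

# merge() joins the tables, so the result also carries the training column
# and, by default (sort = TRUE), is sorted on the join column
dfZero_merge <- merge(df, key[key$training == 0, ],
                      by.x = "Username", by.y = "username")

print(dfZero_in)
print(dfZero_merge)
```

If only the rows of df are needed, %in% is the lighter option; merge is useful when the columns of key should be attached as well.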

Removing NA's using filter function on few columns of the data frame

If there is more than one column, use filter_at

library(dplyr)
df %>%
  filter_at(vars(KeyPress, KPIndex, X, Y), any_vars(!is.na(.)))

Or with rowSums from base R

nm1 <- c("KeyPress", "KPIndex", "X", "Y")
df[rowSums(!is.na(df[nm1]))!= 0,]

data

df <- structure(list(S.No = 1:3, MediaName = c("Dat", "New", "Dat"), 
KeyPress = c(NA, NA, NA), KPIndex = c(1L, NA, 2L), Type = c("Fixation",
"Saccade", "Fixation"), Secs = c(18L, 33L, 23L), X = c(117L,
NA, 117L), Y = c(89L, NA, NA)), class = "data.frame", row.names = c(NA,
-3L))
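In current dplyr releases the scoped verbs such as filter_at are superseded. As a sketch, assuming a dplyr version that provides if_any (1.0.4 or later), the same filter on the data above could be written as follows, with the base R version as a cross-check.

```r
library(dplyr)

df <- structure(list(S.No = 1:3, MediaName = c("Dat", "New", "Dat"),
  KeyPress = c(NA, NA, NA), KPIndex = c(1L, NA, 2L),
  Type = c("Fixation", "Saccade", "Fixation"), Secs = c(18L, 33L, 23L),
  X = c(117L, NA, 117L), Y = c(89L, NA, NA)),
  class = "data.frame", row.names = c(NA, -3L))

# Keep rows where at least one of the four columns is not NA
res_dplyr <- df %>%
  filter(if_any(c(KeyPress, KPIndex, X, Y), ~ !is.na(.x)))

# Base R cross-check using the rowSums approach
nm1 <- c("KeyPress", "KPIndex", "X", "Y")
res_base <- df[rowSums(!is.na(df[nm1])) != 0, ]

print(res_dplyr)
```

Only the second row is removed, because it is the only one where all four columns are NA.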

Filter out data.frame rows where all columns are NA but keep rows where only some are NA

We can use base R

teste[rowSums(!is.na(teste)) >0,]
# a b c
#1 1 NA 1
#3 3 3 3
#4 NA 4 4

Or using apply and any

teste[apply(!is.na(teste), 1, any),]

which can also be used within filter

teste %>%
  filter(rowSums(!is.na(.)) > 0)

Or using c_across from dplyr, we can directly remove the rows with all NA

library(dplyr)
teste %>%
  rowwise() %>%
  filter(!all(is.na(c_across(everything()))))
# A tibble: 3 x 3
# Rowwise:
# a b c
# <dbl> <dbl> <dbl>
#1 1 NA 1
#2 3 3 3
#3 NA 4 4

NOTE: filter_all is superseded in current dplyr; if_any or if_all inside filter is the recommended replacement.
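As a sketch of that replacement, the rows with at least one non-NA value can be kept with if_any over all columns. The teste data frame is not listed in this answer, so the version below is reconstructed from the printed output, with row 2 assumed to be all NA.

```r
library(dplyr)

# Assumption: teste reconstructed from the printed output above, with
# row 2 taken to be all NA
teste <- data.frame(
  a = c(1, NA, 3, NA),
  b = c(NA, NA, 3, 4),
  c = c(1, NA, 3, 4)
)

# Keep rows where at least one column is not NA
res <- teste %>%
  filter(if_any(everything(), ~ !is.na(.x)))
print(res)
```

Unlike the rowwise version, this returns a plain data frame rather than a rowwise tibble.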


