Filtering data frame based on NA on multiple columns
We can build a logical index for each column, combine the two with &, and subset the rows.
df1[!is.na(df1$type) & !is.na(df1$company),]
#  id  type company
#3  3 North    Alex
#5 NA North     BDA
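Or use rowSums on the logical matrix (is.na(df1[-1])) to subset. A row sum of 0 means the row has no NA in those columns, so negating the sums keeps exactly those rows.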
df1[!rowSums(is.na(df1[-1])),]
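As a quick check that these base R approaches select the same rows, here is a minimal sketch. It assumes df1 holds the same example data that appears further down under "data" (there named df). complete.cases() is an additional base R alternative that the original answer does not use.

```r
# Assumption: df1 is the example data from the "data" section (named df there)
df1 <- data.frame(
  id = c(1L, 2L, 3L, 4L, NA, 6L),
  type = c(NA, NA, "North", "South", "North", NA),
  company = c(NA, "ADM", "Alex", NA, "BDA", "CA")
)

# Approach 1: combine one logical index per column with &
by_and <- df1[!is.na(df1$type) & !is.na(df1$company), ]

# Approach 2: count NAs per row over every column except the first
by_sums <- df1[!rowSums(is.na(df1[-1])), ]

# Approach 3 (not in the original answer): complete.cases() on the chosen columns
by_cc <- df1[complete.cases(df1[c("type", "company")]), ]

# All three should keep the same rows
stopifnot(
  identical(rownames(by_and), rownames(by_sums)),
  identical(rownames(by_and), rownames(by_cc))
)
```

Note that df1[-1] drops the first column by position, so the rowSums version silently changes meaning if the column order changes. Selecting the columns by name, as in the complete.cases() line, is less fragile.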
How to use dplyr across to filter NA in multiple columns
We can use across to loop over the columns 'type' and 'company' and return the rows that don't have any NA in the specified columns. Note that newer dplyr releases deprecate across() inside filter() in favor of if_all()/if_any().
library(dplyr)
df %>%
filter(across(c(type, company), ~ !is.na(.)))
#  id  type company
#1  3 North    Alex
#2 NA North     BDA
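With filter, there are two options, if_any and if_all, that are similar to any_vars/all_vars used with filter_at/filter_all. Using if_any keeps the rows where at least one of the listed columns is not NA: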
df %>%
filter(if_any(c(company, type), ~ !is.na(.)))
#  id  type company
#1  2  <NA>     ADM
#2  3 North    Alex
#3  4 South    <NA>
#4 NA North     BDA
#5  6  <NA>      CA
Or using if_all, which keeps only the rows where every listed column is not NA:
df %>%
filter(if_all(c(company, type), ~ !is.na(.)))
#  id  type company
#1  3 North    Alex
#2 NA North     BDA
data
df <- structure(list(id = c(1L, 2L, 3L, 4L, NA, 6L), type = c(NA, NA,
"North", "South", "North", NA), company = c(NA, "ADM", "Alex",
NA, "BDA", "CA")), class = "data.frame", row.names = c(NA, -6L
))
How to filter rows with NA based on multiple conditions
tidyverse
library(tidyverse)
a_subset %>%
  filter(
    rowSums(!is.na(across(starts_with("group1_")))) >= 2 |
      rowSums(!is.na(across(starts_with("group2_")))) >= 2
  )
#>    group1_1 group1_2 group1_3 group2_1 group2_2 group2_3
#> b1       NA      0.4      0.5     -0.5       NA     -0.5
#> b3      0.5      0.3       NA     -0.2     -0.4     -0.4
#> b4      1.0       NA      2.0       NA       NA       NA
data
a_subset <- data.frame(
row.names = c("b1", "b2", "b3", "b4"),
group1_1 = c(NA, 1.5, 0.5, 1),
group1_2 = c(0.4, NA, 0.3, NA),
group1_3 = c(0.5, NA, NA, 2),
group2_1 = c(-0.5, -2.5, -0.2, NA),
group2_2 = c(NA, NA, -0.4, NA),
group2_3 = c(-0.5, NA, -0.4, NA)
)
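For comparison, the same condition (at least two non-NA values in either column group) can be sketched in base R. This is not part of the original answer; the helper names g1 and g2 are illustrative only.

```r
a_subset <- data.frame(
  row.names = c("b1", "b2", "b3", "b4"),
  group1_1 = c(NA, 1.5, 0.5, 1),
  group1_2 = c(0.4, NA, 0.3, NA),
  group1_3 = c(0.5, NA, NA, 2),
  group2_1 = c(-0.5, -2.5, -0.2, NA),
  group2_2 = c(NA, NA, -0.4, NA),
  group2_3 = c(-0.5, NA, -0.4, NA)
)

# Count non-NA values per row within each column group (g1, g2 are hypothetical names)
g1 <- rowSums(!is.na(a_subset[startsWith(names(a_subset), "group1_")])) >= 2
g2 <- rowSums(!is.na(a_subset[startsWith(names(a_subset), "group2_")])) >= 2

# Keep a row if either group has at least two observed values
kept <- a_subset[g1 | g2, ]
rownames(kept)
```

Row b2 is dropped because it has only one observed value in each group.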
NA values introduced when I filter on multiple columns
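Your conditional check on nest.stat fails because comparing NA with "F" returns NA rather than FALSE, and an NA in a logical row index produces an all-NA row.
Here's a messy, base-R way of handling this, treating NA as FALSE before combining the conditions: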
df[!(df$locname == "CARACO CREEK" &
ifelse(!is.na(df$nest.stat),df$nest.stat == "F",FALSE) &
df$yr == 1994),]
Output:
       locname mo dy   yr nest.stat daynight
1 CARACO CREEK  3  9 1994         U        D
2 CARACO CREEK  4  4 1994      <NA>        D
3 CARACO CREEK  4 14 1994      <NA>        N
4 CARACO CREEK  5  5 1994      <NA>        D
5 CARACO CREEK  5 17 1994      <NA>        N
6 CARACO CREEK  6 29 1994      <NA>        N
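To make the failure mode concrete, here is a minimal sketch on a small made-up data frame (the rows below are hypothetical, not the asker's data). It also shows %in% as a shorter alternative to the ifelse() guard, since x %in% "F" returns FALSE rather than NA when x is NA.

```r
# Hypothetical toy data with the same column names as in the question
df <- data.frame(
  locname = c("CARACO CREEK", "CARACO CREEK", "CARACO CREEK", "OTHER"),
  nest.stat = c("F", NA, "U", "F"),
  yr = c(1994, 1994, 1994, 1994),
  stringsAsFactors = FALSE
)

# Naive version: NA == "F" is NA, so the index contains NA
# and the result includes a row filled with NA
naive <- df[!(df$locname == "CARACO CREEK" &
                df$nest.stat == "F" &
                df$yr == 1994), ]

# Guarded version: %in% never returns NA, so no NA rows are introduced
safe <- df[!(df$locname == "CARACO CREEK" &
               df$nest.stat %in% "F" &
               df$yr == 1994), ]

anyNA(naive$locname)  # TRUE: a spurious all-NA row appeared
anyNA(safe$locname)   # FALSE
```

Wrapping the condition in which() is another common way to drop NA from a logical index, but combining it with negative indexing removes every row when nothing matches, so the %in% form is usually the safer choice here.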
Filtering a data frame based on multiple columns sharing a name
You don't need to loop or apply anything. Continuing from your grep method,
i1 <- grep("type", names(a))
which(rowSums(is.na(a[i1])) == length(i1))
#[1] 2
NOTE: I renamed your data frame to a, since data is already defined as a function in R.
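The question's data is not reproduced here, so the following sketch uses a made-up data frame a whose column names are assumptions. It shows the grep/rowSums idea end to end and how to drop the matching rows with a logical index.

```r
# Hypothetical data: two columns whose names contain "type", plus unrelated columns
a <- data.frame(
  id = 1:3,
  type_1 = c("x", NA, "y"),
  type_2 = c(NA, NA, "z"),
  other = c(1, 2, NA)
)

# Positions of the columns that share the name fragment "type"
i1 <- grep("type", names(a))

# Rows where every "type" column is NA
all_na_rows <- which(rowSums(is.na(a[i1])) == length(i1))

# To drop those rows, prefer a logical index over a[-all_na_rows, ]:
# negative indexing with an empty vector would return zero rows
keep <- rowSums(is.na(a[i1])) < length(i1)
a_kept <- a[keep, ]
```

In this toy data only row 2 has NA in both type columns, so a_kept retains rows 1 and 3.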
Filter data frame based off two columns in other data frame
Using %in%
dfZero <- df[df$Username %in% key[key$training == 0, "username"],]
dfOne <- df[df$Username %in% key[key$training == 1, "username"],]
Using merge()
dfZero <- merge(df, key[key$training == 0,], by.x = "Username", by.y = "username")
dfOne <- merge(df, key[key$training == 1,], by.x = "Username", by.y = "username")
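The two approaches are not interchangeable in every respect. A minimal sketch with made-up data follows (the values in df and key are assumptions; only the column names Username, username, and training come from the answer above). %in% only filters df, while merge() also appends the columns of key and sorts the result by the join column.

```r
# Hypothetical data
df <- data.frame(
  Username = c("cat", "ann", "bob", "ann"),
  score = c(3, 1, 2, 4)
)
key <- data.frame(
  username = c("ann", "bob", "cat"),
  training = c(0, 1, 0)
)

# %in% keeps the original columns and row order of df
dfZero <- df[df$Username %in% key[key$training == 0, "username"], ]

# merge() joins the two frames, so the training column comes along
dfZeroMerged <- merge(df, key[key$training == 0, ],
                      by.x = "Username", by.y = "username")

names(dfZero)        # "Username" "score"
names(dfZeroMerged)  # "Username" "score" "training"
```

If key could contain duplicate usernames, merge() would also duplicate the matching rows of df, whereas %in% would not.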
Removing NA's using filter function on few columns of the data frame
If there is more than one column, use filter_at to keep the rows where at least one of them is not NA:
library(dplyr)
df %>%
filter_at(vars(KeyPress, KPIndex, X, Y), any_vars(!is.na(.)))
Or with rowSums from base R
nm1 <- c("KeyPress", "KPIndex", "X", "Y")
df[rowSums(!is.na(df[nm1]))!= 0,]
data
df <- structure(list(S.No = 1:3, MediaName = c("Dat", "New", "Dat"),
KeyPress = c(NA, NA, NA), KPIndex = c(1L, NA, 2L), Type = c("Fixation",
"Saccade", "Fixation"), Secs = c(18L, 33L, 23L), X = c(117L,
NA, 117L), Y = c(89L, NA, NA)), class = "data.frame", row.names = c(NA,
-3L))
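Because filter_at belongs to dplyr's superseded scoped verbs, the same filter can also be written with if_any. The sketch below is an illustration rather than part of the original answer: it uses a trimmed copy of the data (only the columns involved), and it assumes dplyr 1.0.4 or later for if_any, so that part is guarded and skipped when dplyr is unavailable.

```r
# Trimmed copy of the example data (only the columns relevant to the filter)
df <- data.frame(
  S.No = 1:3,
  KeyPress = c(NA, NA, NA),
  KPIndex = c(1L, NA, 2L),
  X = c(117L, NA, 117L),
  Y = c(89L, NA, NA)
)
nm1 <- c("KeyPress", "KPIndex", "X", "Y")

# Base R: keep rows with at least one non-NA value among nm1
base_res <- df[rowSums(!is.na(df[nm1])) != 0, ]

# dplyr equivalent, run only if a recent enough dplyr is installed
if (requireNamespace("dplyr", quietly = TRUE) &&
    utils::packageVersion("dplyr") >= "1.0.4") {
  dplyr_res <- dplyr::filter(
    df,
    dplyr::if_any(dplyr::all_of(nm1), ~ !is.na(.x))
  )
  # Both approaches should keep the same S.No values
  print(identical(dplyr_res$S.No, base_res$S.No))
}
```

Swapping if_any for if_all would instead keep only the rows where all four columns are observed, which is none of the rows in this data because KeyPress is entirely NA.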
Filter data.frame with all columns NA but keep when some are NA
We can use base R
teste[rowSums(!is.na(teste)) >0,]
#   a  b c
#1  1 NA 1
#3  3  3 3
#4 NA  4 4
Or using apply and any
teste[apply(!is.na(teste), 1, any),]
The rowSums condition can also be used within filter
teste %>%
filter(rowSums(!is.na(.)) >0)
Or using c_across from dplyr, we can directly remove the rows with all NA
library(dplyr)
teste %>%
rowwise() %>%
filter(!all(is.na(c_across(everything()))))
# A tibble: 3 x 3
# Rowwise:
#      a     b     c
#  <dbl> <dbl> <dbl>
#1     1    NA     1
#2     3     3     3
#3    NA     4     4
NOTE: filter_all, like the other scoped verbs, has been superseded; prefer if_any()/if_all() inside filter().