Omit rows containing specific column of NA
You could use the complete.cases
function and put it into a function thusly:
DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))
completeFun <- function(data, desiredCols) {
completeVec <- complete.cases(data[, desiredCols])
return(data[completeVec, ])
}
completeFun(DF, "y")
# x y z
# 1 1 0 NA
# 2 2 10 33
completeFun(DF, c("y", "z"))
# x y z
# 2 2 10 33
EDIT: Only return rows with no NA
s
If you want to eliminate all rows with at least one NA
in any column, just use the complete.cases
function straight up:
DF[complete.cases(DF), ]
# x y z
# 2 2 10 33
Or if completeFun
is already ingrained in your workflow ;)
completeFun(DF, names(DF))
How to remove row if it has a NA value in one certain column
The easiest solution is to use is.na()
:
df[!is.na(df$B), ]
which gives you:
A B C
1 NA 2 NA
2 1 2 3
4 1 2 3
Remove entire Row if Column == NA in R
Here's a way to filter our rows with an NA in the name
field:
library(dplyr)
df %>% filter(!is.na(name))
#> name GA SV
#> 1 CAREY.PRICE 3 2
#> 2 JOHN.SMITH 2 NA
Remove rows containing NA from the column with the least number of NAs
We could first find the name of the column with minimum number of NA
s and then remove NA
rows from that column.
col <- names(which.min(colSums(is.na(df[-1]))))
df[!is.na(df[col]), ]
# Date grpA grpB
#3 2007-11-09 1.66 NA
#4 2007-11-12 1.64 NA
#5 2007-11-13 1.61 1.28
#6 2007-11-14 1.60 1.30
#7 2007-11-15 1.57 1.27
#8 2007-11-16 1.56 1.25
#9 2007-11-19 1.55 1.25
#10 2007-11-20 1.55 1.25
#11 2007-11-21 1.52 1.22
#12 2007-11-22 1.50 1.21
#13 2007-11-23 1.51 1.21
#14 2007-11-26 1.52 1.25
#15 2007-11-27 1.50 1.25
#16 2007-11-28 1.50 1.23
#17 2007-11-29 1.52 1.24
#18 2007-11-30 1.56 1.25
#19 2007-12-03 1.56 1.22
#20 2007-12-04 1.56 1.23
which can be done in one-liner as well without creating additional variable
df[!is.na(df[names(which.min(colSums(is.na(df[-1]))))]), ]
Using the same logic a dplyr
approach could be using filter_at
library(dplyr)
df %>%
filter_at(df %>%
summarise_at(-1, ~sum(is.na(.))) %>%
which.min %>% names, ~!is.na(.))
Or using it with tidyr::drop_na
tidyr::drop_na(df, df %>%
summarise_at(-1, ~sum(is.na(.))) %>%
which.min %>% names)
How to drop rows of Pandas DataFrame whose value in a certain column is NaN
Don't drop, just take the rows where EPS is not NA:
df = df[df['EPS'].notna()]
Remove rows which have all NAs in certain columns
This a one-liner to remove the rows with NA in all columns between 5 and 9. By combining rowSums()
with is.na()
it is easy to check whether all entries in these 5 columns are NA
:
x <- x[rowSums(is.na(x[,5:9]))!=5,]
How to omit rows with NA in only two columns in R?
df[!with(df,is.na(x)& is.na(y)),]
# x y z
#1 1 4 8
#2 2 5 9
#4 3 6 11
#5 NA 7 NA
I did benchmarked on a slightly bigger dataset. Here are the results:
set.seed(237)
df <- data.frame(x=sample(c(NA,1:20), 1e6, replace=T), y= sample(c(NA, 1:10), 1e6, replace=T), z= sample(c(NA, 5:15), 1e6,replace=T))
f1 <- function() df[!with(df,is.na(x)& is.na(y)),]
f2 <- function() df[rowSums(is.na(df[c("x", "y")])) != 2, ]
f3 <- function() df[ apply( df, 1, function(x) sum(is.na(x))>1 ), ]
library(microbenchmark)
microbenchmark(f1(), f2(), f3(), unit="relative")
Unit: relative
#expr min lq median uq max neval
# f1() 1.000000 1.000000 1.000000 1.000000 1.000000 100
# f2() 1.044812 1.068189 1.138323 1.129611 0.856396 100
# f3() 26.205272 25.848441 24.357665 21.799930 22.881378 100
Deleting rows corresponding to particular string, but keep NA values
We can create a condition with is.na
to return those elements that are NA
as well with |
subset(df, Col1 != 'string' | is.na(Col1))
Related Topics
Extract Month and Year from a Zoo::Yearmon Object
Plot With Conditional Colors Based on Values in R
Splitting a Continuous Variable into Equal Sized Groups
Geom_Rect and Alpha - Does This Work With Hard Coded Values
What Is Meaning of First Tilde in Purrr::Map
R Ifelse to Replace Values in a Column
Displaying Text Below the Plot Generated by Ggplot2
Subset a Dataframe Between 2 Dates
Finding Rows Containing a Value (Or Values) in Any Column
Increment by 1 For Every Change in Column
Remove an Entire Column from a Data.Frame in R
Fitting Several Regression Models With Dplyr
Remove Na Values from a Vector
All Levels of a Factor in a Model Matrix in R
How to Put Labels Over Geom_Bar For Each Bar in R With Ggplot2
Using Data.Table Package Inside My Own Package