Omit Rows Containing Specific Column of Na

Omit rows containing specific column of NA

You could use the complete.cases function and put it into a function thusly:

DF <- data.frame(x = c(1, 2, 3), y = c(0, 10, NA), z=c(NA, 33, 22))

completeFun <- function(data, desiredCols) {
completeVec <- complete.cases(data[, desiredCols])
return(data[completeVec, ])
}

completeFun(DF, "y")
# x y z
# 1 1 0 NA
# 2 2 10 33

completeFun(DF, c("y", "z"))
# x y z
# 2 2 10 33

EDIT: Only return rows with no NAs

If you want to eliminate all rows with at least one NA in any column, just use the complete.cases function straight up:

DF[complete.cases(DF), ]
# x y z
# 2 2 10 33

Or if completeFun is already ingrained in your workflow ;)

completeFun(DF, names(DF))

How to remove row if it has a NA value in one certain column

The easiest solution is to use is.na():

df[!is.na(df$B), ]

which gives you:

   A B  C
1 NA 2 NA
2 1 2 3
4 1 2 3

Remove entire Row if Column == NA in R

Here's a way to filter our rows with an NA in the name field:

library(dplyr)
df %>% filter(!is.na(name))

#> name GA SV
#> 1 CAREY.PRICE 3 2
#> 2 JOHN.SMITH 2 NA

Remove rows containing NA from the column with the least number of NAs

We could first find the name of the column with minimum number of NAs and then remove NA rows from that column.

col <- names(which.min(colSums(is.na(df[-1]))))
df[!is.na(df[col]), ]


# Date grpA grpB
#3 2007-11-09 1.66 NA
#4 2007-11-12 1.64 NA
#5 2007-11-13 1.61 1.28
#6 2007-11-14 1.60 1.30
#7 2007-11-15 1.57 1.27
#8 2007-11-16 1.56 1.25
#9 2007-11-19 1.55 1.25
#10 2007-11-20 1.55 1.25
#11 2007-11-21 1.52 1.22
#12 2007-11-22 1.50 1.21
#13 2007-11-23 1.51 1.21
#14 2007-11-26 1.52 1.25
#15 2007-11-27 1.50 1.25
#16 2007-11-28 1.50 1.23
#17 2007-11-29 1.52 1.24
#18 2007-11-30 1.56 1.25
#19 2007-12-03 1.56 1.22
#20 2007-12-04 1.56 1.23

which can be done in one-liner as well without creating additional variable

df[!is.na(df[names(which.min(colSums(is.na(df[-1]))))]), ]

Using the same logic a dplyr approach could be using filter_at

library(dplyr)

df %>%
filter_at(df %>%
summarise_at(-1, ~sum(is.na(.))) %>%
which.min %>% names, ~!is.na(.))

Or using it with tidyr::drop_na

tidyr::drop_na(df, df %>%
summarise_at(-1, ~sum(is.na(.))) %>%
which.min %>% names)

How to drop rows of Pandas DataFrame whose value in a certain column is NaN

Don't drop, just take the rows where EPS is not NA:

df = df[df['EPS'].notna()]

Remove rows which have all NAs in certain columns

This a one-liner to remove the rows with NA in all columns between 5 and 9. By combining rowSums() with is.na() it is easy to check whether all entries in these 5 columns are NA:

x <- x[rowSums(is.na(x[,5:9]))!=5,]

How to omit rows with NA in only two columns in R?

df[!with(df,is.na(x)& is.na(y)),]
# x y z
#1 1 4 8
#2 2 5 9
#4 3 6 11
#5 NA 7 NA

I did benchmarked on a slightly bigger dataset. Here are the results:

set.seed(237)
df <- data.frame(x=sample(c(NA,1:20), 1e6, replace=T), y= sample(c(NA, 1:10), 1e6, replace=T), z= sample(c(NA, 5:15), 1e6,replace=T))

f1 <- function() df[!with(df,is.na(x)& is.na(y)),]
f2 <- function() df[rowSums(is.na(df[c("x", "y")])) != 2, ]
f3 <- function() df[ apply( df, 1, function(x) sum(is.na(x))>1 ), ]

library(microbenchmark)

microbenchmark(f1(), f2(), f3(), unit="relative")
Unit: relative
#expr min lq median uq max neval
# f1() 1.000000 1.000000 1.000000 1.000000 1.000000 100
# f2() 1.044812 1.068189 1.138323 1.129611 0.856396 100
# f3() 26.205272 25.848441 24.357665 21.799930 22.881378 100

Deleting rows corresponding to particular string, but keep NA values

We can create a condition with is.na to return those elements that are NA as well with |

 subset(df, Col1 != 'string' | is.na(Col1))


Related Topics



Leave a reply



Submit