R: Data.Table Count !Na Per Row

R: data.table count !NA per row

Try this one using Reduce to chain together + calls:

d[, num_obs := Reduce(`+`, lapply(.SD,function(x) !is.na(x)))]

If speed is critical, you can eek out a touch more with Ananda's suggestion to hardcode the number of columns being assessed:

d[, num_obs := 4 - Reduce("+", lapply(.SD, is.na))]

Benchmarking using Ananda's larger data.table d from above:

fun1 <- function(indt) indt[, num_obs := rowSums(!is.na(indt))][]
fun3 <- function(indt) indt[, num_obs := Reduce(`+`, lapply(.SD,function(x) !is.na(x)))][]
fun4 <- function(indt) indt[, num_obs := 4 - Reduce("+", lapply(.SD, is.na))][]

library(microbenchmark)
microbenchmark(fun1(copy(d)), fun3(copy(d)), fun4(copy(d)), times=10L)

#Unit: milliseconds
# expr min lq mean median uq max neval
# fun1(copy(d)) 3.565866 3.639361 3.912554 3.703091 4.023724 4.596130 10
# fun3(copy(d)) 2.543878 2.611745 2.973861 2.664550 3.657239 4.011475 10
# fun4(copy(d)) 2.265786 2.293927 2.798597 2.345242 3.385437 4.128339 10

Counting the NA's in a part of a row in data.table

Using data.table, you could do this:

df[, NonNA := sum(!is.na(questionA), !is.na(questionB), !is.na(questionC)), by = .(nr)]

A base solution:

df$nonNA <- rowSums(!is.na(df[,c("questionA", "questionB", "questionC")]))

Row mean and number of entries per row using data.table in R

I'd suggest replacing -9999 with NA and then using na.rm = TRUE for rowMeans:

library(data.table)
temp <- data.table(replicate(4, rep("charVar", 640)), replicate(46, sample(c(0:100, -9999), 640, rep = TRUE)))

for (j in 5:50){set(temp, which(temp[[j]] == -9999), j, NA)}
temp[, .(Mean = rowMeans(.SD, na.rm = TRUE), Count = rowSums(!is.na(.SD))), .SDcols=c(5:50)]

# If you want to add the new columns to the existing data.table use:
# temp[, c("Mean", "Count") := .(rowMeans(.SD, na.rm = TRUE), rowSums(!is.na(.SD))), .SDcols=c(5:50)]

Count NAs per row in dataframe

You could add a new column to your data frame containing the number of NA values per batch_id:

df$na_count <- apply(df, 1, function(x) sum(is.na(x)))

How to count number of rows with NA on each column?

We can use the vectorized colSums on a logical matrix (is.na(df1))

colSums(is.na(df1))

Or another option is sum by looping

sapply(df1, function(x) sum(is.na(x)))

Or with dplyr

library(dplyr)
df1 %>%
summarise(across(everything(), ~ sum(is.na(.))))

How to simply count number of rows with NAs - R

tl;dr: row wise, you'll want sum(!complete.cases(DF)), or, equivalently, sum(apply(DF, 1, anyNA))

There are a number of different ways to look at the number, proportion or position of NA values in a data frame:

Most of these start with the logical data frame with TRUE for every NA, and FALSE everywhere else. For the base dataset airquality

is.na(airquality)

There are 44 NA values in this data set

sum(is.na(airquality))
# [1] 44

You can look at the total number of NA values per row or column:

head(rowSums(is.na(airquality)))
# [1] 0 0 0 0 2 1
colSums(is.na(airquality))
# Ozone Solar.R Wind Temp Month Day
37 7 0 0 0 0

You can use anyNA() in place of is.na() as well:

# by row
head(apply(airquality, 1, anyNA))
# [1] FALSE FALSE FALSE FALSE TRUE TRUE
sum(apply(airquality, 1, anyNA))
# [1] 42

# by column
head(apply(airquality, 2, anyNA))
# Ozone Solar.R Wind Temp Month Day
# TRUE TRUE FALSE FALSE FALSE FALSE
sum(apply(airquality, 2, anyNA))
# [1] 2

complete.cases() can be used, but only row-wise:

sum(!complete.cases(airquality))
# [1] 42

Count the number of NAs in multiple columns after grouping a dataframe in R

I propose two ways:

using dplyr:

df %>% 
group_by(Region,ID) %>%
summarise_each(list(na_count = ~sum(is.na(.))))

or data.table:

library(data.table)
setDT(df)[, lapply(.SD, function(x) sum(is.na(x))), by = .(Region, ID)]


Related Topics



Leave a reply



Submit