R: data.table count !NA per row

Try this one using Reduce to chain together + calls:

d[, num_obs := Reduce(`+`, lapply(.SD,function(x) !is.na(x)))]

If speed is critical, you can eke out a little more with Ananda's suggestion to hardcode the number of columns being assessed (4 here):

d[, num_obs := 4 - Reduce("+", lapply(.SD, is.na))]
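The hardcoded 4 only holds when exactly four columns are being assessed. Here is a minimal sketch of the same idea that takes the count from length(.SD) instead. The table toy and its columns a, b, c are made up for illustration and are not the data from the question:

```r
library(data.table)

# Hypothetical toy data; not the table from the question.
toy <- data.table(
  a = c(1, NA, 3),
  b = c(NA, NA, 6),
  c = c("x", "y", NA)
)

# Name the assessed columns explicitly so that re-running the line
# does not count the new num_obs column itself.
cols <- c("a", "b", "c")

# length(.SD) is the number of assessed columns; subtracting the
# per-row NA count leaves the per-row count of non-NA values.
toy[, num_obs := length(.SD) - Reduce(`+`, lapply(.SD, is.na)), .SDcols = cols]

toy$num_obs
```

This trades a tiny amount of the speed gain for not having to keep the constant in sync with the table.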

Benchmarking using Ananda's larger data.table d from above:

fun1 <- function(indt) indt[, num_obs := rowSums(!is.na(indt))][]
fun3 <- function(indt) indt[, num_obs := Reduce(`+`, lapply(.SD,function(x) !is.na(x)))][]
fun4 <- function(indt) indt[, num_obs := 4 - Reduce("+", lapply(.SD, is.na))][]

library(microbenchmark)
microbenchmark(fun1(copy(d)), fun3(copy(d)), fun4(copy(d)), times=10L)

#Unit: milliseconds
#          expr      min       lq     mean   median       uq      max neval
# fun1(copy(d)) 3.565866 3.639361 3.912554 3.703091 4.023724 4.596130    10
# fun3(copy(d)) 2.543878 2.611745 2.973861 2.664550 3.657239 4.011475    10
# fun4(copy(d)) 2.265786 2.293927 2.798597 2.345242 3.385437 4.128339    10

Counting the NA's in a part of a row in data.table

Using data.table, you could do this:

df[, NonNA := sum(!is.na(questionA), !is.na(questionB), !is.na(questionC)), by = .(nr)]

A base solution:

df$nonNA <- rowSums(!is.na(df[,c("questionA", "questionB", "questionC")]))
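To see what the base version produces, here is a small sketch on made-up data. The column names follow the snippets above, but the values are purely illustrative:

```r
# Hypothetical survey data; only the column names come from the answer above.
df <- data.frame(
  nr        = 1:3,
  questionA = c(1, NA, 3),
  questionB = c(NA, NA, 2),
  questionC = c(5, NA, 1)
)

# Restrict the count to the question columns only.
question_cols <- c("questionA", "questionB", "questionC")

# is.na() gives a logical matrix; negating it and summing across each
# row counts the answered questions per respondent.
df$nonNA <- rowSums(!is.na(df[, question_cols]))

df
```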

Row mean and number of entries per row using data.table in R

I'd suggest replacing -9999 with NA and then using na.rm = TRUE for rowMeans:

library(data.table)
temp <- data.table(replicate(4, rep("charVar", 640)), replicate(46, sample(c(0:100, -9999), 640, replace = TRUE)))

for (j in 5:50){set(temp, which(temp[[j]] == -9999), j, NA)}
temp[, .(Mean = rowMeans(.SD, na.rm = TRUE), Count = rowSums(!is.na(.SD))), .SDcols=c(5:50)]

# If you want to add the new columns to the existing data.table use:
# temp[, c("Mean", "Count") := .(rowMeans(.SD, na.rm = TRUE), rowSums(!is.na(.SD))), .SDcols=c(5:50)]

Count NAs per row in dataframe

You could add a new column to your data frame containing the number of NA values per batch_id:

df$na_count <- apply(df, 1, function(x) sum(is.na(x)))
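Note that apply() first converts the data frame to a matrix and then calls the function once per row. A sketch of the vectorized rowSums(is.na(df)) alternative, checked against the apply() version on a made-up data frame (the columns x and y and all values are assumptions for illustration):

```r
# Hypothetical data frame; batch_id is the only name taken from the text above.
df <- data.frame(
  batch_id = c("b1", "b2", "b3"),
  x        = c(1.5, NA, NA),
  y        = c("u", NA, "w")
)

# Row-by-row version from the answer above.
via_apply <- apply(df, 1, function(x) sum(is.na(x)))

# Vectorized version: one logical matrix, summed across rows.
via_rowsums <- rowSums(is.na(df))

# Both approaches should agree on the per-row NA count.
all(via_apply == via_rowsums)
```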

How to count number of rows with NA on each column?

We can use the vectorized colSums on the logical matrix returned by is.na(df1):

colSums(is.na(df1))

Another option is to loop over the columns and sum the NAs in each:

sapply(df1, function(x) sum(is.na(x)))

Or with dplyr

library(dplyr)
df1 %>%
    summarise(across(everything(), ~ sum(is.na(.))))
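On a small made-up data frame the two base approaches can be checked against each other. The df1 below is hypothetical, since the original question's data is not shown:

```r
# Hypothetical stand-in for df1.
df1 <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  c = 1:4
)

# Vectorized: column sums of the logical NA matrix.
by_colsums <- colSums(is.na(df1))

# Looping: one sum per column.
by_sapply <- sapply(df1, function(x) sum(is.na(x)))

# colSums() returns doubles and sapply() integers here, so compare
# values rather than using identical().
all(by_colsums == by_sapply)
by_colsums
```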

How to simply count number of rows with NAs - R

tl;dr: row wise, you'll want sum(!complete.cases(DF)), or, equivalently, sum(apply(DF, 1, anyNA))

There are a number of different ways to look at the number, proportion or position of NA values in a data frame.

Most of these start with the logical matrix returned by is.na(), which is TRUE for every NA and FALSE everywhere else. For the base dataset airquality:

is.na(airquality)

There are 44 NA values in this data set

sum(is.na(airquality))
# [1] 44

You can look at the total number of NA values per row or column:

head(rowSums(is.na(airquality)))
# [1] 0 0 0 0 2 1
colSums(is.na(airquality))
#   Ozone Solar.R    Wind    Temp   Month     Day 
#      37       7       0       0       0       0

You can use anyNA() in place of is.na() as well:

# by row
head(apply(airquality, 1, anyNA))
# [1] FALSE FALSE FALSE FALSE  TRUE  TRUE
sum(apply(airquality, 1, anyNA))
# [1] 42

# by column
head(apply(airquality, 2, anyNA))
#   Ozone Solar.R    Wind    Temp   Month     Day 
#    TRUE    TRUE   FALSE   FALSE   FALSE   FALSE
sum(apply(airquality, 2, anyNA))
# [1] 2

complete.cases() can be used, but only row-wise:

sum(!complete.cases(airquality))
# [1] 42
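The examples above cover counts. For the proportion and position of NA values, a short sketch along the same lines, using only base R on the same airquality dataset (the variable names are my own):

```r
# Logical matrix: TRUE wherever airquality has an NA.
na_mat <- is.na(airquality)

# Overall proportion of missing cells (the mean of a logical is the share of TRUEs).
prop_overall <- mean(na_mat)

# Proportion of missing values per column and per row.
prop_by_col <- colMeans(na_mat)
prop_by_row <- rowMeans(na_mat)

# Row and column index of every NA, one row per missing cell.
na_pos <- which(na_mat, arr.ind = TRUE)

head(na_pos)
```

nrow(na_pos) should match sum(is.na(airquality)) from above, since each row of na_pos describes one missing cell.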

Count the number of NAs in multiple columns after grouping a dataframe in R

I propose two ways:

using dplyr:

library(dplyr)
df %>%
  group_by(Region, ID) %>%
  summarise(across(everything(), list(na_count = ~ sum(is.na(.)))), .groups = "drop")

or data.table:

library(data.table)
setDT(df)[, lapply(.SD, function(x) sum(is.na(x))), by = .(Region, ID)]
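If you want a cross-check that needs no packages, base R's aggregate() can do the same grouped count. This is a swapped-in alternative rather than part of the answer above, and the data (including the value columns v1 and v2) is made up for illustration:

```r
# Hypothetical data; only Region and ID are names from the answer above.
df <- data.frame(
  Region = c("N", "N", "S", "S", "S"),
  ID     = c(1, 1, 2, 2, 2),
  v1     = c(NA, 1, NA, NA, 3),
  v2     = c("a", NA, "b", "c", NA)
)

# Use the data-frame interface rather than a formula: the formula
# interface drops rows containing NA by default, which would defeat
# the purpose of counting them.
res <- aggregate(
  df[c("v1", "v2")],
  by  = df[c("Region", "ID")],
  FUN = function(x) sum(is.na(x))
)

res
```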
