R: data.table count !NA per row
Try this one, using Reduce to chain together + calls:
d[, num_obs := Reduce(`+`, lapply(.SD,function(x) !is.na(x)))]
If speed is critical, you can eke out a touch more with Ananda's suggestion to hardcode the number of columns being assessed:
d[, num_obs := 4 - Reduce("+", lapply(.SD, is.na))]
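As a rough illustration of why the two Reduce variants agree, here is a minimal base R sketch on a plain data frame. The toy data and variable names are hypothetical, and ncol() stands in for the hardcoded 4 so the count follows the number of columns:

```r
# Hypothetical toy data; the column names are illustrative only.
df <- data.frame(a = c(1, NA, 3),
                 b = c(NA, NA, 6),
                 c = c(7, 8, NA),
                 d = c(1, 2, 3))

# Add up the per-column "is not NA" indicators element-wise.
non_na_reduce <- Reduce(`+`, lapply(df, function(x) !is.na(x)))

# Same count, obtained by subtracting the NA count from the column count.
non_na_ncol <- ncol(df) - Reduce(`+`, lapply(df, is.na))

# Reference result from rowSums (names dropped for comparison).
non_na_rowsums <- unname(rowSums(!is.na(df)))

print(non_na_reduce)
```

All three vectors should hold the same per-row counts; the Reduce versions avoid the conversion to a matrix that rowSums performs on a data frame.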
Benchmarking using Ananda's larger data.table d from above:
fun1 <- function(indt) indt[, num_obs := rowSums(!is.na(indt))][]
fun3 <- function(indt) indt[, num_obs := Reduce(`+`, lapply(.SD,function(x) !is.na(x)))][]
fun4 <- function(indt) indt[, num_obs := 4 - Reduce("+", lapply(.SD, is.na))][]
library(microbenchmark)
microbenchmark(fun1(copy(d)), fun3(copy(d)), fun4(copy(d)), times=10L)
#Unit: milliseconds
# expr min lq mean median uq max neval
# fun1(copy(d)) 3.565866 3.639361 3.912554 3.703091 4.023724 4.596130 10
# fun3(copy(d)) 2.543878 2.611745 2.973861 2.664550 3.657239 4.011475 10
# fun4(copy(d)) 2.265786 2.293927 2.798597 2.345242 3.385437 4.128339 10
Counting the NA's in a part of a row in data.table
Using data.table, you could do this:
df[, NonNA := sum(!is.na(questionA), !is.na(questionB), !is.na(questionC)), by = .(nr)]
A base solution:
df$nonNA <- rowSums(!is.na(df[,c("questionA", "questionB", "questionC")]))
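A small base R sketch of the same idea with the column names kept in a vector, which makes the selection easier to change. The data here are made up for illustration, and cols and one_col are hypothetical names. Indexing with df[cols] rather than df[, cols] keeps a data frame even when only one column is selected, so rowSums still works:

```r
# Hypothetical survey-style data; values are invented for illustration.
df <- data.frame(nr = 1:3,
                 questionA = c(1, NA, 3),
                 questionB = c(NA, NA, 2),
                 questionC = c(5, 6, NA))

# Columns to inspect, kept in one place.
cols <- c("questionA", "questionB", "questionC")

# df[cols] always returns a data frame, so rowSums gets a 2-d object.
df$nonNA <- rowSums(!is.na(df[cols]))

# The same pattern with a single column still returns one value per row.
one_col <- "questionA"
single_count <- rowSums(!is.na(df[one_col]))

print(df)
```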
Row mean and number of entries per row using data.table in R
I'd suggest replacing -9999 with NA and then using na.rm = TRUE for rowMeans:
library(data.table)
temp <- data.table(replicate(4, rep("charVar", 640)), replicate(46, sample(c(0:100, -9999), 640, replace = TRUE)))
for (j in 5:50){set(temp, which(temp[[j]] == -9999), j, NA)}
temp[, .(Mean = rowMeans(.SD, na.rm = TRUE), Count = rowSums(!is.na(.SD))), .SDcols=c(5:50)]
# If you want to add the new columns to the existing data.table use:
# temp[, c("Mean", "Count") := .(rowMeans(.SD, na.rm = TRUE), rowSums(!is.na(.SD))), .SDcols=c(5:50)]
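The sentinel-to-NA step can also be sketched in base R on a tiny data frame, which makes the result easy to check by hand. The data and names below are hypothetical. Note that a row with no valid entries yields NaN from rowMeans with na.rm = TRUE, alongside a count of 0:

```r
# Hypothetical measurements where -9999 marks a missing value.
scores <- data.frame(x = c(10, -9999, 30, -9999),
                     y = c(20, -9999, -9999, -9999),
                     z = c(30, 40, -9999, -9999))

# Replace the sentinel with NA across all columns.
scores[scores == -9999] <- NA

# Per-row mean of the valid entries and number of valid entries.
row_mean  <- rowMeans(scores, na.rm = TRUE)
row_count <- rowSums(!is.na(scores))

print(data.frame(Mean = row_mean, Count = row_count))
```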
Count NAs per row in dataframe
You could add a new column to your data frame containing the number of NA values per batch_id:
df$na_count <- apply(df, 1, function(x) sum(is.na(x)))
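For comparison, a minimal sketch that computes the same per-row count with the vectorized rowSums, which is usually faster than apply on larger data. The data frame below is hypothetical; only batch_id is taken from the question:

```r
# Hypothetical data; v1 and v2 are invented column names.
df <- data.frame(batch_id = c("a", "b", "c"),
                 v1 = c(1, NA, NA),
                 v2 = c(NA, NA, 3))

# Row-wise count with apply, as in the answer.
count_apply <- apply(df, 1, function(x) sum(is.na(x)))

# Vectorized equivalent on the logical matrix from is.na().
count_rowsums <- rowSums(is.na(df))

df$na_count <- count_rowsums
print(df)
```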
How to count number of rows with NA on each column?
We can use the vectorized colSums on a logical matrix (is.na(df1)):
colSums(is.na(df1))
Another option is to loop over the columns with sapply and call sum on each:
sapply(df1, function(x) sum(is.na(x)))
Or with dplyr
library(dplyr)
df1 %>%
summarise(across(everything(), ~ sum(is.na(.))))
How to simply count number of rows with NAs - R
tl;dr: row-wise, you'll want sum(!complete.cases(DF)), or, equivalently, sum(apply(DF, 1, anyNA)).
There are a number of different ways to look at the number, proportion or position of NA values in a data frame.
Most of these start with the logical matrix returned by is.na(), which has TRUE for every NA and FALSE everywhere else. For the base dataset airquality:
is.na(airquality)
There are 44 NA values in this data set:
sum(is.na(airquality))
# [1] 44
You can look at the total number of NA values per row or column:
head(rowSums(is.na(airquality)))
# [1] 0 0 0 0 2 1
colSums(is.na(airquality))
#   Ozone Solar.R    Wind    Temp   Month     Day
#      37       7       0       0       0       0
You can use anyNA() in place of is.na() as well:
# by row
head(apply(airquality, 1, anyNA))
# [1] FALSE FALSE FALSE FALSE TRUE TRUE
sum(apply(airquality, 1, anyNA))
# [1] 42
# by column
head(apply(airquality, 2, anyNA))
# Ozone Solar.R Wind Temp Month Day
# TRUE TRUE FALSE FALSE FALSE FALSE
sum(apply(airquality, 2, anyNA))
# [1] 2
complete.cases() can be used, but only row-wise:
sum(!complete.cases(airquality))
# [1] 42
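The examples above cover counts; proportions and positions follow from the same logical matrix. A minimal sketch on airquality, where the variable names are illustrative only:

```r
# Logical matrix: TRUE where airquality has an NA.
na_mask <- is.na(airquality)

# Proportion of NA values per column, and over the whole data set.
na_prop_col <- colMeans(na_mask)
na_prop_all <- mean(na_mask)

# Row and column index of every NA value.
na_pos <- which(na_mask, arr.ind = TRUE)

print(round(na_prop_col, 3))
print(head(na_pos))
```

colMeans works here because TRUE and FALSE are treated as 1 and 0, so the mean of a logical column is the share of NA values in it.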
Count the number of NAs in multiple columns after grouping a dataframe in R
I propose two ways:
using dplyr:
library(dplyr)
df %>%
group_by(Region, ID) %>%
summarise(across(everything(), ~ sum(is.na(.x)), .names = "{.col}_na_count"), .groups = "drop")
or data.table:
library(data.table)
setDT(df)[, lapply(.SD, function(x) sum(is.na(x))), by = .(Region, ID)]
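If neither package is available, a base R sketch with aggregate gives the same kind of grouped NA count. The data below are invented, and x and y are hypothetical value columns. The non-formula interface is used on purpose, since the formula interface of aggregate drops rows containing NA by default:

```r
# Hypothetical grouped data with missing values.
df <- data.frame(Region = c("N", "N", "S", "S", "S"),
                 ID     = c(1, 1, 2, 2, 2),
                 x      = c(NA, 1, NA, NA, 3),
                 y      = c(1, 2, NA, 4, 5))

# Count NA values in x and y within each Region/ID group.
na_by_group <- aggregate(df[c("x", "y")],
                         by = df[c("Region", "ID")],
                         FUN = function(v) sum(is.na(v)))

print(na_by_group)
```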