R: How to Total the Number of NAs in Each Column of a data.frame


You could try:

colSums(is.na(df))
#  V1 V2 V3 V4 V5 
#   2  4  2  4  4

Data used above:

set.seed(42)
df <- as.data.frame(matrix(sample(c(NA, 0:4), 5 * 20, replace = TRUE), ncol = 5))
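A self-contained check on a small, hand-made data frame (the columns a, b and c are made up for illustration) shows that is.na works across column types, so numeric, character and logical columns are all counted the same way:

```r
# Hypothetical toy data mixing numeric, character and logical columns
df_small <- data.frame(
  a = c(1, NA, 3),
  b = c("x", NA, NA),
  c = c(TRUE, FALSE, NA)
)

# is.na() on a data frame returns a logical matrix; colSums() counts the TRUEs
na_counts <- colSums(is.na(df_small))
print(na_counts)
# a b c
# 1 2 1
```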

How to count the number of rows with NA in each column?

We can use the vectorized colSums on the logical matrix returned by is.na(df1):

colSums(is.na(df1))

Another option is to loop over the columns and sum:

sapply(df1, function(x) sum(is.na(x)))

Or with dplyr:

library(dplyr)
df1 %>%
    summarise(across(everything(), ~ sum(is.na(.))))
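To check that the vectorized and looping versions agree, here is a base-R sketch on a small hypothetical df1. vapply stands in for sapply only because it fixes the return type to integer; the dplyr variant is left out so the example runs without extra packages.

```r
# Hypothetical example data standing in for df1
df1 <- data.frame(
  id    = 1:4,
  score = c(10, NA, 7, NA),
  grade = c("A", "B", NA, "C")
)

# Vectorized: one is.na() call on the whole data frame, then column sums
vectorized <- colSums(is.na(df1))

# Looping: count NAs column by column; vapply guarantees an integer per column
looped <- vapply(df1, function(x) sum(is.na(x)), integer(1))

print(looped)
# id = 0, score = 2, grade = 1

# colSums() returns doubles and vapply() integers, so compare values, not types
stopifnot(all(vectorized == looped))
```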

Count the number of NAs in multiple columns after grouping a data frame in R

I propose two ways:

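Using dplyr:

library(dplyr)
# summarise_each() is deprecated; across() applies the function to every non-grouping column
df %>%
  group_by(Region, ID) %>%
  summarise(across(everything(), ~ sum(is.na(.x)), .names = "{.col}_na_count"),
            .groups = "drop")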

Or with data.table:

library(data.table)
# setDT() converts df to a data.table in place (by reference)
setDT(df)[, lapply(.SD, function(x) sum(is.na(x))), by = .(Region, ID)]
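For comparison, a base-R sketch without dplyr or data.table, on made-up Region/ID data: rowsum() adds up the rows of a numeric matrix within each group, so feeding it the NA indicator matrix gives per-group NA counts. Building the group label with paste() is an assumption of this sketch; any vector with one value per row would work.

```r
# Hypothetical sample data: two grouping columns plus two value columns
df_grp <- data.frame(
  Region = c("N", "N", "S", "S"),
  ID     = c(1, 1, 2, 2),
  x      = c(NA, 1, NA, NA),
  y      = c(2, NA, 3, 4)
)

value_cols <- setdiff(names(df_grp), c("Region", "ID"))

# One label per Region/ID combination, e.g. "N_1"
group <- paste(df_grp$Region, df_grp$ID, sep = "_")

# is.na() gives a logical matrix; adding 0L makes it integer so rowsum() accepts it.
# rowsum() then sums the rows within each group, column by column.
na_by_group <- rowsum(is.na(df_grp[value_cols]) + 0L, group)
print(na_by_group)
#     x y
# N_1 1 1
# S_2 2 0
```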

Count NAs in multiple columns in R

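In the OP's first attempt, colSums(is.na(.)) puts one function call inside another on the right-hand side of the pipe. One fix is to wrap the expression in braces {} so that the data is used only where . appears: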

library(dplyr)
dt %>% 
    select(starts_with("V2QE38")) %>%
    {colSums(is.na(.))}
V2QE38A V2QE38B V2QE38C V2QE38D 
      0       0       0       0

Or add another %>% step so that each function is applied in turn:

dt %>%
    select(starts_with("V2QE38")) %>%
    is.na %>%
    colSums

Output:

V2QE38A V2QE38B V2QE38C V2QE38D 
      0       0       0       0

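The issue is that when . appears only inside a nested call, magrittr still passes the data as the first argument, so colSums(is.na(.)) effectively runs as colSums(., is.na(.)). The result is the column sums of the data itself rather than NA counts, just as with a plain colSums(.):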

dt %>%
  select(starts_with("V2QE38")) %>%
  colSums(.)
V2QE38A V2QE38B V2QE38C V2QE38D 
      6       1      12       0

This is the same as the OP's output from colSums(is.na(.)).
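The same selection and count can also be written in base R without a pipe, which sidesteps the question of where the data gets inserted. This is a sketch on a hypothetical stand-in for dt; startsWith() plays the role of dplyr's starts_with() here.

```r
# Hypothetical stand-in for dt: three V2QE38* columns and one unrelated column
dt <- data.frame(
  V2QE38A = c(1, NA, 3),
  V2QE38B = c(NA, NA, 0),
  V2QE38C = c(2, 5, 7),
  other   = c(NA, NA, NA)
)

# Keep only the columns whose names start with the prefix
v2qe38 <- dt[startsWith(names(dt), "V2QE38")]

# Count NAs per selected column; "other" is excluded by the selection
na_counts <- colSums(is.na(v2qe38))
print(na_counts)
# V2QE38A = 1, V2QE38B = 2, V2QE38C = 0
```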

Count the Number of NAs in a Row in Specified Columns in R

df$na_count <- rowSums(is.na(df[c('first', 'last', 'address', 'phone', 'state')])) 

df
   first m_initial     last         address    phone state customer na_count
1    Bob         L   Turner 123 Turner Lane 410-3141  Iowa     <NA>        0
2   Will         P Williams 456 Williams Rd 491-2359  <NA>        Y        1
3 Amanda         C    Jones    789 Haggerty     <NA>  <NA>        Y        2
4   Lisa      <NA>    Evans            <NA>     <NA>  <NA>        N        3
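A self-contained sketch of the same rowSums idea on hypothetical customer data (the column names only loosely mirror the example above). It also shows that a zero count marks rows that are complete in the chosen columns, which matches complete.cases() on the same subset.

```r
# Hypothetical customer data; only some columns are checked for NAs
customers <- data.frame(
  first    = c("Bob", "Will", "Amanda"),
  phone    = c("410-3141", "491-2359", NA),
  state    = c("Iowa", NA, NA),
  customer = c(NA, "Y", "Y")
)
cols <- c("first", "phone", "state")

# Per-row NA count restricted to the chosen columns ("customer" is ignored)
customers$na_count <- rowSums(is.na(customers[cols]))

# A zero count means the row has no NAs in those columns
complete_rows <- customers$na_count == 0
stopifnot(all(complete_rows == complete.cases(customers[cols])))
```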

Number of missing values in each column in R

sapply loops over the columns one at a time, so it is not vectorized. colSums and is.na can be used directly instead:

colSums(is.na(titanic_train))
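Because titanic_train is not loaded here, this sketch uses a small hypothetical stand-in. Alongside the counts, colMeans on the same logical matrix gives the fraction of missing values per column, since the mean of TRUE/FALSE values is the share of TRUEs.

```r
# Hypothetical stand-in for titanic_train
passengers <- data.frame(
  Age      = c(22, NA, 26, NA),
  Cabin    = c(NA, "C85", NA, NA),
  Survived = c(0L, 1L, 1L, 0L)
)

# Number of missing values per column
na_count <- colSums(is.na(passengers))

# Fraction of missing values per column
na_share <- colMeans(is.na(passengers))

print(na_count)   # Age: 2, Cabin: 3, Survived: 0
print(na_share)   # Age: 0.50, Cabin: 0.75, Survived: 0.00
```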

Count the number of non-NA values for every column in a data frame

You can also call is.na on the entire data frame (which returns a logical matrix) and call colSums on the negated result:

# make sample data
set.seed(47)
df <- as.data.frame(matrix(sample(c(0:1, NA), 100*5, TRUE), 100))

str(df)
#> 'data.frame':    100 obs. of  5 variables:
#>  $ V1: int  NA 1 NA NA 1 NA 1 1 1 NA ...
#>  $ V2: int  NA NA NA 1 NA 1 0 1 0 NA ...
#>  $ V3: int  1 1 0 1 1 NA NA 1 NA NA ...
#>  $ V4: int  NA 0 NA 0 0 NA 1 1 NA NA ...
#>  $ V5: int  NA NA NA 0 0 0 0 0 NA NA ...

colSums(!is.na(df))
#> V1 V2 V3 V4 V5 
#> 69 55 62 60 70
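Since every cell is either NA or not, the non-NA count per column also equals the number of rows minus the NA count. A quick check on a small hypothetical data frame:

```r
# Hypothetical small data frame
df_check <- data.frame(
  p = c(1, NA, 0, NA, 1),
  q = c(NA, NA, NA, 0, 1)
)

# Count non-NA values per column by negating the NA matrix
non_na <- colSums(!is.na(df_check))

# Equivalent formulation: rows minus NAs
stopifnot(all(non_na == nrow(df_check) - colSums(is.na(df_check))))
print(non_na)  # p: 3, q: 2
```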
