R: How to Total the Number of NA in Each Column of a data.frame

You could try:

colSums(is.na(df))
# V1 V2 V3 V4 V5
# 2 4 2 4 4

data

set.seed(42)
df <- as.data.frame(matrix(sample(c(NA,0:4), 5*20,replace=TRUE), ncol=5))
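
To see why this works, here is a minimal sketch on a small hand-made data frame (the name toy and its contents are made up for illustration, not taken from the question). is.na() returns a logical matrix of the same shape, and colSums() counts each TRUE as 1:

```r
# Hypothetical 4-row data frame with known NA positions.
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  c = c(TRUE, FALSE, TRUE, TRUE)
)

na_flags   <- is.na(toy)        # logical matrix, TRUE where a value is missing
na_per_col <- colSums(na_flags) # TRUE counts as 1, FALSE as 0
na_per_col                      # a = 2, b = 1, c = 0
```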

How to count number of rows with NA on each column?

We can use the vectorized colSums on the logical matrix returned by is.na(df1):

colSums(is.na(df1))

Another option is to loop over the columns with sapply and sum each one:

sapply(df1, function(x) sum(is.na(x)))

Or with dplyr

library(dplyr)
df1 %>%
  summarise(across(everything(), ~ sum(is.na(.))))
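
As a quick check that the two base R versions agree, here is a small sketch on made-up data (this df1 is a hypothetical stand-in, not the asker's data). The counts match, but the return types differ: colSums gives a double vector, while sum on a logical vector gives integers.

```r
# Hypothetical stand-in for df1.
df1 <- data.frame(x = c(1, NA, NA), y = c(NA, "b", "c"), z = 1:3)

by_colsums <- colSums(is.na(df1))
by_sapply  <- sapply(df1, function(col) sum(is.na(col)))

all(by_colsums == by_sapply)  # TRUE: same counts per column
typeof(by_colsums)            # "double"
typeof(by_sapply)             # "integer"
```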

Count the number of NAs in multiple columns after grouping a dataframe in R

I propose two ways:

using dplyr:

library(dplyr)
df %>%
  group_by(Region, ID) %>%
  summarise(across(everything(), list(na_count = ~ sum(is.na(.)))))

or data.table:

library(data.table)
setDT(df)[, lapply(.SD, function(x) sum(is.na(x))), by = .(Region, ID)]
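
If neither package is available, a base R sketch along the same lines might use aggregate(). This is an alternative to the two answers above, not part of them, and the data frame sales with its price and units columns is invented for illustration:

```r
# Hypothetical data: Region/ID are the grouping keys, price/units hold the values.
sales <- data.frame(
  Region = c("East", "East", "West", "West", "West"),
  ID     = c(1, 1, 2, 2, 2),
  price  = c(NA, 10, 5, NA, NA),
  units  = c(1, NA, 3, 4, NA)
)

# is.na() on the value columns gives a logical matrix;
# aggregate() then sums the TRUE flags within each Region/ID group.
na_by_group <- aggregate(
  is.na(sales[c("price", "units")]),
  by  = sales[c("Region", "ID")],
  FUN = sum
)
na_by_group
```

Here the East/1 group has one missing price and one missing units value, and the West/2 group has two missing prices and one missing units value.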

Count NA in multiple columns in R

In the first case, one function call is nested inside another (colSums(is.na(.))) within a single pipe step. To make that work, we can either wrap the expression in {}:

library(dplyr)
dt %>%
  select(starts_with("V2QE38")) %>%
  {colSums(is.na(.))}
# V2QE38A V2QE38B V2QE38C V2QE38D
#       0       0       0       0

or add another %>% step so that each function gets its own stage:

dt %>%
  select(starts_with("V2QE38")) %>%
  is.na %>%
  colSums

Output:

# V2QE38A V2QE38B V2QE38C V2QE38D
#       0       0       0       0
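
The same one-function-per-step idea can be sketched without dplyr or magrittr, assuming R 4.1 or later for the native |> pipe. The data frame survey and its values are hypothetical; only the V2QE38 column prefix comes from the question:

```r
# Hypothetical stand-in for the question's data: three V2QE38* columns plus one other.
survey <- data.frame(
  V2QE38A = c(1, NA, 3),
  V2QE38B = c(NA, NA, 1),
  V2QE38C = c(2, 2, 2),
  other   = c(NA, NA, NA)
)

# Select the columns by prefix, flag the NAs, then sum the flags per column.
na_counts <- survey[startsWith(names(survey), "V2QE38")] |>
  is.na() |>
  colSums()
na_counts  # V2QE38A = 1, V2QE38B = 2, V2QE38C = 0
```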

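The issue is that when the dot appears only inside a nested call, %>% still inserts the left-hand side as the first argument. So colSums(is.na(.)) is evaluated as colSums(., is.na(.)), and colSums runs on the data itself rather than on the result of is.na:

dt %>%
  select(starts_with("V2QE38")) %>%
  colSums(.)
# V2QE38A V2QE38B V2QE38C V2QE38D
#       6       1      12       0

This is the same as the OP's output from colSums(is.na(.)).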

Count number of NA's in a Row in Specified Columns R

df$na_count <- rowSums(is.na(df[c('first', 'last', 'address', 'phone', 'state')])) 

df
   first m_initial     last         address    phone state customer na_count
1    Bob         L   Turner 123 Turner Lane 410-3141  Iowa     <NA>        0
2   Will         P Williams 456 Williams Rd 491-2359  <NA>        Y        1
3 Amanda         C    Jones    789 Haggerty     <NA>  <NA>        Y        2
4   Lisa      <NA>    Evans            <NA>     <NA>  <NA>        N        3
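
A reduced, self-contained version of the same idea is sketched below. The data frame customers is a cut-down, hypothetical copy of the table above, and the vector cols is just a convenience name. Note that the NA in the customer column is not counted, because that column is not in the selection:

```r
# Cut-down, hypothetical copy of the table above.
customers <- data.frame(
  first    = c("Bob", "Will", "Amanda", "Lisa"),
  last     = c("Turner", "Williams", "Jones", "Evans"),
  phone    = c("410-3141", "491-2359", NA, NA),
  state    = c("Iowa", NA, NA, NA),
  customer = c(NA, "Y", "Y", "N")
)

# Only these columns take part in the row-wise NA count.
cols <- c("first", "last", "phone", "state")
customers$na_count <- rowSums(is.na(customers[cols]))
customers$na_count  # 0 1 2 2
```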

Number of missing values in each column in R

If I'm not mistaken, sapply is not vectorized; it loops over the columns. You can use colSums and is.na directly:

colSums(is.na(titanic_train))

Count number of non-NA values for every column in a dataframe

You can also call is.na on the entire data frame (which returns a logical matrix) and call colSums on the negated result:

# make sample data
set.seed(47)
df <- as.data.frame(matrix(sample(c(0:1, NA), 100*5, TRUE), 100))

str(df)
#> 'data.frame': 100 obs. of 5 variables:
#> $ V1: int NA 1 NA NA 1 NA 1 1 1 NA ...
#> $ V2: int NA NA NA 1 NA 1 0 1 0 NA ...
#> $ V3: int 1 1 0 1 1 NA NA 1 NA NA ...
#> $ V4: int NA 0 NA 0 0 NA 1 1 NA NA ...
#> $ V5: int NA NA NA 0 0 0 0 0 NA NA ...

colSums(!is.na(df))
#> V1 V2 V3 V4 V5
#> 69 55 62 60 70
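
As a sanity check on a small made-up data frame (toy is hypothetical), the non-NA and NA counts for each column should add up to the number of rows:

```r
# Hypothetical data frame with known NA positions.
toy <- data.frame(a = c(1, NA, 3, NA), b = c(NA, "y", "z", "w"))

non_missing <- colSums(!is.na(toy))  # a = 2, b = 3
missing     <- colSums(is.na(toy))   # a = 2, b = 1

all(non_missing + missing == nrow(toy))  # TRUE
```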

