R: How to total the number of NAs in each column of a data.frame
You could try:
colSums(is.na(df))
# V1 V2 V3 V4 V5
# 2 4 2 4 4
data
set.seed(42)
df <- as.data.frame(matrix(sample(c(NA,0:4), 5*20,replace=TRUE), ncol=5))
How to count number of rows with NA on each column?
We can use the vectorized colSums on a logical matrix (is.na(df1))
colSums(is.na(df1))
Another option is to loop over the columns with sapply and call sum on each
sapply(df1, function(x) sum(is.na(x)))
Or with dplyr
library(dplyr)
df1 %>%
summarise(across(everything(), ~ sum(is.na(.))))
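As a quick check, here is a minimal sketch on a hand-built data frame (the name toy and its columns are made up for illustration) confirming that the colSums and sapply approaches return the same counts:

```r
# Hypothetical toy data; the column names are illustrative only
toy <- data.frame(
  a = c(1, NA, 3, NA),
  b = c("x", "y", NA, "z"),
  c = c(TRUE, FALSE, TRUE, TRUE)
)

# Vectorized: is.na() on a data frame returns a logical matrix
na_vectorized <- colSums(is.na(toy))

# Column by column: sapply applies the function to each column in turn
na_looped <- sapply(toy, function(x) sum(is.na(x)))

print(na_vectorized)
# a b c
# 2 1 0

# Both approaches give the same counts
stopifnot(all(na_vectorized == na_looped))
```

Note that colSums returns a double vector while the sapply version returns integers, so the comparison uses == rather than identical().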
Count the number of NAs in multiple columns after grouping a dataframe in R
I propose two ways:
using dplyr:
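library(dplyr)
df %>%
group_by(Region, ID) %>%
# summarise_each() is deprecated; across() skips the grouping columns
summarise(across(everything(), ~ sum(is.na(.x)), .names = "{.col}_na_count"))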
or data.table:
library(data.table)
setDT(df)[, lapply(.SD, function(x) sum(is.na(x))), by = .(Region, ID)]
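For comparison, the same grouped count can be sketched in base R with aggregate(). This is a swapped-in alternative, not one of the two ways proposed above, and the toy data (including the columns x and y) is hypothetical; only the Region and ID names come from the question.

```r
# Hypothetical data; only Region and ID are taken from the question
toy <- data.frame(
  Region = c("East", "East", "West", "West", "West"),
  ID     = c(1, 1, 2, 2, 3),
  x      = c(NA, 2, NA, NA, 5),
  y      = c(1, NA, 3, 4, NA)
)

# Use the data-frame interface of aggregate(): the formula interface
# would drop rows containing NA before counting them
na_by_group <- aggregate(
  toy[c("x", "y")],
  by  = toy[c("Region", "ID")],
  FUN = function(v) sum(is.na(v))
)

print(na_by_group)
```

Each row of na_by_group holds one Region/ID combination together with the NA count of every remaining column in that group.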
Count NA in multiple columns in R
In the first case, two functions are combined in a single pipe step (is.na nested inside colSums). We can either wrap the expression in {}
library(dplyr)
dt %>%
select(starts_with("V2QE38")) %>%
{colSums(is.na(.))}
V2QE38A V2QE38B V2QE38C V2QE38D
0 0 0 0
or have another %>%
dt %>%
select(starts_with("V2QE38")) %>%
is.na %>%
colSums
Output:
V2QE38A V2QE38B V2QE38C V2QE38D
0 0 0 0
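The issue is that, because the dot appears only inside the nested is.na call, %>% also passes the data itself as the first argument of colSums, so the raw column values are summed rather than the is.na result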
> dt %>%
select(starts_with("V2QE38")) %>%
colSums(.)
V2QE38A V2QE38B V2QE38C V2QE38D
6 1 12 0
which is the same as the OP's output with colSums(is.na(.))
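The two working forms can be compared on a small made-up data frame. This is only a sketch: toy and its columns q1 and q2 are hypothetical, and it loads magrittr directly for %>% (dplyr re-exports the same operator).

```r
library(magrittr)  # provides %>%

# Hypothetical data standing in for the selected survey columns
toy <- data.frame(q1 = c(1, NA, 3), q2 = c(NA, NA, 6))

# Braces stop %>% from inserting the data as a first argument,
# so the dot is used only where it is written
with_braces <- toy %>% {colSums(is.na(.))}

# Splitting the work into two pipe steps gives the same counts
chained <- toy %>% is.na %>% colSums

print(with_braces)
# q1 q2
#  1  2

stopifnot(identical(with_braces, chained))
```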
Count number of NA's in a Row in Specified Columns R
df$na_count <- rowSums(is.na(df[c('first', 'last', 'address', 'phone', 'state')]))
df
   first m_initial     last         address    phone state customer na_count
1    Bob         L   Turner 123 Turner Lane 410-3141  Iowa     <NA>        0
2   Will         P Williams 456 Williams Rd 491-2359  <NA>        Y        1
3 Amanda         C    Jones    789 Haggerty     <NA>  <NA>        Y        2
4   Lisa      <NA>    Evans            <NA>     <NA>  <NA>        N        3
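The same pattern on a smaller, hypothetical data frame (toy, cols_to_check and the notes column are made up for illustration) shows that columns left out of the selection do not affect the count:

```r
# Hypothetical data; notes is all NA but is deliberately not counted
toy <- data.frame(
  first = c("Ann", "Ben", NA),
  phone = c(NA, "555-0100", NA),
  notes = c(NA, NA, NA)
)

# Only these columns contribute to the per-row NA count
cols_to_check <- c("first", "phone")
toy$na_count <- rowSums(is.na(toy[cols_to_check]))

print(toy$na_count)
# [1] 1 0 2
```

Subsetting with a character vector before calling is.na keeps the selection explicit and easy to change.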
Number of missing values in each column in R
If I'm not mistaken, sapply is not vectorized. You can use colSums and is.na directly:
colSums(is.na(titanic_train))
Count number of non-NA values for every column in a dataframe
You can also call is.na on the entire data frame (implicitly coercing to a logical matrix) and call colSums on the negated result:
# make sample data
set.seed(47)
df <- as.data.frame(matrix(sample(c(0:1, NA), 100*5, TRUE), 100))
str(df)
#> 'data.frame': 100 obs. of 5 variables:
#> $ V1: int NA 1 NA NA 1 NA 1 1 1 NA ...
#> $ V2: int NA NA NA 1 NA 1 0 1 0 NA ...
#> $ V3: int 1 1 0 1 1 NA NA 1 NA NA ...
#> $ V4: int NA 0 NA 0 0 NA 1 1 NA NA ...
#> $ V5: int NA NA NA 0 0 0 0 0 NA NA ...
colSums(!is.na(df))
#> V1 V2 V3 V4 V5
#> 69 55 62 60 70
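A short sketch on made-up data (toy is hypothetical) shows how the non-NA and NA counts relate: every cell is one or the other, so for each column the two counts add up to the number of rows.

```r
# Hypothetical data for illustration
toy <- data.frame(a = c(1, NA, 3, NA), b = c(NA, 5, 6, 7))

non_na_count <- colSums(!is.na(toy))  # values present per column
na_count     <- colSums(is.na(toy))   # values missing per column

print(non_na_count)
# a b
# 2 3

# The two counts partition the rows of each column
stopifnot(all(non_na_count + na_count == nrow(toy)))
```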