Count NAs per row in dataframe
You could add a new column to your data frame containing the number of NA values in each row (i.e. per batch_id):
df$na_count <- apply(df, 1, function(x) sum(is.na(x)))
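If you are doing the same thing in pandas, a rough analogue is sketched below. The frame, its column names and its values are hypothetical. It runs a per-row function with DataFrame.apply(axis=1), mirroring apply(df, 1, ...), and then the vectorised isna().sum(axis=1), which avoids a Python-level loop over rows.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: one row per batch_id
df = pd.DataFrame({
    "batch_id": [1, 2, 3],
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
})

# Row-wise function, mirroring apply(df, 1, function(x) sum(is.na(x)))
per_row = df.apply(lambda row: int(row.isna().sum()), axis=1)

# Vectorised equivalent: boolean mask of missing cells, summed across columns
df["na_count"] = df.isna().sum(axis=1)
```

Both give the same counts; the vectorised form is the idiomatic choice for large frames.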
How to count number of NAs per row, with conditions
Both can be done with rowSums; in the second case, subset df to the desired columns first.
rowSums(is.na(df))
# [1] 3 1 3 2 2 0 2 0 0 1 2 2 2 1 1 2 0 1 2 1 0
rowSums(is.na(df[2:5]))
# [1] 2 1 1 1 2 0 1 0 0 1 1 1 1 1 0 2 0 0 1 0 0
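A pandas sketch of the same two counts, using a made-up six-column frame (not the data behind the output above). The main trap when translating is indexing: R's 1-based, inclusive df[2:5] becomes the 0-based, end-exclusive df.iloc[:, 1:5].

```python
import numpy as np
import pandas as pd

# Hypothetical frame with six columns
df = pd.DataFrame({
    "id": [1, 2, 3],
    "x1": [np.nan, 1.0, 2.0],
    "x2": [np.nan, np.nan, 2.0],
    "x3": [1.0, 1.0, 2.0],
    "x4": [1.0, np.nan, 2.0],
    "x5": [np.nan, np.nan, np.nan],
})

# All columns: like rowSums(is.na(df))
all_cols = df.isna().sum(axis=1)

# R columns 2..5 (1-based, inclusive) are positions 1..4 here (0-based, end-exclusive)
subset = df.iloc[:, 1:5].isna().sum(axis=1)
```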
How to simply count number of rows with NAs - R
tl;dr: row-wise, you'll want sum(!complete.cases(DF)), or, equivalently, sum(apply(DF, 1, anyNA)).
There are a number of different ways to look at the number, proportion or position of NA values in a data frame.
Most of these start with the logical matrix returned by is.na(): TRUE for every NA and FALSE everywhere else. For the built-in dataset airquality:
is.na(airquality)
There are 44 NA values in this data set:
sum(is.na(airquality))
# [1] 44
You can look at the total number of NA values per row or column:
head(rowSums(is.na(airquality)))
# [1] 0 0 0 0 2 1
colSums(is.na(airquality))
#   Ozone Solar.R    Wind    Temp   Month     Day
#      37       7       0       0       0       0
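The same three views (total, per row, per column) translate directly to pandas. This sketch types in only the first six rows of airquality by hand, so its totals differ from the full 153-row results above; the axis argument selects row sums (axis=1) or column sums (axis=0, the default).

```python
import numpy as np
import pandas as pd

# First six rows of R's airquality, entered by hand
aq = pd.DataFrame({
    "Ozone":   [41, 36, 12, 18, np.nan, 28],
    "Solar.R": [190, 118, 149, 313, np.nan, np.nan],
    "Wind":    [7.4, 8.0, 12.6, 11.5, 14.3, 14.9],
    "Temp":    [67, 72, 74, 62, 56, 66],
    "Month":   [5, 5, 5, 5, 5, 5],
    "Day":     [1, 2, 3, 4, 5, 6],
})

mask = aq.isna()                    # like is.na(airquality)
total = int(mask.to_numpy().sum())  # like sum(is.na(airquality))
per_row = mask.sum(axis=1)          # like rowSums(is.na(airquality))
per_col = mask.sum()                # like colSums(is.na(airquality))
```

per_row reproduces the head(rowSums(...)) output above: 0 0 0 0 2 1.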
You can use anyNA() in place of is.na() as well:
# by row
head(apply(airquality, 1, anyNA))
# [1] FALSE FALSE FALSE FALSE TRUE TRUE
sum(apply(airquality, 1, anyNA))
# [1] 42
# by column
head(apply(airquality, 2, anyNA))
#   Ozone Solar.R    Wind    Temp   Month     Day
#    TRUE    TRUE   FALSE   FALSE   FALSE   FALSE
sum(apply(airquality, 2, anyNA))
# [1] 2
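In pandas, the anyNA-by-margin idea maps onto DataFrame.any() applied to the missing-value mask. A minimal sketch on a hypothetical four-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: columns a and b contain missing values, c does not
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, 2.0, 3.0, 4.0],
    "c": [1, 2, 3, 4],
})

rows_with_na = df.isna().any(axis=1)  # like apply(df, 1, anyNA)
cols_with_na = df.isna().any(axis=0)  # like apply(df, 2, anyNA)

n_rows = int(rows_with_na.sum())  # rows containing at least one NaN
n_cols = int(cols_with_na.sum())  # columns containing at least one NaN
```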
complete.cases() can be used, but only row-wise:
sum(!complete.cases(airquality))
# [1] 42
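A pandas counterpart of complete.cases() is a row-wise all() over notna(), sketched here on a hypothetical frame. dropna() keeps exactly those complete rows, so comparing lengths gives the same count.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two incomplete rows
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, 2.0, 3.0, 4.0],
})

complete = df.notna().all(axis=1)      # like complete.cases(df)
n_incomplete = int((~complete).sum())  # like sum(!complete.cases(df))

# dropna() keeps only the complete rows, so the length difference agrees
assert n_incomplete == len(df) - len(df.dropna())
```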
Count number of NA's in a Row in Specified Columns R
df$na_count <- rowSums(is.na(df[c('first', 'last', 'address', 'phone', 'state')]))
df
   first m_initial     last         address    phone state customer na_count
1    Bob         L   Turner 123 Turner Lane 410-3141  Iowa     <NA>        0
2   Will         P Williams 456 Williams Rd 491-2359  <NA>        Y        1
3 Amanda         C    Jones    789 Haggerty     <NA>  <NA>        Y        2
4   Lisa      <NA>    Evans            <NA>     <NA>  <NA>        N        3
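For comparison, a pandas sketch that rebuilds the example table above by hand (None standing in for NA) and counts missing values only in the named columns. Note that, as in the R version, the missing m_initial and customer values are not counted.

```python
import pandas as pd

# The example table above; None marks a missing value
df = pd.DataFrame({
    "first": ["Bob", "Will", "Amanda", "Lisa"],
    "m_initial": ["L", "P", "C", None],
    "last": ["Turner", "Williams", "Jones", "Evans"],
    "address": ["123 Turner Lane", "456 Williams Rd", "789 Haggerty", None],
    "phone": ["410-3141", "491-2359", None, None],
    "state": ["Iowa", None, None, None],
    "customer": [None, "Y", "Y", "N"],
})

# Only these columns contribute to the count
cols = ["first", "last", "address", "phone", "state"]
df["na_count"] = df[cols].isna().sum(axis=1)
```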
Python/Pandas: counting the number of missing/NaN in each row
You could first find whether each element is NaN with isnull() and then take the row-wise sum with sum(axis=1):
In [195]: df.isnull().sum(axis=1)
Out[195]:
0 0
1 0
2 0
3 3
4 0
5 0
dtype: int64
If you want the output as a list, you can use tolist():
In [196]: df.isnull().sum(axis=1).tolist()
Out[196]: [0, 0, 0, 3, 0, 0]
Or use count(), which counts the non-missing values in each row, and subtract the result from the number of columns:
In [130]: df.shape[1] - df.count(axis=1)
Out[130]:
0 0
1 0
2 0
3 3
4 0
5 0
dtype: int64
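A quick sanity check, sketched on a hypothetical frame, that the two approaches agree and that both treat None in an object column the same way as NaN in a float column:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing NaN (float columns) and None (object column)
df = pd.DataFrame({
    "x": [1.0, np.nan, 3.0],
    "y": ["a", None, "c"],
    "z": [np.nan, np.nan, 1.0],
})

via_isnull = df.isnull().sum(axis=1)
via_count = df.shape[1] - df.count(axis=1)  # count() skips missing cells

assert via_isnull.tolist() == via_count.tolist()
```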
Count NaN per row with Pandas
If I understand correctly, this should do what you need:
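import numpy as np
# groupby drops missing keys, so rows without a name get NaN; replace it with the number of missing names
nasum = df['First_Name'].isnull().sum()
df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count').replace(np.nan, nasum)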
Or, as suggested by ALollz, fillna gives the same result:
df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count').fillna(nasum)
Input
First_Name Favorite_Color
0 Jared Blue
1 Lily Blue
2 Sarah Pink
3 Bill Red
4 Bill Yellow
5 Alfred Orange
6 None Red
7 None Pink
Output
First_Name Favorite_Color countNames
0 Jared Blue 1.0
1 Lily Blue 1.0
2 Sarah Pink 1.0
3 Bill Red 2.0
4 Bill Yellow 2.0
5 Alfred Orange 1.0
6 None Red 2.0
7 None Pink 2.0
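Another route, not from the original answer and offered only as a sketch: temporarily replace the missing names with a placeholder so they form their own group, then map each name to its frequency with value_counts(). The placeholder string "<missing>" is an arbitrary assumption and must not collide with a real name. Unlike the answer above, this yields integer counts and leaves First_Name itself unchanged.

```python
import pandas as pd

# The input table above
df = pd.DataFrame({
    "First_Name": ["Jared", "Lily", "Sarah", "Bill", "Bill", "Alfred", None, None],
    "Favorite_Color": ["Blue", "Blue", "Pink", "Red", "Yellow", "Orange", "Red", "Pink"],
})

# Hypothetical placeholder so that missing names are counted as one group
key = df["First_Name"].fillna("<missing>")
df["countNames"] = key.map(key.value_counts())
```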
Count missing values with rowwise and add number of missing values
You don't need rowwise(). Just comment out that line and your code works.
This works:
library(dplyr)
df %>%
  select(var1, var2) %>%
  mutate(na = rowSums(is.na(.)))
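A loose pandas analogue of this pipeline, assuming hypothetical columns var1 to var3: column selection plays the role of select(), and assign() with a lambda receives the selected frame much like the . placeholder does in mutate().

```python
import numpy as np
import pandas as pd

# Hypothetical data; var3 is dropped by the column selection
df = pd.DataFrame({
    "var1": [1.0, np.nan, np.nan],
    "var2": [np.nan, 2.0, np.nan],
    "var3": [np.nan, np.nan, np.nan],
})

# select(var1, var2) %>% mutate(na = rowSums(is.na(.)))
out = df[["var1", "var2"]].assign(na=lambda d: d.isna().sum(axis=1))
```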
Count NA in given columns by rows
Another option, which counts the NAs in each row separately for each consecutive pair of columns:
NA.counts <- sapply(split(seq(ncol(test)), ceiling(seq(ncol(test))/2)),
                    function(x) rowSums(is.na(test[, x])))
If you want to use the tidyverse and add the counts as columns, you can do:
library(tidyverse)
test %>%
  cbind(NA.counts = map(seq(ncol(test)) %>% split(ceiling(./2)),
                        ~rowSums(is.na(test[, .]))))
#   BIEZ_01 BIEZ_02 BIEZ_03 BIEZ_04 BIEZ_05 BIEZ_06 NA.counts.1 NA.counts.2 NA.counts.3
# 1   59000    5060      NA   22100      NA    4400           0           1           1
# 2   61462   55401   60783   59885   59209    6109           0           0           0
# 3      NA   33000   20000   15000   15000      NA           1           0           1
# 4   33000   33000   20000   15000   15000     500           0           0           0
# 5   30840   30840      NA   20840   20840   10840           0           1           0
# 6   36612   28884   19248   10000      NA   10000           0           0           1
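The same pairwise count can be sketched in pandas by slicing consecutive column pairs with iloc. The BIEZ_* values below are copied from the output above. The range uses (n + 1) // 2 to mirror R's ceiling(), so an odd trailing column would get its own group.

```python
import numpy as np
import pandas as pd

nan = np.nan
# The six BIEZ_* columns from the example output
test = pd.DataFrame({
    "BIEZ_01": [59000, 61462, nan, 33000, 30840, 36612],
    "BIEZ_02": [5060, 55401, 33000, 33000, 30840, 28884],
    "BIEZ_03": [nan, 60783, 20000, 20000, nan, 19248],
    "BIEZ_04": [22100, 59885, 15000, 15000, 20840, 10000],
    "BIEZ_05": [nan, 59209, 15000, 15000, 20840, nan],
    "BIEZ_06": [4400, 6109, nan, 500, 10840, 10000],
})

# Count NaNs per row within consecutive column pairs (1-2, 3-4, 5-6)
counts = pd.DataFrame({
    f"NA.counts.{k + 1}": test.iloc[:, 2 * k:2 * k + 2].isna().sum(axis=1)
    for k in range((test.shape[1] + 1) // 2)
})
result = pd.concat([test, counts], axis=1)
```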
As @Moody_Mudskipper points out, cbind isn't necessary if you want to modify the data frame. You can add the columns with:
test[paste0("SUM",seq(ncol(test)/2))] <- map(seq(ncol(test)) %>% split(ceiling(./2)),
~rowSums(is.na(test[.])))
R count number of NA values for each row of a CSV
Try this:
result <- data.frame("rowname"=rownames(df), "missing"=rowSums(is.na(df)))
result
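A pandas sketch of the same two-column summary, assuming a hypothetical frame with string row labels: the per-row counts are a Series indexed by row label, and reset_index() turns that index into a column.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with row labels
df = pd.DataFrame(
    {"a": [1.0, np.nan, np.nan], "b": [np.nan, 2.0, np.nan]},
    index=["r1", "r2", "r3"],
)

# One row per original row: its label and its number of missing values
result = (
    df.isna().sum(axis=1)
    .rename_axis("rowname")
    .reset_index(name="missing")
)
```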