Count NAs per Row in a Data Frame

Count NAs per row in a data frame

You could add a new column to your data frame containing the number of NA values in each row (here, one row per batch_id):

df$na_count <- apply(df, 1, function(x) sum(is.na(x)))
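For pandas users, a rough analog of this row-wise apply() is DataFrame.apply with axis=1. The sketch below uses a small hypothetical frame (the batch_id values and columns a, b, c are made up). It compares the per-row function with the vectorized isna().sum(axis=1), which avoids a Python-level loop and is usually much faster on large frames.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a batch_id column plus three value columns with some gaps.
df = pd.DataFrame({
    "batch_id": [101, 102, 103],
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, 6.0],
    "c": [7.0, 8.0, 9.0],
})

# Row-wise function, analogous to R's apply(df, 1, function(x) sum(is.na(x))).
per_row = df.apply(lambda row: row.isna().sum(), axis=1)

# Vectorized equivalent: build one boolean frame and sum it across columns.
df["na_count"] = df.isna().sum(axis=1)

print(df["na_count"].tolist())  # [1, 2, 0]
```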

How to count the number of NAs per row, with conditions

Both can be done with rowSums(). In the second case, first subset df to the desired columns.

rowSums(is.na(df))
# [1] 3 1 3 2 2 0 2 0 0 1 2 2 2 1 1 2 0 1 2 1 0

rowSums(is.na(df[2:5]))
# [1] 2 1 1 1 2 0 1 0 0 1 1 1 1 1 0 2 0 0 1 0 0
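In pandas, the column subset df[2:5] (R's 1-based, inclusive columns 2 through 5) becomes the 0-based, end-exclusive slice iloc[:, 1:5]. Here is a minimal sketch on a hypothetical six-column frame; the column names and values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: an id column followed by five value columns.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "v1": [np.nan, 2.0, 3.0],
    "v2": [np.nan, np.nan, 3.0],
    "v3": [1.0, 2.0, 3.0],
    "v4": [1.0, np.nan, 3.0],
    "v5": [np.nan, np.nan, np.nan],
})

# All columns: NA count per row, like rowSums(is.na(df)).
all_cols = df.isna().sum(axis=1)

# R's df[2:5] (1-based, inclusive) is iloc[:, 1:5] (0-based, end-exclusive): v1..v4.
subset = df.iloc[:, 1:5].isna().sum(axis=1)

print(all_cols.tolist())  # [3, 3, 1]
print(subset.tolist())    # [2, 2, 0]
```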

How to simply count the number of rows with NAs in R

tl;dr: row-wise, you'll want sum(!complete.cases(DF)) or, equivalently, sum(apply(DF, 1, anyNA)).

There are a number of different ways to look at the number, proportion or position of NA values in a data frame:

Most of these start with the logical matrix returned by is.na(), which has TRUE for every NA and FALSE everywhere else. For the base dataset airquality:

is.na(airquality)

There are 44 NA values in this data set:

sum(is.na(airquality))
# [1] 44

You can look at the total number of NA values per row or column:

head(rowSums(is.na(airquality)))
# [1] 0 0 0 0 2 1
colSums(is.na(airquality))
#   Ozone Solar.R    Wind    Temp   Month     Day
#      37       7       0       0       0       0
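The same views (total, per row, per column) carry over to pandas. The proportion mentioned above is the mean of the boolean mask. airquality is not bundled with pandas, so this sketch uses a small hypothetical frame that borrows a few of its column names; the values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical values; this is NOT the real airquality data.
df = pd.DataFrame({
    "Ozone": [41.0, np.nan, 12.0, np.nan],
    "Solar.R": [190.0, 118.0, np.nan, np.nan],
    "Wind": [7.4, 8.0, 12.6, 11.5],
})

mask = df.isna()                  # True wherever a value is missing

total = int(mask.sum().sum())     # like sum(is.na(df))
per_row = mask.sum(axis=1)        # like rowSums(is.na(df))
per_col = mask.sum()              # like colSums(is.na(df))
share_per_col = mask.mean()       # proportion missing in each column

print(total)                      # 4
print(per_row.tolist())           # [0, 1, 1, 2]
print(per_col.tolist())           # [2, 2, 0]
print(share_per_col.tolist())     # [0.5, 0.5, 0.0]
```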

You can use anyNA() in place of is.na() as well:

# by row
head(apply(airquality, 1, anyNA))
# [1] FALSE FALSE FALSE FALSE TRUE TRUE
sum(apply(airquality, 1, anyNA))
# [1] 42


# by column
head(apply(airquality, 2, anyNA))
#   Ozone Solar.R    Wind    Temp   Month     Day
#    TRUE    TRUE   FALSE   FALSE   FALSE   FALSE
sum(apply(airquality, 2, anyNA))
# [1] 2

complete.cases() can be used, but only row-wise:

sum(!complete.cases(airquality))
# [1] 42
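As a hedged pandas counterpart to anyNA() and complete.cases(), here is a sketch on a hypothetical three-column frame. Rows with any NA come from isna().any(axis=1). Complete cases are rows where notna().all(axis=1) is True. Columns with any NA come from isna().any() along the default axis.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with gaps in two of its three columns.
df = pd.DataFrame({
    "x": [1.0, np.nan, 3.0, 4.0],
    "y": [np.nan, 2.0, 3.0, 4.0],
    "z": [1.0, 2.0, 3.0, 4.0],
})

# Like sum(apply(df, 1, anyNA)): rows containing at least one NA.
rows_with_na = int(df.isna().any(axis=1).sum())

# Like complete.cases(df): True where a row has no NA at all.
complete = df.notna().all(axis=1)
rows_incomplete = int((~complete).sum())

# Like apply(df, 2, anyNA): names of columns containing at least one NA.
cols_with_na = df.columns[df.isna().any().to_numpy()].tolist()

print(rows_with_na, rows_incomplete)  # 2 2
print(cols_with_na)                   # ['x', 'y']
```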

Count the number of NAs in a row in specified columns (R)

df$na_count <- rowSums(is.na(df[c('first', 'last', 'address', 'phone', 'state')])) 

df
#    first m_initial     last         address    phone state customer na_count
# 1    Bob         L   Turner 123 Turner Lane 410-3141  Iowa     <NA>        0
# 2   Will         P Williams 456 Williams Rd 491-2359  <NA>        Y        1
# 3 Amanda         C    Jones    789 Haggerty     <NA>  <NA>        Y        2
# 4   Lisa      <NA>    Evans            <NA>     <NA>  <NA>        N        3
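The same idea in pandas is to select the columns by name, then sum the missing flags across each row. This sketch rebuilds the example frame shown above, using None for the <NA> entries. Because only the listed columns are checked, the missing m_initial and customer values are not counted.

```python
import pandas as pd

# The example frame shown above; None marks the <NA> entries.
df = pd.DataFrame({
    "first": ["Bob", "Will", "Amanda", "Lisa"],
    "m_initial": ["L", "P", "C", None],
    "last": ["Turner", "Williams", "Jones", "Evans"],
    "address": ["123 Turner Lane", "456 Williams Rd", "789 Haggerty", None],
    "phone": ["410-3141", "491-2359", None, None],
    "state": ["Iowa", None, None, None],
    "customer": [None, "Y", "Y", "N"],
})

# Only these columns are checked; m_initial and customer are ignored.
cols = ["first", "last", "address", "phone", "state"]
df["na_count"] = df[cols].isna().sum(axis=1)

print(df["na_count"].tolist())  # [0, 1, 2, 3]
```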

Python/Pandas: counting the number of missing/NaN values in each row

You could first check whether each element is NaN with isnull(), and then take the row-wise sum with sum(axis=1):

In [195]: df.isnull().sum(axis=1)
Out[195]:
0 0
1 0
2 0
3 3
4 0
5 0
dtype: int64

If you want the output as a list:

In [196]: df.isnull().sum(axis=1).tolist()
Out[196]: [0, 0, 0, 3, 0, 0]

Or use count(), which counts the non-missing values in each row, and subtract the result from the number of columns:

In [130]: df.shape[1] - df.count(axis=1)
Out[130]:
0 0
1 0
2 0
3 3
4 0
5 0
dtype: int64
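The snippets above assume an existing df. Here is a self-contained check on a hypothetical frame that the two approaches agree. It also shows that both treat None in an object column as missing, not just NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing NaN (float columns) and None (object column).
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": ["x", None, "z", None],
    "c": [np.nan, np.nan, 3.0, 4.0],
})

via_isnull = df.isnull().sum(axis=1)          # count missing cells directly
via_count = df.shape[1] - df.count(axis=1)    # columns minus non-missing cells

print(via_isnull.tolist())  # [1, 3, 1, 1]
print(via_count.tolist())   # [1, 3, 1, 1]
```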

Count NaN per row with Pandas

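If I understand correctly, you want to count how often each First_Name occurs, with the missing names counted together as one group. This should do it: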

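import numpy as np
nasum = df['First_Name'].isnull().sum()  # number of rows with a missing name
df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count').replace(np.nan, nasum)  # rows with a missing name come back as NaN and are filled with nasum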

Or, as suggested by ALollz, the following code gives the same result:

df['countNames'] = df.groupby('First_Name')['First_Name'].transform('count').fillna(nasum)

Input

  First_Name Favorite_Color
0      Jared           Blue
1       Lily           Blue
2      Sarah           Pink
3       Bill            Red
4       Bill         Yellow
5     Alfred         Orange
6       None            Red
7       None           Pink

Output

  First_Name Favorite_Color countNames
0      Jared           Blue        1.0
1       Lily           Blue        1.0
2      Sarah           Pink        1.0
3       Bill            Red        2.0
4       Bill         Yellow        2.0
5     Alfred         Orange        1.0
6       None            Red        2.0
7       None           Pink        2.0
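A swapped-in alternative to groupby().transform('count') is to count each name once with value_counts() and look the counts up with map(). This sketch uses the Input table above. value_counts() drops missing values by default, so the missing names map to NaN and are filled with the number of missing names, as in the answer above.

```python
import pandas as pd

# Input table from the example above; None marks a missing first name.
df = pd.DataFrame({
    "First_Name": ["Jared", "Lily", "Sarah", "Bill", "Bill", "Alfred", None, None],
    "Favorite_Color": ["Blue", "Blue", "Pink", "Red", "Yellow", "Orange", "Red", "Pink"],
})

# Occurrences of each non-missing name (missing values are dropped by default).
counts = df["First_Name"].value_counts()

# Look up each row's count; missing names find no match (NaN) and are then
# filled with the total number of missing names.
nasum = df["First_Name"].isnull().sum()
df["countNames"] = df["First_Name"].map(counts).fillna(nasum)

print(df["countNames"].tolist())  # [1.0, 1.0, 1.0, 2.0, 2.0, 1.0, 2.0, 2.0]
```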

Count missing values with rowwise() and add the number of missing values

You don't need rowwise(). Just comment out that line and your code works, because rowSums() already operates row by row.

This works:

library(dplyr)

df %>%
  select(var1, var2) %>%
  mutate(na = rowSums(is.na(.)))
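A loosely equivalent pandas pipeline selects the columns and then adds the count with assign(). As in the dplyr version, no per-row iteration is needed. The frame and its "other" column are hypothetical; "other" is there only to show that it is excluded from the count.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; only var1 and var2 are of interest.
df = pd.DataFrame({
    "var1": [1.0, np.nan, np.nan],
    "var2": [np.nan, 5.0, np.nan],
    "other": [np.nan, np.nan, np.nan],
})

# Mirrors select(var1, var2) %>% mutate(na = rowSums(is.na(.))).
out = df[["var1", "var2"]].assign(na=lambda d: d.isna().sum(axis=1))

print(out["na"].tolist())  # [1, 1, 2]
```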

Count NA in given columns by rows

Another option, which counts NAs row by row within each consecutive pair of columns:

NA.counts <- sapply(split(seq(ncol(test)), ceiling(seq(ncol(test))/2))
, function(x) rowSums(is.na(test[, x])))

If you want to use the tidyverse to add the counts as columns, you can do:

library(tidyverse)
test %>%
cbind(NA.counts = map(seq(ncol(test)) %>% split(ceiling(./2))
, ~rowSums(is.na(test[, .]))))


# BIEZ_01 BIEZ_02 BIEZ_03 BIEZ_04 BIEZ_05 BIEZ_06 NA.counts.1 NA.counts.2 NA.counts.3
# 1 59000 5060 NA 22100 NA 4400 0 1 1
# 2 61462 55401 60783 59885 59209 6109 0 0 0
# 3 NA 33000 20000 15000 15000 NA 1 0 1
# 4 33000 33000 20000 15000 15000 500 0 0 0
# 5 30840 30840 NA 20840 20840 10840 0 1 0
# 6 36612 28884 19248 10000 NA 10000 0 0 1

As @Moody_Mudskipper points out, cbind isn't necessary if you want to modify the data frame itself. You can add the columns with:

test[paste0("SUM",seq(ncol(test)/2))] <- map(seq(ncol(test)) %>% split(ceiling(./2)), 
~rowSums(is.na(test[.])))
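In pandas, the same per-pair counts can be taken from positional slices of the boolean mask. This sketch rebuilds the six BIEZ columns from the example output above. It adds SUM1 to SUM3 the way the second R snippet does, and it assumes the number of columns is even.

```python
import numpy as np
import pandas as pd

# The six BIEZ columns from the example output above.
test = pd.DataFrame({
    "BIEZ_01": [59000, 61462, np.nan, 33000, 30840, 36612],
    "BIEZ_02": [5060, 55401, 33000, 33000, 30840, 28884],
    "BIEZ_03": [np.nan, 60783, 20000, 20000, np.nan, 19248],
    "BIEZ_04": [22100, 59885, 15000, 15000, 20840, 10000],
    "BIEZ_05": [np.nan, 59209, 15000, 15000, 20840, np.nan],
    "BIEZ_06": [4400, 6109, np.nan, 500, 10840, 10000],
})

mask = test.isna()            # computed once, before any SUM columns are added
n_pairs = test.shape[1] // 2  # assumes an even number of columns

# Pairs (1,2), (3,4), (5,6) in R are the 0-based slices [0:2], [2:4], [4:6].
for k in range(n_pairs):
    test[f"SUM{k + 1}"] = mask.iloc[:, 2 * k : 2 * k + 2].sum(axis=1)

print(test[["SUM1", "SUM2", "SUM3"]].to_numpy().tolist())
# [[0, 1, 1], [0, 0, 0], [1, 0, 1], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
```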

R count number of NA values for each row of a CSV

Try this:

result <- data.frame(rowname = rownames(df), missing = rowSums(is.na(df)))
result
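A comparable pandas sketch reads the CSV with read_csv(), which by default treats empty fields and the string "NA" as missing, and then builds a small result frame. The CSV text is hypothetical and is read from memory with io.StringIO rather than from a file.

```python
import io

import pandas as pd

# Hypothetical CSV text; empty fields and "NA" are both read as missing by default.
csv_text = """a,b,c
1,,3
NA,NA,6
7,8,9
"""

df = pd.read_csv(io.StringIO(csv_text))

# One output row per input row: its label and how many fields were missing.
result = pd.DataFrame({
    "rowname": df.index,
    "missing": df.isna().sum(axis=1).to_numpy(),
})

print(result["missing"].tolist())  # [1, 2, 0]
```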

