R: Count NA by Group

Count total missing values by group?

data.table solution

library(data.table)
setDT(df1)

df1[, .(sumNA = sum(is.na(.SD))), by = Z]

# Z sumNA
# 1: A 559
# 2: C 661
# 3: E 596
# 4: B 597
# 5: D 560

dplyr solution using rowSums(.[-1]), i.e. row sums over all columns except the first (the grouping column Z).

library(dplyr)

df1 %>%
group_by(Z) %>%
summarise_all(~sum(is.na(.))) %>%
transmute(Z, sumNA = rowSums(.[-1]))

# # A tibble: 5 x 2
# Z sumNA
# <fct> <dbl>
# 1 A 559
# 2 B 597
# 3 C 661
# 4 D 560
# 5 E 596
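For comparison, the same total can be computed in base R without extra packages. This is a minimal sketch on small hypothetical toy data, since the real df1 is not shown: rowSums(is.na(...)) counts the NAs in each row over every column except Z, and tapply() then sums those per-row counts within each group.

```r
# Hypothetical toy data: grouping column Z plus two value columns with NAs
df1 <- data.frame(
  Z = c("A", "A", "B", "B", "C"),
  x = c(1, NA, NA, 4, NA),
  y = c(NA, NA, 3, 4, 5)
)

# NAs per row, over every column except the grouping column Z
row_na <- rowSums(is.na(df1[setdiff(names(df1), "Z")]))

# Add up the per-row counts within each group (A = 3, B = 1, C = 1)
sumNA <- tapply(row_na, df1$Z, sum)
print(sumNA)
```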

Count non-NA values by group

You can use this:

library(dplyr)

mydf %>% group_by(col_1) %>% summarise(non_na_count = sum(!is.na(col_2)))

# A tibble: 2 x 2
col_1 non_na_count
<fctr> <int>
1 A 1
2 B 2
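A base R equivalent, sketched on hypothetical toy data chosen to reproduce the counts above: !is.na() gives a logical vector, and tapply() sums it within each level of col_1, so each TRUE counts as 1.

```r
# Hypothetical toy data standing in for mydf
mydf <- data.frame(
  col_1 = c("A", "A", "B", "B", "B"),
  col_2 = c(NA, 5, 2, NA, 7)
)

# TRUE for each non-missing value; sum() counts the TRUEs per group
non_na_count <- tapply(!is.na(mydf$col_2), mydf$col_1, sum)
print(non_na_count)
```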

Count non-`NA` of several columns by group using summarize and across from dplyr

I hope this is what you are looking for:

library(dplyr)

d %>%
group_by(ID) %>%
summarise(across(Col1:Col3, ~ sum(!is.na(.x)), .names = "non-{.col}"))

# A tibble: 3 x 4
ID `non-Col1` `non-Col2` `non-Col3`
<dbl> <int> <int> <int>
1 1 3 2 3
2 2 2 0 2
3 3 1 1 0

Or, if you would rather select the columns by a string their names share, use contains():

d %>%
group_by(ID) %>%
summarise(across(contains("Col"), ~ sum(!is.na(.x)), .names = "non-{.col}"))
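Without dplyr, one option is to split the data frame by ID and take colSums(!is.na(...)) of each piece. The data below are hypothetical, chosen so the counts match the output above, and grep("Col", ...) plays the role of contains("Col").

```r
# Hypothetical toy data, chosen to reproduce the counts shown above
d <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 3),
  Col1 = c(1, 2, 3, 4, 5, 6),
  Col2 = c(NA, 1, 1, NA, NA, 2),
  Col3 = c(1, 1, 1, 2, 2, NA)
)

# Columns whose names contain "Col" (like contains("Col") in dplyr)
cols <- grep("Col", names(d), value = TRUE)

# One data frame per ID; colSums(!is.na()) counts non-NA values per column
non_na <- t(sapply(split(d[cols], d$ID), function(s) colSums(!is.na(s))))
colnames(non_na) <- paste0("non-", colnames(non_na))
print(non_na)
```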

Group by count NAs as zeros

We can sum the logical vector created by !is.na(): TRUE counts as 1 and FALSE as 0, so the sum is the number of non-NA elements in each group. A group whose values are all NA gets 0.

library(dplyr)
df %>%
group_by(group) %>%
summarise(n = sum(!is.na(id)))
# A tibble: 5 x 2
# group n
# * <chr> <int>
#1 A 2
#2 B 1
#3 C 0
#4 D 1
#5 E 0

Or use length() after removing the NA values:

df %>%
group_by(group) %>%
summarise(n = length(id[!is.na(id)]))

Note that n() returns the total number of rows in each group, including rows where id is missing, so it does not give the non-NA count.
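To see the difference between the row count and the non-NA count, here is a base R sketch on hypothetical data chosen to give the counts above: length() per group plays the role of n(), while summing !is.na() gives the non-NA count, which is 0 for groups whose id values are all NA.

```r
# Hypothetical toy data chosen to give the counts shown above
df <- data.frame(
  group = c("A", "A", "B", "B", "C", "D", "E", "E"),
  id    = c(1, 2, 3, NA, NA, 4, NA, NA)
)

# Rows per group, like n(): NA rows are included
total_rows <- tapply(df$id, df$group, length)

# Non-NA values per group: 0 when every id in the group is NA
non_na <- tapply(!is.na(df$id), df$group, sum)

print(total_rows)
print(non_na)
```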

Count of NAs colwise by a group

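We can use aggregate() from base R. The formula method drops every row that contains an NA by default (na.action = na.omit), so pass na.action = na.pass to keep them:

aggregate(. ~ grp, data = dat, FUN = function(x) sum(is.na(x)), na.action = na.pass)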
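Why na.action matters: the formula method of aggregate() defaults to na.action = na.omit, which removes any row with an NA in any column before FUN runs. The NA counts then come out as 0, and a group whose rows all contain an NA disappears entirely. The toy data below are hypothetical.

```r
# Hypothetical toy data: grouping column grp and two value columns
dat <- data.frame(
  grp = c("a", "a", "b", "b"),
  v1  = c(1, NA, 3, 4),
  v2  = c(NA, NA, 5, NA)
)
count_na <- function(x) sum(is.na(x))

# Default na.action = na.omit: only row 3 has no NA, so group "a" vanishes
# and the remaining counts are all 0
dropped <- aggregate(. ~ grp, data = dat, FUN = count_na)

# na.pass keeps every row, so count_na() actually sees the NAs
kept <- aggregate(. ~ grp, data = dat, FUN = count_na, na.action = na.pass)
print(dropped)
print(kept)
```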

Or with dplyr

library(dplyr)
dat %>%
group_by(grp) %>%
summarise(across(everything(), ~ sum(is.na(.x))))

Or using data.table

library(data.table)
setDT(dat)[, lapply(.SD, function(x) sum(is.na(x))), grp]

Or, as @David Arenburg mentioned in the comments, rowsum() is another option: it does the grouping and the summing in one step. We use unary + to coerce the logical matrix returned by is.na(dat) to integer, because rowsum() only accepts numeric input.

rowsum(+(is.na(dat)), dat$grp)
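A small sketch of the rowsum() approach on hypothetical toy data. Here only the value columns are passed, so the grouping column does not show up in the result; when the whole of dat is passed, grp appears as an extra column of zeros (assuming grp itself has no NAs).

```r
# Hypothetical toy data: grouping column grp and two value columns
dat <- data.frame(
  grp = c("a", "a", "b", "b"),
  v1  = c(1, NA, 3, 4),
  v2  = c(NA, NA, 5, NA)
)

# is.na() on a data frame gives a logical matrix; unary + turns it into
# an integer matrix, which rowsum() accepts
na_mat <- +is.na(dat[c("v1", "v2")])

# Sum the rows of na_mat within each level of grp
na_by_grp <- rowsum(na_mat, dat$grp)
print(na_by_grp)
```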

Taking a count() after group_by() for non-missing values

count() is not the right function here. Its first argument must be a data frame or tibble, but you are passing a vector, hence the error. Also, count() summarises the data frame, so you are left with only one row per group. For example:

library(dplyr)

df %>%
group_by(country) %>%
mutate(mean_x = mean(x, na.rm = TRUE)) %>%
count(country)

# country n
# <fct> <int>
#1 JPN 2
#2 USA 2

If you want to add a new column without summarising, use add_count() instead:

df %>% 
group_by(country) %>%
mutate(mean_x = mean(x, na.rm = TRUE)) %>%
add_count(country)

# id x country mean_x n
# <dbl> <dbl> <fct> <dbl> <int>
#1 1 2 USA 3 2
#2 2 4 USA 3 2
#3 3 3.5 JPN 3.5 2
#4 4 NA JPN 3.5 2

However, neither function does what you need: both count every row in the group, including rows where x is NA. To count only the non-NA values per group, use:

df %>% 
group_by(country) %>%
mutate(mean_x = mean(x, na.rm = TRUE),
count = length(na.omit(x)))
# Or, as @Humpelstielzchen mentioned:
# count = sum(!is.na(x))


# id x country mean_x count
# <dbl> <dbl> <fct> <dbl> <int>
#1 1 2 USA 3 2
#2 2 4 USA 3 2
#3 3 3.5 JPN 3.5 1
#4 4 NA JPN 3.5 1
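In base R, ave() gives the same per-row result as a grouped mutate(): it returns a vector as long as its input, with each group's summary repeated on every row of that group. A sketch on toy data reconstructed from the output above (an assumption, since the original df is not shown):

```r
# Toy data reconstructed from the printed output above (an assumption)
df <- data.frame(
  id      = 1:4,
  x       = c(2, 4, 3.5, NA),
  country = factor(c("USA", "USA", "JPN", "JPN"))
)

# ave() repeats each group's result on every row of that group
df$mean_x <- ave(df$x, df$country, FUN = function(v) mean(v, na.rm = TRUE))

# 0/1 indicator of non-missing x, summed per group = non-NA count
df$count <- ave(as.integer(!is.na(df$x)), df$country, FUN = sum)
print(df)
```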

Row-wise NA count across some columns, grouped by id


library(tidyverse)

threshold = 10

df %>% group_by(id) %>%
mutate(evidence = ifelse(n()*5 - sum(na_count) >= threshold, "yes", "no"))

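The 5 is the number of question columns, q1:q5, so n()*5 - sum(na_count) is the number of non-missing answers for each id. This assumes df already has an na_count column holding the number of NAs in q1:q5 for each row.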
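A base R sketch of the whole calculation, on hypothetical data with five question columns q1:q5. It builds the per-row na_count column first, then sums the non-missing answers per id with ave(), which is the same quantity as n()*5 - sum(na_count).

```r
# Hypothetical survey data: five question columns q1:q5, several rows per id
df <- data.frame(
  id = c(1, 1, 1, 2, 2),
  q1 = c(1, 1, 1, 1, NA),
  q2 = c(2, 2, NA, NA, NA),
  q3 = c(3, NA, 3, NA, NA),
  q4 = c(4, 4, 4, NA, NA),
  q5 = c(5, 5, NA, 5, NA)
)
q_cols <- paste0("q", 1:5)
threshold <- 10

# Per-row NA count across q1:q5, i.e. the na_count column
df$na_count <- rowSums(is.na(df[q_cols]))

# Non-missing answers per id: sum over rows of (5 - na_count),
# which equals n() * 5 - sum(na_count)
answered <- ave(length(q_cols) - df$na_count, df$id, FUN = sum)

df$evidence <- ifelse(answered >= threshold, "yes", "no")
print(df)
```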

R count how many elements in a group occur in a dataframe


df %>% filter(Class != "f") %>% 
group_by(Subject) %>%
summarise(`# of occurrences` = n_distinct(Class)) %>%
group_by(`# of occurrences`) %>%
summarise(count = length(Subject),
count.from.subject = paste(Subject, collapse = ","))


Edit:

You can also use mutate() with group_by() instead of summarise(), which appends the same value to every row in the group. complete() (from tidyr) then adds the missing occurrence counts, here 0 to 5, with count = 0:

df %>% 
mutate(Class = na_if(Class, "f")) %>%
group_by(Subject) %>%
summarise(`# of occurrences` = n_distinct(na.omit(Class)),
count.class = na_if(paste(sort(unique(na.omit(Class))), collapse = ","), "")) %>%
group_by(`# of occurrences`) %>%
mutate(count = n()) %>% ungroup() %>%
complete(`# of occurrences` = 0:5, fill = list(count = 0)) %>%
transmute(`# of occurrences`, count, count.from.subject = Subject, count.class)
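A base R sketch of the same idea, assuming hypothetical Subject/Class data in which "f" marks a class to ignore. Counting distinct classes with cl != "f" inside each subject, rather than filtering rows first, keeps subjects whose only class is "f" with 0 occurrences, and factor(..., levels = 0:5) plays the role of complete() by keeping empty counts as 0.

```r
# Hypothetical data; "f" marks a class to ignore
df <- data.frame(
  Subject = c("s1", "s1", "s1", "s2", "s2", "s3"),
  Class   = c("a", "b", "f", "a", "a", "f")
)

# Distinct non-"f" classes per subject; s3 has only "f", so it gets 0
occurrences <- sapply(split(df$Class, df$Subject),
                      function(cl) length(unique(cl[cl != "f"])))

# Subjects per occurrence count; levels 0:5 keep empty counts as 0
n_subjects <- table(factor(occurrences, levels = 0:5))

# Which subjects have each occurrence count, comma-separated
subjects <- tapply(names(occurrences), occurrences, paste, collapse = ",")
print(n_subjects)
print(subjects)
```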

