R Count NA by Group

Count total missing values by group?

data.table solution

library(data.table)
setDT(df1)  # convert to data.table by reference

# .SD holds every column except the grouping column Z
df1[, .(sumNA = sum(is.na(.SD))), by = Z]

#    Z sumNA
# 1: A   559
# 2: C   661
# 3: E   596
# 4: B   597
# 5: D   560

A dplyr solution: count the NAs per column within each group, then add those counts up with rowSums(.[-1]), i.e. row sums over all columns except the first (the grouping column Z).

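library(dplyr)

df1 %>% 
  group_by(Z) %>% 
  # NA count for each non-grouping column within each group
  summarise(across(everything(), ~ sum(is.na(.x)))) %>% 
  # column 1 is Z, so .[-1] holds only the per-column NA counts
  transmute(Z, sumNA = rowSums(.[-1]))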

# # A tibble: 5 x 2
#   Z     sumNA
#   <fct> <dbl>
# 1 A       559
# 2 B       597
# 3 C       661
# 4 D       560
# 5 E       596
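For comparison, a minimal base R sketch without data.table or dplyr: split the non-grouping columns by Z and count every NA in each piece. The data frame below is invented for illustration; only the column name Z comes from the answers above.

```r
# Invented example data: Z is the grouping column, x and y contain NAs
df1 <- data.frame(
  Z = c("A", "A", "B", "B", "C"),
  x = c(1, NA, NA, 4, 5),
  y = c(NA, NA, 3, 4, NA)
)

# Every column except the grouping column
value_cols <- setdiff(names(df1), "Z")

# split() gives one data frame per group; is.na() on a data frame returns a
# logical matrix, and sum() counts its TRUE cells
sumNA <- sapply(split(df1[value_cols], df1$Z), function(d) sum(is.na(d)))
sumNA
# A B C
# 3 1 1
```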

Count non-NA values by group

You can sum !is.na(col_2) within each group:

library(dplyr)
mydf %>% group_by(col_1) %>% summarise(non_na_count = sum(!is.na(col_2)))

# # A tibble: 2 x 2
#    col_1 non_na_count
#   <fctr>        <int>
# 1      A            1
# 2      B            2
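Outside dplyr, the same count can be sketched in base R with tapply(), which applies sum() to !is.na(col_2) separately within each level of col_1. The data below is invented so the snippet runs on its own; it was chosen to give the same counts as the output above.

```r
# Invented data: col_1 is the group, col_2 has some missing values
mydf <- data.frame(
  col_1 = c("A", "A", "B", "B", "B"),
  col_2 = c(NA, 3, 1, NA, 2)
)

# TRUE counts as 1 and FALSE as 0, so each group's sum is its non-NA count
non_na_count <- tapply(!is.na(mydf$col_2), mydf$col_1, sum)
non_na_count
# A B
# 1 2
```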

Count non-`NA` of several columns by group using summarize and across from dplyr

I hope this is what you are looking for:

library(dplyr)

d %>%
  group_by(ID) %>%
  summarise(across(Col1:Col3, ~ sum(!is.na(.x)), .names = "non-{.col}"))

# # A tibble: 3 x 4
#      ID `non-Col1` `non-Col2` `non-Col3`
#   <dbl>      <int>      <int>      <int>
# 1     1          3          2          3
# 2     2          2          0          2
# 3     3          1          1          0

Or, to select the columns by a string their names share, use contains():

d %>%
  group_by(ID) %>%
  summarise(across(contains("Col"), ~ sum(!is.na(.x)), .names = "non-{.col}"))
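In base R, rowsum() can sketch the same multi-column count in one call: it sums a 0/1 matrix of non-missing flags within each ID. Here grep("Col", ...) stands in for contains("Col"), and the data frame d is invented for the example.

```r
# Invented data: ID is the group, Col1:Col3 contain NAs
d <- data.frame(
  ID   = c(1, 1, 2, 2, 3),
  Col1 = c(5, NA, 1, 2, NA),
  Col2 = c(NA, NA, 3, NA, 4),
  Col3 = c(1, 2, NA, NA, 7)
)

# Columns whose names contain "Col", like contains("Col") in dplyr
cols <- grep("Col", names(d), value = TRUE)

# +(...) turns the logical matrix into 0/1 integers, which rowsum() accepts;
# the result has one row per ID (sorted) and one column per selected column
non_na <- rowsum(+(!is.na(d[cols])), group = d$ID)
colnames(non_na) <- paste0("non-", cols)
non_na
```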

Group by count NAs as zeros

Sum the logical vector created with !is.na: TRUE counts as 1 and FALSE as 0, so the sum is the number of non-NA elements, and a group whose id values are all NA gets 0.

library(dplyr)
df %>% 
   group_by(group) %>% 
   summarise(n = sum(!is.na(id)))
# A tibble: 5 x 2
#    group     n
# * <chr> <int>
#1 A         2
#2 B         1
#3 C         0
#4 D         1
#5 E         0

Or use length() after dropping the NAs:

df %>%
   group_by(group) %>% 
   summarise(n = length(id[!is.na(id)]))

Note that n() is not a substitute here: it returns the total number of rows in each group, missing values included.
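To make the difference concrete, here is a base R sketch on invented data using tapply(): summing !is.na(id) gives 0 for a group whose ids are all missing, while length() (the base R analogue of n()) still counts those rows.

```r
# Invented data: group C has only a missing id
df <- data.frame(
  group = c("A", "A", "B", "B", "C"),
  id    = c(1, 2, 3, NA, NA)
)

# Non-NA ids per group: A = 2, B = 1, C = 0
non_na <- tapply(df$id, df$group, function(v) sum(!is.na(v)))

# All rows per group, NAs included: A = 2, B = 2, C = 1
all_rows <- tapply(df$id, df$group, length)
```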

Count of NAs colwise by a group

We can use aggregate

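# na.pass keeps rows with NAs; the default na.omit would drop them before counting
aggregate(. ~ grp, data = dat, FUN = function(x) sum(is.na(x)), na.action = na.pass)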

Or with dplyr

library(dplyr)
dat %>%
    group_by(grp) %>%
    summarise(across(everything(), ~ sum(is.na(.x))))

Or using data.table

library(data.table)
setDT(dat)[, lapply(.SD,  function(x) sum(is.na(x))), grp]

Or, as @David Arenburg mentioned in the comments, rowsum() can group and sum in a single step. The unary + coerces the logical matrix is.na(dat) to integer 0/1 values, because rowsum() only accepts numeric input. The result also contains a column for grp itself, which is 0 unless grp has missing values.

rowsum(+(is.na(dat)), dat$grp)
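The na.action argument matters for the formula interface of aggregate(): its default, na.omit, removes every row that contains an NA before FUN runs, so NA counts come out as 0 and a group can disappear entirely. A base R sketch on invented data:

```r
# Invented data: only the third row has no missing values
dat <- data.frame(
  grp = c("a", "a", "b", "b"),
  x   = c(1, NA, 3, 4),
  y   = c(NA, 2, 3, NA)
)

count_na <- function(v) sum(is.na(v))

# Default na.action = na.omit: rows 1, 2 and 4 are dropped first,
# so group "a" vanishes and group "b" reports zero NAs
dropped <- aggregate(. ~ grp, data = dat, FUN = count_na)

# na.action = na.pass keeps the NAs so FUN can count them:
# group "a" has x = 1, y = 1 and group "b" has x = 0, y = 1
kept <- aggregate(. ~ grp, data = dat, FUN = count_na, na.action = na.pass)
```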

Taking a count() after group_by() for non-missing values

count() is not the right function here. Its first argument must be a data frame or tibble, but you are passing a vector, hence the error. count() also summarises the data frame, leaving only one row per group. For example:

library(dplyr)

df %>% 
  group_by(country) %>% 
  mutate(mean_x = mean(x, na.rm = TRUE)) %>%
  count(country)

#  country     n
#  <fct>   <int>
#1 JPN         2
#2 USA         2

If you want to add a new column without summarising, use add_count() instead:

df %>% 
  group_by(country) %>% 
  mutate(mean_x = mean(x, na.rm = TRUE)) %>%
  add_count(country)

#     id     x country mean_x     n
#  <dbl> <dbl> <fct>    <dbl> <int>
#1     1   2   USA        3       2
#2     2   4   USA        3       2
#3     3   3.5 JPN        3.5     2
#4     4  NA   JPN        3.5     2

However, neither function does what you need: both count every row in the group, including rows where x is NA. To count only the non-NA values per group, you need:

df %>% 
  group_by(country) %>% 
  mutate(mean_x = mean(x, na.rm = TRUE), 
         count = length(na.omit(x)))
# or, as @Humpelstielzchen mentioned: count = sum(!is.na(x))


#    id     x country mean_x count
#  <dbl> <dbl> <fct>    <dbl> <int>
#1     1   2   USA        3       2
#2     2   4   USA        3       2
#3     3   3.5 JPN        3.5     1
#4     4  NA   JPN        3.5     1
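A base R sketch of the same grouped mutate uses ave(), which returns one value per row (the group result repeated) rather than one row per group. The data is rebuilt from the printed output above.

```r
# Data rebuilt from the printed output above
df <- data.frame(
  id      = c(1, 2, 3, 4),
  x       = c(2, 4, 3.5, NA),
  country = c("USA", "USA", "JPN", "JPN")
)

# ave() applies FUN within each country and repeats the result on every row
df$mean_x <- ave(df$x, df$country, FUN = function(v) mean(v, na.rm = TRUE))

# Summing 0/1 flags instead of counting rows leaves out the NA in JPN
df$count <- ave(as.integer(!is.na(df$x)), df$country, FUN = sum)
df$count
# [1] 2 2 1 1
```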

Row-wise NA count across some columns, grouped by id

library(tidyverse)

threshold <- 10

df %>% group_by(id) %>%
  mutate(evidence = ifelse(n()*5 - sum(na_count) >= threshold, "yes", "no"))

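This assumes na_count already holds each row's NA count across q1:q5. The 5 is the number of those columns, so n()*5 - sum(na_count) is the number of non-missing answers for each id.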
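A base R sketch of the whole calculation, assuming na_count is meant to be each row's NA count across q1:q5 (the answer does not show how it was built). Here it is computed with rowSums(is.na(...)), length(q_cols) replaces the hard-coded 5, and both the data and the threshold of 5 are invented so that one id gets "yes" and the other "no".

```r
# Invented survey data: two rows per id, answers in q1:q5
df <- data.frame(
  id = c(1, 1, 2, 2),
  q1 = c(1, NA, NA, NA),
  q2 = c(2, 3, NA, 1),
  q3 = c(NA, 4, NA, NA),
  q4 = c(5, 5, 2, NA),
  q5 = c(1, NA, NA, NA)
)
threshold <- 5  # invented for this toy data; the answer above uses 10

q_cols <- paste0("q", 1:5)

# Row-wise NA count across q1:q5 (assumed meaning of na_count)
df$na_count <- rowSums(is.na(df[q_cols]))

# Non-missing answers per id, repeated on every row of that id
answered <- ave(length(q_cols) - df$na_count, df$id, FUN = sum)

df$evidence <- ifelse(answered >= threshold, "yes", "no")
```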

R count how many elements in a group occur in a dataframe

df %>% filter(Class != "f") %>% 
  group_by(Subject) %>% 
  summarise(`# of occurrences` = n_distinct(Class)) %>% 
  group_by(`# of occurrences`) %>% 
  summarise(count = length(Subject), 
            count.from.subject = paste(Subject, collapse = ","))
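The same two steps can be sketched in base R: count the distinct non-"f" classes per Subject, then collect the subjects for each count. The data is invented; as with filter(), a subject whose only class is "f" (s3 here) drops out of the result altogether.

```r
# Invented data: s3 only has the excluded class "f"
df <- data.frame(
  Subject = c("s1", "s1", "s1", "s2", "s2", "s3"),
  Class   = c("a", "b", "f", "a", "a", "f")
)

keep <- df$Class != "f"

# Distinct classes per subject: s1 has "a" and "b", s2 only "a"
occurrences <- sapply(split(df$Class[keep], df$Subject[keep]),
                      function(v) length(unique(v)))

# Number of subjects per occurrence count, and which subjects they are
count <- table(occurrences)
count.from.subject <- sapply(split(names(occurrences), occurrences),
                             paste, collapse = ",")
```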

Edit:

You can also use mutate() with group_by() instead of summarise(), which appends the same value to every row in the group. complete() (from tidyr) then adds rows for the missing occurrence counts:

library(dplyr)
library(tidyr)  # for complete()

df %>% 
  mutate(Class = na_if(Class, "f")) %>% 
  group_by(Subject) %>% 
  summarise(`# of occurrences` = n_distinct(na.omit(Class)), 
            count.class = na_if(paste(sort(unique(na.omit(Class))), collapse = ","), "")) %>% 
  group_by(`# of occurrences`) %>% 
  mutate(count = n()) %>% ungroup() %>% 
  complete(`# of occurrences` = 0:5, fill = list(count = 0)) %>% 
  transmute(`# of occurrences`, count, count.from.subject = Subject, count.class)