Count total missing values by group?
data.table solution
library(data.table)
setDT(df1)
df1[, .(sumNA = sum(is.na(.SD))), by = Z]
# Z sumNA
# 1: A 559
# 2: C 661
# 3: E 596
# 4: B 597
# 5: D 560
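For comparison, the same per-group total can be sketched in base R without data.table. The df1 below is a small hypothetical stand-in (the question's data is not shown) with grouping column Z. The sketch counts the NAs in each row across every non-Z column, then sums those counts within each group.

```r
# Hypothetical toy data standing in for the question's df1
df1 <- data.frame(
  Z = c("A", "A", "B", "B", "C"),
  x = c(1, NA, 3, NA, NA),
  y = c(NA, NA, 2, 4, 5)
)

# Number of NAs in each row, ignoring the grouping column Z
na_per_row <- rowSums(is.na(df1[setdiff(names(df1), "Z")]))

# Add up the per-row counts within each level of Z
sumNA <- tapply(na_per_row, df1$Z, sum)
print(sumNA)
```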
dplyr solution using rowSums(.[-1]), i.e. row sums over all columns except the first.
library(dplyr)
df1 %>%
group_by(Z) %>%
summarise_all(~sum(is.na(.))) %>%
transmute(Z, sumNA = rowSums(.[-1]))
# # A tibble: 5 x 2
# Z sumNA
# <fct> <dbl>
# 1 A 559
# 2 B 597
# 3 C 661
# 4 D 560
# 5 E 596
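The two steps of the dplyr pipeline (NA counts per column within each group, then row sums over every column but the first) can be mirrored in base R. This is a sketch on hypothetical toy data. It uses the data-frame method of aggregate, which, unlike the formula method, does not drop rows containing NAs before FUN is applied.

```r
# Hypothetical toy data with grouping column Z in first position
df1 <- data.frame(
  Z = c("A", "A", "B", "B", "C"),
  x = c(1, NA, 3, NA, NA),
  y = c(NA, NA, 2, 4, 5)
)

# Step 1: NA count per column (all columns except Z) within each group
per_col <- aggregate(df1[-1], by = list(Z = df1$Z),
                     FUN = function(v) sum(is.na(v)))

# Step 2: row sums over every column except the first (the group label)
per_col$sumNA <- rowSums(per_col[-1])
print(per_col[c("Z", "sumNA")])
```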
Count non-NA values by group
You can use this:
library(dplyr)
mydf %>%
group_by(col_1) %>%
summarise(non_na_count = sum(!is.na(col_2)))
# # A tibble: 2 x 2
# col_1 non_na_count
# <fctr> <int>
# 1 A 1
# 2 B 2
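A base R sketch of the same count, on a small hypothetical mydf (not the asker's data): tapply applies sum to the logical vector !is.na(col_2) within each level of col_1, and each TRUE counts as 1.

```r
# Hypothetical toy data
mydf <- data.frame(
  col_1 = c("A", "A", "B", "B"),
  col_2 = c(NA, 1, 2, 3)
)

# !is.na() gives TRUE/FALSE; sum() counts the TRUEs within each group
non_na_count <- tapply(!is.na(mydf$col_2), mydf$col_1, sum)
print(non_na_count)
```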
Count non-`NA` of several columns by group using summarize and across from dplyr
I hope this is what you are looking for:
library(dplyr)
d %>%
group_by(ID) %>%
summarise(across(Col1:Col3, ~ sum(!is.na(.x)), .names = "non-{.col}"))
# # A tibble: 3 x 4
# ID `non-Col1` `non-Col2` `non-Col3`
# <dbl> <int> <int> <int>
# 1 1 3 2 3
# 2 2 2 0 2
# 3 3 1 1 0
Or, if you would like to select the columns by a string their names share, you can use this:
d %>%
group_by(ID) %>%
summarise(across(contains("Col"), ~ sum(!is.na(.x)), .names = "non-{.col}"))
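The same per-ID, per-column counts can also be sketched in base R. The data below is a hypothetical stand-in for d, and grepl() on the column names plays the role of contains("Col"); split() breaks the rows up by ID and colSums() counts the non-NA cells in each piece.

```r
# Hypothetical toy data standing in for d
d <- data.frame(
  ID   = c(1, 1, 2, 2, 3),
  Col1 = c(1, NA, 2, 3, NA),
  Col2 = c(NA, NA, 5, NA, 7),
  Col3 = c(4, 5, NA, NA, NA)
)

# Columns whose names contain "Col" (analogous to contains("Col"))
cols <- names(d)[grepl("Col", names(d), fixed = TRUE)]

# One row per ID, one column per selected column: count of non-NA cells
non_na <- t(sapply(split(d[cols], d$ID), function(s) colSums(!is.na(s))))
colnames(non_na) <- paste0("non-", colnames(non_na))
print(non_na)
```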
Group by count NAs as zeros
We can take the sum of a logical vector created with !is.na: TRUE is coerced to 1 and FALSE to 0, so the sum returns the count of non-NA elements (0 for a group whose values are all NA).
library(dplyr)
df %>%
group_by(group) %>%
summarise(n = sum(!is.na(id)))
# A tibble: 5 x 2
# group n
# * <chr> <int>
#1 A 2
#2 B 1
#3 C 0
#4 D 1
#5 E 0
Or use length after subsetting:
df %>%
group_by(group) %>%
summarise(n = length(id[!is.na(id)]))
Note that n() returns the total number of rows in each group, including the rows where id is missing, so it would not give 0 for those groups.
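To see the difference between n() and the non-NA count, here is a base R sketch on hypothetical data where group C has only missing ids: table() reports every row, like n(), while summing !is.na(id) gives 0 for C (and for B, whose only id is missing).

```r
# Hypothetical toy data: group C has no non-missing id
df <- data.frame(
  group = c("A", "A", "B", "C", "C"),
  id    = c(1, 2, NA, NA, NA)
)

# Total rows per group, NAs included (what n() would report)
total_rows <- table(df$group)

# Non-missing ids per group: TRUE counts as 1, FALSE as 0
non_na <- tapply(!is.na(df$id), df$group, sum)

print(total_rows)
print(non_na)
```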
Count of NAs colwise by a group
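We can use aggregate. With the formula method, set na.action = na.pass; otherwise the default na.omit drops every row containing an NA before the function is applied.
aggregate(. ~ grp, data = dat, FUN = function(x) sum(is.na(x)), na.action = na.pass)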
Or with dplyr
library(dplyr)
dat %>%
group_by(grp) %>%
summarise(across(everything(), ~ sum(is.na(.x))))
Or using data.table
library(data.table)
setDT(dat)[, lapply(.SD, function(x) sum(is.na(x))), grp]
Or, as @David Arenburg mentioned in the comments, rowsum is another option: it does the group-by operation while summing. We used + to coerce the logical matrix (is.na(dat)) to 0/1 integers, because rowsum only accepts numeric input.
rowsum(+(is.na(dat)), dat$grp)
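A small base R sketch comparing these options on hypothetical data. The formula method of aggregate uses na.action = na.omit by default, which removes rows containing NAs before FUN sees them, so counting NAs needs na.action = na.pass. The rowsum call here is restricted to the value columns so the grouping column is not counted.

```r
# Hypothetical toy data: grouping column grp plus two value columns
dat <- data.frame(
  grp = c("a", "a", "b", "b"),
  v1  = c(NA, 1, 2, NA),
  v2  = c(NA, NA, 3, 4)
)
value_cols <- setdiff(names(dat), "grp")

# Default na.action = na.omit: every row with an NA is dropped first,
# so only row 3 survives and group "a" disappears entirely
dropped <- aggregate(. ~ grp, data = dat, FUN = function(x) sum(is.na(x)))

# na.action = na.pass keeps those rows, so FUN can count the NAs
by_agg <- aggregate(. ~ grp, data = dat, FUN = function(x) sum(is.na(x)),
                    na.action = na.pass)

# rowsum() needs numeric input, so +() turns the logical matrix into 0/1
by_rowsum <- rowsum(+(is.na(dat[value_cols])), dat$grp)

print(dropped)
print(by_agg)
print(by_rowsum)
```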
Taking a count() after group_by() for non-missing values
count is not the right function here. Its first argument must be a data frame or tibble, but you are passing a vector, hence the error. Also, count summarises the data frame so that you get only one row per group. For example:
library(dplyr)
df %>%
group_by(country) %>%
mutate(mean_x = mean(x, na.rm = TRUE)) %>%
count(country)
# country n
# <fct> <int>
#1 JPN 2
#2 USA 2
If you want to add a new column without summarising, use add_count instead:
df %>%
group_by(country) %>%
mutate(mean_x = mean(x, na.rm = TRUE)) %>%
add_count(country)
# id x country mean_x n
# <dbl> <dbl> <fct> <dbl> <int>
#1 1 2 USA 3 2
#2 2 4 USA 3 2
#3 3 3.5 JPN 3.5 2
#4 4 NA JPN 3.5 2
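In base R terms (a sketch on a data frame rebuilt from the printed output above), count() corresponds roughly to table(), which collapses to one value per group, and add_count() to ave(..., FUN = length), which repeats the group size on every row.

```r
# Data rebuilt from the printed output (id, x, country)
df <- data.frame(
  id      = c(1, 2, 3, 4),
  x       = c(2, 4, 3.5, NA),
  country = c("USA", "USA", "JPN", "JPN")
)

# Like count(): one value per group with the number of rows
country_n <- table(df$country)
print(country_n)

# Like add_count(): the group size repeated on every row
df$n <- ave(df$id, df$country, FUN = length)
print(df)
```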
However, neither of these functions does what you need. To count the non-NA values per group, use:
df %>%
group_by(country) %>%
mutate(mean_x = mean(x, na.rm = TRUE),
count = length(na.omit(x)))
# Or, as @Humpelstielzchen mentioned, replace the last line with:
# count = sum(!is.na(x)))
# id x country mean_x count
# <dbl> <dbl> <fct> <dbl> <int>
#1 1 2 USA 3 2
#2 2 4 USA 3 2
#3 3 3.5 JPN 3.5 1
#4 4 NA JPN 3.5 1
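The same grouped mutate can be sketched in base R with ave(), again on a data frame rebuilt from the printed output: ave() returns a vector as long as its input, so each row receives its group's value.

```r
# Data rebuilt from the printed output (id, x, country)
df <- data.frame(
  id      = c(1, 2, 3, 4),
  x       = c(2, 4, 3.5, NA),
  country = c("USA", "USA", "JPN", "JPN")
)

# Group mean ignoring NAs, repeated on every row of the group
df$mean_x <- ave(df$x, df$country, FUN = function(v) mean(v, na.rm = TRUE))

# Group-wise count of non-NA x, also repeated on every row
df$count <- ave(as.numeric(!is.na(df$x)), df$country, FUN = sum)
print(df)
```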
row wise NA count across some columns - grouped by id
library(tidyverse)
threshold = 10
df %>% group_by(id) %>%
mutate(evidence = ifelse(n()*5 - sum(na_count) >= threshold, "yes", "no"))
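The 5 comes from the number of columns you have (q1:q5): n()*5 is the number of answer cells for each id, so subtracting sum(na_count) gives the number of non-missing answers, which is then compared with threshold.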
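A base R sketch of the same rule, on hypothetical data. The snippet above does not show how na_count was built, so here it is assumed to be the row-wise NA count across q1:q5. Summing 5 - na_count over an id's rows equals n()*5 - sum(na_count).

```r
# Hypothetical survey data: two ids, answers in q1:q5
df <- data.frame(
  id = c(1, 1, 2, 2),
  q1 = c(1, NA, 1, 1), q2 = c(2, NA, 2, 2), q3 = c(3, 1, 3, 3),
  q4 = c(NA, NA, 4, 4), q5 = c(5, 2, 5, 5)
)
threshold <- 10
q_cols <- paste0("q", 1:5)

# Assumed definition: na_count = number of NAs per row across q1:q5
df$na_count <- rowSums(is.na(df[q_cols]))

# Non-missing answers per id, repeated on every row of that id
answered <- ave(length(q_cols) - df$na_count, df$id, FUN = sum)
df$evidence <- ifelse(answered >= threshold, "yes", "no")
print(df)
```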
R count how many elements in a group occur in a dataframe
library(dplyr)
df %>% filter(Class != "f") %>%
group_by(Subject) %>%
summarise(`# of occurrences` = n_distinct(Class)) %>%
group_by(`# of occurrences`) %>%
summarise(count = length(Subject),
count.from.subject = paste(Subject, collapse = ","))
Edit:
You can also use mutate with group_by instead of summarise, which appends the same value to each element of the group; complete (from tidyr) then fills in the occurrence counts that are missing:
library(dplyr)
library(tidyr)
df %>%
mutate(Class = na_if(Class, "f")) %>%
group_by(Subject) %>%
summarise(`# of occurrences` = n_distinct(na.omit(Class)),
count.class = na_if(paste(sort(unique(na.omit(Class))), collapse = ","), "")) %>%
group_by(`# of occurrences`) %>%
mutate(count = n()) %>% ungroup() %>%
complete(`# of occurrences` = 0:5, fill = list(count = 0)) %>%
transmute(`# of occurrences`, count, count.from.subject = Subject, count.class)
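A rough base R sketch of the same tally on hypothetical data, treating "f" as an excluded class as the answer does: count the distinct non-"f" classes per subject (so a subject with only "f" rows gets 0), then tabulate how many subjects have each count from 0 to 5 and list them.

```r
# Hypothetical toy data
df <- data.frame(
  Subject = c("s1", "s1", "s2", "s2", "s2", "s3"),
  Class   = c("a", "b", "a", "a", "f", "f")
)
subjects <- unique(df$Subject)

# Distinct classes per subject, ignoring "f" (a subject with only "f" gets 0)
occ <- sapply(subjects, function(s) {
  length(unique(df$Class[df$Subject == s & df$Class != "f"]))
})

# Number of subjects for each possible occurrence count 0..5
occ_f <- factor(occ, levels = 0:5)
counts <- table(occ_f)

# Which subjects fall into each occurrence count (NA where there are none)
who <- tapply(names(occ), occ_f, paste, collapse = ",")
print(counts)
print(who)
```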