Count non-NA values by group
You can use this
mydf %>% group_by(col_1) %>% summarise(non_na_count = sum(!is.na(col_2)))
# A tibble: 2 x 2
col_1 non_na_count
<fctr> <int>
1 A 1
2 B 2
Count non-`NA` of several columns by group using summarize and across from dplyr
I hope this is what you are looking for:
library(dplyr)
d %>%
group_by(ID) %>%
summarise(across(Col1:Col3, ~ sum(!is.na(.x)), .names = "non-{.col}"))
# A tibble: 3 x 4
ID `non-Col1` `non-Col2` `non-Col3`
<dbl> <int> <int> <int>
1 1 3 2 3
2 2 2 0 2
3 3 1 1 0
Or if you would like to select columns by their shared string you can use this:
d %>%
group_by(ID) %>%
summarise(across(contains("Col"), ~ sum(!is.na(.x)), .names = "non-{.col}"))
Taking a count() after group_by() for non-missing values
count
is not the right function here. The first argument to count
is a dataframe or tibble specifically. However, what you are passing is a vector hence you get the error. Also count
summarises the dataframe so that you have only one row per group. See for example,
library(dplyr)
df %>%
group_by(country) %>%
mutate(mean_x = mean(x, na.rm = TRUE)) %>%
count(country)
# country n
# <fct> <int>
#1 JPN 2
#2 USA 2
If you want to add a new column without summarising, use add_count
instead
df %>%
group_by(country) %>%
mutate(mean_x = mean(x, na.rm = TRUE)) %>%
add_count(country)
# id x country mean_x n
# <dbl> <dbl> <fct> <dbl> <int>
#1 1 2 USA 3 2
#2 2 4 USA 3 2
#3 3 3.5 JPN 3.5 2
#4 4 NA JPN 3.5 2
However, both of this function don't do what you need. To count non-NA values per group, you need
df %>%
group_by(country) %>%
mutate(mean_x = mean(x, na.rm = TRUE),
count = length(na.omit(x)))
#OR
#count = sum(!is.na(x)))#as @Humpelstielzchen mentioned
# id x country mean_x count
# <dbl> <dbl> <fct> <dbl> <int>
#1 1 2 USA 3 2
#2 2 4 USA 3 2
#3 3 3.5 JPN 3.5 1
#4 4 NA JPN 3.5 1
From an R dataframe: count non-NA values by column, grouped by one of the columns
We can use summarise_all
library(dplyr)
litmus %>%
group_by(grouping) %>%
summarise_all(funs(sum(!is.na(.))))
Count number of non-NA values by group
Or if you wanted to use data.table:
library(data.table)
dt[,sum(!is.na(X2)),by=.(Color)]
Color V1
1: Red 2
2: Blue 0
3: Green 1
Also its easy enough to use an ifelse()
in your data.table to get an NA for blue instead of 0. See:
dt[,ifelse(sum(!is.na(X2)==0),as.integer(NA),sum(!is.na(X2))),by=.(Color)]
Color V1
1: Red 2
2: Blue NA
3: Green 1
Data:
dt <- as.data.table(fread("Color X1 X2 X3 X4
Red 1 1 0 2
Blue 0 NA 4 1
Red 3 4 3 1
Green 2 2 1 0"))
R group by, counting non-NA values
We can use data.table
. Convert the 'data.frame' to 'data.table' (setDT(toy_df)
), grouped by 'Label', loop through the Subset of Data.table (.SD
) and get the sum
of non-NA values (!is.na(x)
)
library(data.table)
setDT(toy_df)[, lapply(.SD, function(x) sum(!is.na(x))), by = Label]
# Label Y X1 X2
#1: A 1 1 1
#2: B 2 0 2
#3: C 1 0 0
Or with dplyr
using the same methodology
library(dplyr)
toy_df %>%
group_by(Label) %>%
summarise_each(funs(sum(!is.na(.))))
Or a base R
option with by
and colSums
grouped by the 4th column on logical matrix (!is.na(toy_df[-4])
)
by(!is.na(toy_df[-4]), toy_df[4], FUN = colSums)
Or with rowsum
with similar approach as in by
except using the rowsum
function.
rowsum(+(!is.na(toy_df[-4])), group=toy_df[,4])
# Y X1 X2
#A 1 1 1
#B 2 0 2
#C 1 0 0
How to get count of all non NA values for all variables by group?
using dplyr
>= 1.0.0:
df %>%
group_by(size) %>%
summarise(across(everything(), ~sum(!is.na(.))))
# A tibble: 2 x 6
size P1 P2 P4 P3 P5
* <chr> <int> <int> <int> <int> <int>
1 10K 5 5 2 2 1
2 5K 5 5 0 0 0
To get the exact output as OP's:
df %>%
group_by(size) %>%
summarise_all(~sum(!is.na(.))) %>%
arrange(desc(size)) # weird(ironic?) though
# A tibble: 2 x 6
size P1 P2 P4 P3 P5
<chr> <int> <int> <int> <int> <int>
1 5K 5 5 0 0 0
2 10K 5 5 2 2 1
For dplyr versions below 0.8.9.9000:
df %>%
group_by(size) %>%
summarise_all(~sum(!is.na(.)))
# A tibble: 2 x 6
size P1 P2 P4 P3 P5
* <chr> <int> <int> <int> <int> <int>
1 10K 5 5 2 2 1
2 5K 5 5 0 0 0
Data
df <- structure(list(size = c("5K", "5K", "5K", "5K", "5K", "10K",
"10K", "10K", "10K", "10K"), P1 = c(3L, 3L, 2L, 4L, 3L, 4L, 5L,
4L, 3L, 4L), P2 = c(3L, 3L, 2L, 3L, 3L, 4L, 4L, 3L, 3L, 3L),
P4 = c(NA, NA, NA, NA, NA, NA, NA, 4L, 3L, NA), P3 = c(NA,
NA, NA, NA, NA, NA, NA, 4L, 3L, NA), P5 = c(NA, NA, NA, NA,
NA, NA, NA, NA, 1L, NA)), class = "data.frame", row.names = c(NA,
-10L))
How to make n() do not count NAs too in tidyverse?
We can filter
at the top and then do the summarise
without changing the code
library(dplyr)
mwe %>%
filter(!is.na(Amt)) %>%
dplyr::group_by(Food, Site) %>%
dplyr::summarise(Median = round(median(Amt, na.rm=TRUE), digits=2),
N = n()) %>%
ungroup()
Or another option is to change the n()
to sum(!is.na(Amt))
mwe %>%
dplyr::group_by(Food, Site) %>%
dplyr::summarise(Median = round(median(Amt, na.rm=TRUE), digits=2),
N = sum(!is.na(Amt))) %>%
ungroup()
How to count total non-na values using dplyr?
If it is the count of non-NA, create a logical vector (!is.na(value)
) and get the sum
library(dplyr)
df %>%
group_by(error_code) %>%
summarise(value = sum(!is.na(value)))
If the column name/names is stored in an object, use across
as it is generalized and can work for multiple columns
df %>%
group_by(error_code) %>%
summarise(across(categoryColumn, ~ sum(!is.na(.))))
sum(is.na(.))
outside the across
will check for the total sum
of all the elements as the .
here implies the whole dataset and not the column per se
Related Topics
Indexing Integer Vector with Na
Fit Many Formulae at Once, Faster Options Than Lapply
Usage of Uioutput in Multiple Menuitems in R Shiny Dashboard
"Object Not Found" Error Within a User Defined Function, Eval() Function
Using User-Defined "For Loop" Function to Construct a Data Frame
Principal Components Analysis - How to Get the Contribution (%) of Each Parameter to a Prin.Comp
How to Get the Second Sub Element of Every Element in a List
Higher Level Functions in R - Is There an Official Compose Operator or Curry Function
How to Rbind All the Data.Frames in Your Working Environment
Filling Under the a Curve with Ggplot Graphs
Loess Fit and Resulting Equation
Importing Data into R (Rdata) from Github
Mutate Multiple/Consecutive Columns (With Dplyr or Base R)
How to Use Black-And-White Fill Patterns Instead of Color Coding on Calendar Heatmap