Function to Count NA Values at Each Level of a Factor

How do I count the number of values at each level of a given factor?

Using the dplyr library:

library(dplyr)
set.seed(1)
dat <- data.frame(ID = sample(letters, 100, replace = TRUE))
dat %>% 
  group_by(ID) %>%
  summarise(no_rows = length(ID))

Note the use of %>%, which works much like a pipe in bash: x %>% f(y) is equivalent to f(x, y). Effectively, the code above pipes dat into group_by(), and the result of that call is piped into summarise().

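The result is below. It was printed by an older version of dplyr, and R 3.6.0 changed the default algorithm used by sample(), so the counts you get from set.seed(1) on a current R may differ: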

Source: local data frame [26 x 2]

   ID no_rows
1   a       2
2   b       3
3   c       3
4   d       3
5   e       2
6   f       4
7   g       6
8   h       1
9   i       6
10  j       5
11  k       6
12  l       4
13  m       7
14  n       2
15  o       2
16  p       2
17  q       5
18  r       4
19  s       5
20  t       3
21  u       8
22  v       4
23  w       5
24  x       4
25  y       3
26  z       1
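To make the piping concrete, here is a small sketch showing that the piped call is the same as nesting each function inside the next, with the left-hand side of %>% becoming the first argument of the call on its right. It rebuilds the simulated dat from above. The names piped and nested are only illustrative.

```r
library(dplyr)

set.seed(1)
dat <- data.frame(ID = sample(letters, 100, replace = TRUE))

# Piped form: dat becomes the first argument of group_by(),
# and that result becomes the first argument of summarise()
piped <- dat %>%
  group_by(ID) %>%
  summarise(no_rows = length(ID))

# The same computation written as nested calls, without %>%
nested <- summarise(group_by(dat, ID), no_rows = length(ID))
```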

See the dplyr introduction vignette for more context, and the documentation for details on the individual functions.
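dplyr also provides count(), a shorthand for group_by() followed by summarise(n = n()). The following minimal sketch runs it on the same simulated dat. Note that the result column is named n rather than no_rows.

```r
library(dplyr)

set.seed(1)
dat <- data.frame(ID = sample(letters, 100, replace = TRUE))

# count(ID) groups by ID and tallies the rows in each group (column n)
counts <- dat %>% count(ID)

# The longhand version for comparison
manual <- dat %>%
  group_by(ID) %>%
  summarise(n = n())
```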

Count total missing values by group?

data.table solution

library(data.table)
setDT(df1)

df1[, .(sumNA = sum(is.na(.SD))), by = Z]

#    Z sumNA
# 1: A   559
# 2: C   661
# 3: E   596
# 4: B   597
# 5: D   560
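The question's df1 isn't reproduced here, so this sketch runs the same call on a small hypothetical df1 (the columns x and y are made up for illustration). Within each group, is.na(.SD) is a logical matrix covering every column except the grouping column Z, so sum() counts all of its NAs at once.

```r
library(data.table)

# Hypothetical stand-in for the question's df1
df1 <- data.frame(
  Z = c("A", "A", "B", "B", "B"),
  x = c(1, NA, NA, 4, NA),
  y = c(NA, NA, 3, 4, 5)
)
setDT(df1)

# .SD holds the non-grouping columns (x, y) of each Z group
res <- df1[, .(sumNA = sum(is.na(.SD))), by = Z]
res
```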

dplyr solution: count the NAs in each column within each group, then add them up with rowSums(.[-1]), i.e. row sums over all columns except the first (the grouping column Z).

library(dplyr)

df1 %>% 
  group_by(Z) %>% 
  summarise_all(~sum(is.na(.))) %>% 
  transmute(Z, sumNA = rowSums(.[-1]))

# # A tibble: 5 x 2
#   Z     sumNA
#   <fct> <dbl>
# 1 A       559
# 2 B       597
# 3 C       661
# 4 D       560
# 5 E       596
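Since dplyr 1.0.0, summarise_all() has been superseded by across(). Below is a sketch of the same pipeline using across(). It assumes dplyr 1.0.0 or later and uses a small hypothetical df1 in place of the question's data.

```r
library(dplyr)

# Hypothetical stand-in for the question's df1
df1 <- data.frame(
  Z = c("A", "A", "B", "B", "B"),
  x = c(1, NA, NA, 4, NA),
  y = c(NA, NA, 3, 4, 5)
)

res <- df1 %>%
  group_by(Z) %>%
  # NA count for every column; across(everything()) skips the grouping column Z
  summarise(across(everything(), ~ sum(is.na(.x)))) %>%
  # add up the per-column counts; .[-1] drops the Z column
  transmute(Z, sumNA = rowSums(.[-1]))
res
```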

Finding number of NA in multiple columns by group

Base R makes this awkward.

data.table is friendlier:

library(data.table)
setDT(df) # <- convert to data.table
# going column-by-column, count NA
df[ , lapply(.SD, function(x) sum(is.na(x))), by = City]

See the "Getting Started" vignette for data.table, a primer on .SD, and the discussion of lapply(.SD, ...) for more.
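Here is a small sketch of the column-by-column call on hypothetical data, since the question's df isn't shown. The columns pop and temp are made up. The result has one row per City and one NA count per non-grouping column.

```r
library(data.table)

# Hypothetical data: a grouping column plus two columns with NAs
df <- data.frame(
  City = c("X", "X", "Y", "Y"),
  pop  = c(10L, NA, NA, NA),
  temp = c(NA, 2.5, 3.1, NA)
)
setDT(df)

# lapply() applies the NA counter to each column of .SD within each City
res <- df[, lapply(.SD, function(x) sum(is.na(x))), by = City]
res
```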

Note that calling colSums on a data.frame first converts it to a matrix, which forces all the columns to a common type (here, character) if they don't already share one. That conversion can be costly.
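The following sketch shows that coercion on hypothetical data. as.matrix(), which colSums() applies to a data.frame, turns a mix of character and numeric columns into a single character matrix.

```r
# Hypothetical data: one character column and two numeric columns
df <- data.frame(
  City = c("X", "X", "Y"),
  pop  = c(10L, NA, 30L),
  temp = c(NA, 2.5, NA),
  stringsAsFactors = FALSE
)

# The conversion colSums() performs on a data.frame before summing
m <- as.matrix(df)
typeof(m)   # the numeric columns have been turned into strings
```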

R factor levels as column names and count values

data_sample <- data.frame(
  PatID   = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L),
  status1 = c("I250", "NA", "NA", "X560", "M206", "NA", "NA", "M206", "NA"),
  status2 = c(".", "M206", "NA", "I250", "I250", "M206", "NA", "NA", "X560"),
  status3 = c(".", "I250", "NA", "NA", "NA", "I250", "X560", "NA", "NA")
)

library(tidyverse)
data_sample %>%
  gather(status_num, value, -PatID) %>%
  filter(value != "NA", value != ".") %>%
  count(PatID, value) %>%  # Improvement by @antoniosk 
  spread(value, n, fill = 0)

# A tibble: 3 x 4
# Groups:   PatID [3]
  PatID  I250  M206  X560
  <int> <int> <int> <int>
1     1     2     1     0
2     2     2     1     1
3     3     1     2     2
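gather() and spread() have since been superseded by pivot_longer() and pivot_wider(). Here is a sketch of the same reshaping with the newer verbs. It assumes tidyr 1.1.0 or later, since earlier versions expect values_fill as a named list. The data is the data_sample above, with stringsAsFactors = FALSE added so the status columns are character on any R version.

```r
library(dplyr)
library(tidyr)

data_sample <- data.frame(
  PatID   = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L),
  status1 = c("I250", "NA", "NA", "X560", "M206", "NA", "NA", "M206", "NA"),
  status2 = c(".", "M206", "NA", "I250", "I250", "M206", "NA", "NA", "X560"),
  status3 = c(".", "I250", "NA", "NA", "NA", "I250", "X560", "NA", "NA"),
  stringsAsFactors = FALSE
)

wide <- data_sample %>%
  # one row per (PatID, status column) pair
  pivot_longer(-PatID, names_to = "status_num", values_to = "value") %>%
  # "NA" and "." are placeholder strings here, not real missing values
  filter(value != "NA", value != ".") %>%
  count(PatID, value) %>%
  # one column per code; combinations that never occur are filled with 0
  pivot_wider(names_from = value, values_from = n, values_fill = 0)
wide
```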

Counting number of elements in a character column by levels of a factor column in a dataframe

A dplyr solution.

df %>% 
    filter(!is.na(product)) %>% 
    group_by(company) %>% 
    count()

# A tibble: 4 × 2
  company     n
   <fctr> <int>
1       A     2
2       B     2
3       C     3
4       D     1
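One consequence of filtering first: a company whose products are all missing drops out of the result entirely. This sketch uses hypothetical data, with company C having only NA products. It contrasts the approach above with counting the non-missing values inside summarise(), which keeps such a company with n = 0.

```r
library(dplyr)

# Hypothetical data: company C has no non-missing product
df <- data.frame(
  company = factor(c("A", "A", "B", "B", "C")),
  product = c("p1", "p2", NA, "p3", NA),
  stringsAsFactors = FALSE
)

# filter() + count(): C disappears because all of its rows are removed
filtered <- df %>%
  filter(!is.na(product)) %>%
  group_by(company) %>%
  count()

# Counting non-missing values per group keeps C, with a count of 0
kept <- df %>%
  group_by(company) %>%
  summarise(n = sum(!is.na(product)))
```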

Count of NAs colwise by a group

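We can use aggregate. The formula method drops every row that contains an NA before FUN is applied (its default is na.action = na.omit), so pass na.action = na.pass to keep those rows and count their NAs:

aggregate(. ~ grp, data = dat, FUN = function(x) sum(is.na(x)), na.action = na.pass)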
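To see why na.action matters, here is a small sketch on hypothetical data. The columns x and y are made-up stand-ins for the question's dat. With the default na.omit, only the one complete row survives, so every count comes out as 0 and group a vanishes.

```r
# Hypothetical data: a grouping column plus two columns with NAs
dat <- data.frame(
  grp = c("a", "a", "b", "b"),
  x   = c(1, NA, 3, NA),
  y   = c(NA, NA, 5, 6)
)

count_na <- function(x) sum(is.na(x))

# Default na.action = na.omit: rows containing any NA are dropped first
dropped <- aggregate(. ~ grp, data = dat, FUN = count_na)

# na.pass keeps every row, so the NAs are actually counted
res <- aggregate(. ~ grp, data = dat, FUN = count_na, na.action = na.pass)
res
```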

Or with dplyr (summarise_each() and funs() are deprecated; across() needs dplyr 1.0.0 or later):

library(dplyr)
dat %>%
    group_by(grp) %>%
    summarise(across(everything(), ~ sum(is.na(.x))))

Or using data.table

library(data.table)
setDT(dat)[, lapply(.SD, function(x) sum(is.na(x))), by = grp]

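Or, as @David Arenburg mentioned in the comments, rowsum is another option: it sums the rows of a numeric matrix within each group in a single call. We use unary + to coerce the logical matrix is.na(dat) to integers (0/1), because rowsum() only accepts numeric input and fails on a logical matrix. Since is.na(dat) covers every column, grp gets a column of its own in the result (all zeros unless grp itself contains NAs).

rowsum(+(is.na(dat)), dat$grp)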
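Here is a small sketch of the rowsum() approach on hypothetical data. The columns x and y are made up. It drops grp before calling is.na(), so the result only has the value columns. It also shows that passing the logical matrix without + is rejected.

```r
# Hypothetical data: a grouping column plus two columns with NAs
dat <- data.frame(
  grp = c("a", "a", "b", "b"),
  x   = c(1, NA, 3, NA),
  y   = c(NA, NA, 5, 6)
)

# Logical matrix of NA flags for the non-grouping columns
na_mat <- is.na(dat[setdiff(names(dat), "grp")])

# Unary + turns TRUE/FALSE into 1L/0L so rowsum() accepts it
res <- rowsum(+na_mat, dat$grp)
res

# Without +, rowsum() stops because the input is not numeric
failed <- try(rowsum(na_mat, dat$grp), silent = TRUE)
```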