Function to Count NA Values at Each Level of a Factor

How to count how many values per level in a given factor?

Using the dplyr package:

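library(dplyr)
set.seed(1)
# 100 random letters; spell out replace = TRUE instead of relying on partial matching
dat <- data.frame(ID = sample(letters, 100, replace = TRUE))
dat %>%
  group_by(ID) %>%
  summarise(no_rows = n())  # n() counts the rows in each group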

Note the use of %>%, which is similar to the use of pipes in bash. Effectively, the code above pipes dat into group_by, and the result of that operation is piped into summarise.

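The result is shown below. It comes from an older R and dplyr; R 3.6.0 changed the default sample() algorithm, so your counts may differ: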

Source: local data frame [26 x 2]

ID no_rows
1 a 2
2 b 3
3 c 3
4 d 3
5 e 2
6 f 4
7 g 6
8 h 1
9 i 6
10 j 5
11 k 6
12 l 4
13 m 7
14 n 2
15 o 2
16 p 2
17 q 5
18 r 4
19 s 5
20 t 3
21 u 8
22 v 4
23 w 5
24 x 4
25 y 3
26 z 1

See the dplyr introduction for some more context, and the documentation for details regarding the individual functions.
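To make the piping explanation above concrete, here is a minimal sketch that writes the same computation three ways: as nested calls, with dplyr's count() shorthand, and with base R's table(). The object names nested, counted and tab are only illustrative.

```r
library(dplyr)

set.seed(1)
dat <- data.frame(ID = sample(letters, 100, replace = TRUE))

# Nested form of the piped code: the innermost call runs first
nested <- summarise(group_by(dat, ID), no_rows = n())

# count() is shorthand for group_by() followed by summarise(n = n())
counted <- count(dat, ID, name = "no_rows")

# Base R: table() tabulates the values of a vector
tab <- table(dat$ID)
```

All three give the same counts per letter; the piped and count() versions return a data frame, while table() returns a named table object.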

Count total missing values by group?

data.table solution

library(data.table)
setDT(df1)

df1[, .(sumNA = sum(is.na(.SD))), by = Z]

# Z sumNA
# 1: A 559
# 2: C 661
# 3: E 596
# 4: B 597
# 5: D 560

dplyr solution using rowSums(.[-1]), i.e. row-sums for all columns except the first.

library(dplyr)

df1 %>%
  group_by(Z) %>%
  summarise_all(~sum(is.na(.))) %>%
  transmute(Z, sumNA = rowSums(.[-1]))

# # A tibble: 5 x 2
# Z sumNA
# <fct> <dbl>
# 1 A 559
# 2 B 597
# 3 C 661
# 4 D 560
# 5 E 596
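The original df1 is not shown, so the following base R sketch uses a small made-up data frame (the columns x and y and their values are assumptions) to show what "total NA per group" means: split the non-grouping columns by Z, then count the NA cells in each piece.

```r
# Hypothetical stand-in for df1: a grouping factor Z plus two value columns
df1 <- data.frame(
  Z = factor(c("A", "A", "B", "B", "C")),
  x = c(1, NA, NA, 4, 5),
  y = c(NA, NA, 3, 4, NA)
)

# Split every column except Z by group, then count the NA cells in each piece
by_group <- split(df1[-1], df1$Z)
sumNA <- vapply(by_group, function(d) sum(is.na(d)), integer(1))

sumNA
# A B C
# 3 1 1
```

This is the same idea as sum(is.na(.SD)) in the data.table version: within each group, every cell of the remaining columns is tested and the TRUE values are summed.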

Finding number of NA in multiple columns by group

Base R is your enemy here.

data.table is friendlier:

library(data.table)
setDT(df) # <- convert to data.table
# going column-by-column, count NA
df[ , lapply(.SD, function(x) sum(is.na(x))), by = City]

See Getting Started with data.table, a primer on .SD, and this on the use of lapply(.SD,...) for more.

Note well that colSums first converts your data.frame to a matrix, which forces all the columns to a common class (here, character) if they don't already share one. That conversion can be costly.
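The coercion mentioned above is easy to see on a toy example. The data frame below is made up for illustration (the column names temp and rain are assumptions); it shows that as.matrix() on mixed columns yields a character matrix, and that a column-by-column count avoids the conversion.

```r
# Hypothetical data: one character column and two numeric ones
df <- data.frame(
  City = c("Oslo", "Oslo", "Rome"),
  temp = c(10.5, NA, 21),
  rain = c(NA, 3L, NA)
)

# Converting mixed columns to a matrix forces a single type
m <- as.matrix(df)
typeof(m)
# "character"

# Counting column by column leaves each column's class alone
na_per_col <- vapply(df[-1], function(x) sum(is.na(x)), integer(1))
na_per_col
# temp rain
#    1    2
```

The lapply(.SD, ...) call in the data.table answer works the same way as the vapply() here, one column at a time, just within each group.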

R factor levels as column names and count values

data_sample <- data.frame(
PatID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L),
status1 = c("I250", "NA", "NA", "X560", "M206", "NA", "NA", "M206", "NA"),
status2 = c(".", "M206", "NA", "I250", "I250", "M206", "NA", "NA", "X560"),
status3 = c(".", "I250", "NA", "NA", "NA", "I250", "X560", "NA", "NA")
)

library(tidyverse)
data_sample %>%
  gather(status_num, value, -PatID) %>%
  filter(value != "NA", value != ".") %>%
  count(PatID, value) %>% # Improvement by @antoniosk
  spread(value, n, fill = 0)

# A tibble: 3 x 4
# Groups: PatID [3]
PatID I250 M206 X560
<int> <int> <int> <int>
1 1 2 1 0
2 2 2 1 1
3 3 1 2 2
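Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same reshaping with the newer functions, assuming a reasonably recent tidyr, might look like this (wide_counts is just an illustrative name):

```r
library(dplyr)
library(tidyr)

data_sample <- data.frame(
  PatID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L),
  status1 = c("I250", "NA", "NA", "X560", "M206", "NA", "NA", "M206", "NA"),
  status2 = c(".", "M206", "NA", "I250", "I250", "M206", "NA", "NA", "X560"),
  status3 = c(".", "I250", "NA", "NA", "NA", "I250", "X560", "NA", "NA"),
  stringsAsFactors = FALSE
)

wide_counts <- data_sample %>%
  # Long format: one row per (PatID, status column, value)
  pivot_longer(-PatID, names_to = "status_num", values_to = "value") %>%
  # Drop the placeholder strings "NA" and "."
  filter(!value %in% c("NA", ".")) %>%
  # Count each code per patient
  count(PatID, value) %>%
  # One column per code; absent combinations become 0 instead of NA
  pivot_wider(names_from = value, values_from = n, values_fill = 0L)
```

Note that the "NA" entries in this data are literal strings, not missing values, which is why they are filtered by comparison rather than with is.na().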

Counting number of elements in a character column by levels of a factor column in a dataframe

A dplyr solution.

library(dplyr)

df %>%
  filter(!is.na(product)) %>%  # keep only rows with a product
  group_by(company) %>%
  count()

# A tibble: 4 × 2
company n
<fctr> <int>
1 A 2
2 B 2
3 C 3
4 D 1
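The question's data frame is not reproduced here, so the sketch below uses invented data (the company and product values are assumptions chosen to give the same counts) to show two variations: summing !is.na() inside summarise(), and a base R table().

```r
library(dplyr)

# Hypothetical data: a factor column and a character column with some NAs
df <- data.frame(
  company = factor(c("A", "A", "A", "B", "B", "C", "C", "C", "D", "D")),
  product = c("p1", "p2", NA, "p1", "p3", "p1", "p2", "p3", NA, "p4")
)

# Same idea as the answer above, written with count() directly
counted <- df %>%
  filter(!is.na(product)) %>%
  count(company)

# Variation: count non-missing values without filtering rows first
summed <- df %>%
  group_by(company) %>%
  summarise(n = sum(!is.na(product)))

# Base R: tabulate the factor for rows where product is present
base_counts <- table(df$company[!is.na(df$product)])
```

One design difference: filtering first drops any company whose products are all NA, whereas the summarise() variant keeps such a company with a count of 0.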

Count of NAs colwise by a group

We can use aggregate

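# na.action = na.pass is needed: the formula method's default (na.omit)
# drops rows containing NA before FUN ever sees them
aggregate(. ~ grp, data = dat, FUN = function(x) sum(is.na(x)), na.action = na.pass)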

Or with dplyr

library(dplyr)
dat %>%
  group_by(grp) %>%
  summarise(across(everything(), ~ sum(is.na(.x))))  # replaces the deprecated summarise_each(funs(...))

Or using data.table

library(data.table)
setDT(dat)[, lapply(.SD, function(x) sum(is.na(x))), grp]

Or, as @David Arenburg mentioned in the comments, rowsum is another option: it sums the rows of a matrix within each group. We use + to coerce the logical matrix (is.na(dat)) to integer 0/1, because rowsum requires a numeric matrix and will not accept a logical one.

rowsum(+(is.na(dat)), dat$grp)
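As a small self-contained check of the base R options, the sketch below builds a made-up dat (the columns x and y are assumptions) and compares aggregate() with and without na.action = na.pass against rowsum().

```r
# Hypothetical data: a grouping column and two columns containing NAs
dat <- data.frame(
  grp = c("a", "a", "b", "b", "b"),
  x = c(1, NA, 3, NA, NA),
  y = c(NA, NA, 1, 2, 3)
)

count_na <- function(x) sum(is.na(x))

# Default na.action (na.omit) removes incomplete rows first, so every count is 0
# and group "a", which has no complete rows, disappears entirely
dropped <- aggregate(. ~ grp, data = dat, FUN = count_na)

# na.pass keeps the NAs so they can be counted
kept <- aggregate(. ~ grp, data = dat, FUN = count_na, na.action = na.pass)

# rowsum on the 0/1 matrix; dat[-1] leaves the grouping column out of the result
na_counts <- rowsum(+(is.na(dat[-1])), dat$grp)
na_counts
#   x y
# a 1 2
# b 2 0
```

Here kept and na_counts agree, while dropped contains a single row for group "b" with zeros.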

