How to count how many values per level in a given factor?
Or using the dplyr library:
library(dplyr)
set.seed(1)
dat <- data.frame(ID = sample(letters, 100, replace = TRUE))
dat %>%
  group_by(ID) %>%
  summarise(no_rows = length(ID))
Note the use of %>%, which is similar to the use of pipes in bash. Effectively, the code above pipes dat into group_by, and the result of that operation is piped into summarise.
The result is:
Source: local data frame [26 x 2]
ID no_rows
1 a 2
2 b 3
3 c 3
4 d 3
5 e 2
6 f 4
7 g 6
8 h 1
9 i 6
10 j 5
11 k 6
12 l 4
13 m 7
14 n 2
15 o 2
16 p 2
17 q 5
18 r 4
19 s 5
20 t 3
21 u 8
22 v 4
23 w 5
24 x 4
25 y 3
26 z 1
See the dplyr introduction for some more context, and the documentation for details regarding the individual functions.
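To make the piping concrete, here is a small sketch (not part of the original answer) that writes the same computation as a pipeline, as nested calls, and with the count() shorthand. The object names piped, nested and counted are only illustrative.

```r
library(dplyr)

set.seed(1)
dat <- data.frame(ID = sample(letters, 100, replace = TRUE))

# Piped form: read left to right
piped <- dat %>%
  group_by(ID) %>%
  summarise(no_rows = n())

# The same computation as nested calls: read inside out
nested <- summarise(group_by(dat, ID), no_rows = n())

# count() is dplyr shorthand for group_by() followed by summarise(n = n())
counted <- count(dat, ID)
```

All three objects should hold one row per level of ID, with counts that add up to the 100 sampled values.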
Count total missing values by group?
data.table solution:
library(data.table)
setDT(df1)
df1[, .(sumNA = sum(is.na(.SD))), by = Z]
# Z sumNA
# 1: A 559
# 2: C 661
# 3: E 596
# 4: B 597
# 5: D 560
dplyr solution using rowSums(.[-1]), i.e. row-sums for all columns except the first.
library(dplyr)
df1 %>%
  group_by(Z) %>%
  summarise_all(~sum(is.na(.))) %>%
  transmute(Z, sumNA = rowSums(.[-1]))
# # A tibble: 5 x 2
# Z sumNA
# <fct> <dbl>
# 1 A 559
# 2 B 597
# 3 C 661
# 4 D 560
# 5 E 596
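The two steps of the dplyr version can be seen separately on a toy data frame. This is a minimal sketch; the data below is made up for illustration and is not the df1 from the question.

```r
library(dplyr)

# Hypothetical toy data: Z is the grouping column, x and y contain NAs
df1 <- data.frame(
  Z = c("A", "A", "B"),
  x = c(1, NA, NA),
  y = c(NA, NA, 3)
)

# Step 1: one NA count per column within each group
per_col <- df1 %>%
  group_by(Z) %>%
  summarise_all(~ sum(is.na(.)))

# Step 2: add the per-column counts across each row, skipping the
# first column (Z), to get a single total per group
total <- per_col %>%
  transmute(Z, sumNA = rowSums(.[-1]))
```

Here per_col has the columns Z, x and y, and total collapses x and y into sumNA.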
Finding number of NA in multiple columns by group
Base R is your enemy here. data.table is friendlier:
library(data.table)
setDT(df) # <- convert to data.table
# going column-by-column, count NA
df[ , lapply(.SD, function(x) sum(is.na(x))), by = City]
See Getting Started with data.table, a primer on .SD, and this on the use of lapply(.SD,...) for more.
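As a rough illustration of what lapply(.SD, ...) does per group, here is a sketch on made-up data. The column names a and b are hypothetical; only City comes from the answer above.

```r
library(data.table)

# Hypothetical toy data with columns of different classes
df <- data.table(
  City = c("X", "X", "Y"),
  a    = c(1, NA, 3),
  b    = c(NA, NA, "z")
)

# Within each City, .SD holds every column except the grouping column;
# lapply() visits those columns one at a time, so each keeps its class
res <- df[, lapply(.SD, function(x) sum(is.na(x))), by = City]
```

The result has one row per City and one NA count per remaining column.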
Note well that the use of colSums necessitates converting your data.frame to a matrix, which will force all the columns to have the same class (here, character) if they don't already, which can be costly.
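The coercion is easy to see in base R. The sketch below uses a made-up two-column data frame (the column pop is hypothetical) and contrasts the matrix conversion with a column-by-column count that leaves the classes alone.

```r
# Hypothetical toy data: one character column, one numeric column
df <- data.frame(
  City = c("X", "X", "Y"),
  pop  = c(10, NA, 30),
  stringsAsFactors = FALSE
)

# Converting to a matrix forces a single type for every column
m <- as.matrix(df)
matrix_type <- typeof(m)

# Going column by column avoids the conversion entirely
na_per_col <- sapply(df, function(x) sum(is.na(x)))
```

With mixed columns like these, matrix_type comes out as "character", so the numeric column has been converted to strings just to count its missing values.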
R factor levels as column names and count values
data_sample <- data.frame(
PatID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L),
status1 = c("I250", "NA", "NA", "X560", "M206", "NA", "NA", "M206", "NA"),
status2 = c(".", "M206", "NA", "I250", "I250", "M206", "NA", "NA", "X560"),
status3 = c(".", "I250", "NA", "NA", "NA", "I250", "X560", "NA", "NA")
)
library(tidyverse)
data_sample %>%
  gather(status_num, value, -PatID) %>%
  filter(value != "NA", value != ".") %>%
  count(PatID, value) %>% # Improvement by @antoniosk
  spread(value, n, fill = 0)
# A tibble: 3 x 4
# Groups: PatID [3]
PatID I250 M206 X560
<int> <int> <int> <int>
1 1 2 1 0
2 2 2 1 1
3 3 1 2 2
Counting number of elements in a character column by levels of a factor column in a dataframe
A dplyr solution.
library(dplyr)
df %>%
  filter(!is.na(product)) %>%
  group_by(company) %>%
  count()
# A tibble: 4 × 2
comp n
<fctr> <int>
1 A 2
2 B 2
3 C 3
4 D 1
Count of NAs colwise by a group
We can use aggregate
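# na.action = na.pass keeps rows with NAs; the default (na.omit) would drop them before counting
aggregate(. ~ grp, data = dat, FUN = function(x) sum(is.na(x)), na.action = na.pass)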
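One thing to watch with the formula interface of aggregate: its default na.action is na.omit, so rows containing an NA are removed before FUN ever sees them. The sketch below shows the difference on made-up data; the helper count_na and the toy values are illustrative only.

```r
# Hypothetical toy data: rows 1, 2 and 4 each contain an NA
dat <- data.frame(
  grp = c("a", "a", "b", "b", "b"),
  x   = c(1, NA, 3, NA, 5),
  y   = c(NA, 2, 3, 4, 5)
)
count_na <- function(x) sum(is.na(x))

# Default na.action = na.omit: incomplete rows are dropped first,
# so group "a" disappears and every remaining count is zero
dropped <- aggregate(. ~ grp, data = dat, FUN = count_na)

# na.action = na.pass keeps those rows so the NAs can be counted
kept <- aggregate(. ~ grp, data = dat, FUN = count_na, na.action = na.pass)
```

Comparing dropped with kept makes it clear why the na.action argument matters when the thing being counted is the missing values themselves.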
Or with dplyr
library(dplyr)
dat %>%
  group_by(grp) %>%
  # summarise_each() and funs() are deprecated; summarise_all() is the equivalent
  summarise_all(~ sum(is.na(.)))
Or using data.table
library(data.table)
setDT(dat)[, lapply(.SD, function(x) sum(is.na(x))), grp]
Or as @David Arenburg mentioned in the comments, rowsum is another option where we can do the group by operation while summing. We used + to coerce the logical matrix (is.na(dat)) to binary as the function will not work with logical class.
rowsum(+(is.na(dat)), dat$grp)
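A small sketch of the rowsum approach on made-up data, including what happens without the unary plus. The toy values and object names are illustrative, not from the original question.

```r
# Hypothetical toy data
dat <- data.frame(
  grp = c("a", "a", "b", "b"),
  x   = c(1, NA, NA, NA),
  y   = c(NA, 2, 3, 4)
)

# is.na() on a data.frame returns a logical matrix; unary + turns
# TRUE/FALSE into 1/0 while keeping the dimensions and column names
na_mat <- +(is.na(dat))

# Sum the rows of the matrix within each level of grp
res <- rowsum(na_mat, dat$grp)

# Without the coercion, rowsum() rejects the logical matrix
logical_attempt <- tryCatch(
  rowsum(is.na(dat), dat$grp),
  error = function(e) e
)
```

Note that the result also carries a grp column, since is.na(dat) covers every column; it can be dropped with res[, -1] if it is not wanted.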