Simple method of counting non-NAs in column of data String
For a data.frame, you can get it using colSums and is.na:
set.seed(45)
df <- data.frame(matrix(sample(c(NA,1:5), 50, replace=TRUE), ncol=5))
# X1 X2 X3 X4 X5
# 1 3 2 NA 2 NA
# 2 1 5 1 1 4
# 3 1 1 3 2 3
# 4 2 2 3 5 3
# 5 2 2 5 2 2
# 6 1 2 NA 3 3
# 7 1 5 5 5 2
# 8 3 NA 4 1 5
# 9 1 2 3 NA 1
# 10 NA 1 1 2 2
colSums(!is.na(df))
# X1 X2 X3 X4 X5
# 9 9 8 9 9
Count number of non-NA values for every column in a dataframe
You can also call is.na on the entire data frame (which returns a logical matrix) and call colSums on the negated result:
# make sample data
set.seed(47)
df <- as.data.frame(matrix(sample(c(0:1, NA), 100*5, TRUE), 100))
str(df)
#> 'data.frame': 100 obs. of 5 variables:
#> $ V1: int NA 1 NA NA 1 NA 1 1 1 NA ...
#> $ V2: int NA NA NA 1 NA 1 0 1 0 NA ...
#> $ V3: int 1 1 0 1 1 NA NA 1 NA NA ...
#> $ V4: int NA 0 NA 0 0 NA 1 1 NA NA ...
#> $ V5: int NA NA NA 0 0 0 0 0 NA NA ...
colSums(!is.na(df))
#> V1 V2 V3 V4 V5
#> 69 55 62 60 70
Counting non NAs in a data frame; getting answer as a vector
Try this:
# define "demo" dataset
ZZZ <- data.frame(n=c(1,2,NA),m=c(6,NA,NA),o=c(7,8,8))
# apply the counting function per columns
apply(ZZZ, 2, function(x) length(which(!is.na(x))))
Running it gives:
> apply(ZZZ, 2, function(x) length(which(!is.na(x))))
n m o
2 1 3
The result is already a (named) vector. If you really want the names dropped, you might use as.vector, e.g. by defining this function:
nonNAs <- function(x) {
  as.vector(apply(x, 2, function(x) length(which(!is.na(x)))))
}
You could then simply run nonNAs(ZZZ):
> nonNAs(ZZZ)
[1] 2 1 3
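As an alternative sketch (not the answerer's method), vapply can iterate over the columns directly, without apply's conversion of the data frame to a matrix, and it checks that each column yields exactly one integer. The names `counts` and `bare` are illustrative.

```r
# Same demo data as in the answer above.
ZZZ <- data.frame(n = c(1, 2, NA), m = c(6, NA, NA), o = c(7, 8, 8))

# A data frame is a list of columns, so vapply() visits each column in turn;
# integer(1) declares that every result must be a single integer.
counts <- vapply(ZZZ, function(col) sum(!is.na(col)), integer(1))

# unname() drops the column names if a bare vector is wanted.
bare <- unname(counts)
```

`counts` keeps the column names, while `bare` is the plain integer vector.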
Count non-NA values by group
You can use this:
library(dplyr)
mydf %>% group_by(col_1) %>% summarise(non_na_count = sum(!is.na(col_2)))
# A tibble: 2 x 2
col_1 non_na_count
<fctr> <int>
1 A 1
2 B 2
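If dplyr is not available, a base R sketch of the same grouped count is possible with tapply. The answer does not show `mydf`, so the data below is a hypothetical stand-in chosen only to reproduce the counts printed above (A: 1, B: 2).

```r
# Hypothetical data: the real 'mydf' is not given in the answer.
mydf <- data.frame(
  col_1 = c("A", "A", "B", "B"),
  col_2 = c(10, NA, 20, 30)
)

# tapply() splits the logical vector by group and sums each piece,
# giving the number of non-NA values of col_2 per level of col_1.
non_na_count <- tapply(!is.na(mydf$col_2), mydf$col_1, sum)
```

The result is a named one-dimensional array with one entry per group.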
From an R dataframe: count non-NA values by column, grouped by one of the columns
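We can use summarise_all (funs() is deprecated since dplyr 0.8.0, so a formula is used here; from dplyr 1.0.0 on, summarise with across is the preferred form):
library(dplyr)
litmus %>%
  group_by(grouping) %>%
  summarise_all(~ sum(!is.na(.)))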
Efficient way to calculate non-na rows vs NA rows in a column
No special function is needed. In base R you can simply do:
colSums(is.na(df))/colSums(!is.na(df))
# a b c
#2.0 0.5 Inf
For a particular set of columns:
cols <- c("a", "b") # also works with a single column, e.g. cols <- "a"
colSums(is.na(df[cols]))/colSums(!is.na(df[cols]))
Data:
df = data.frame(a=c(NA,NA,4),b=c(NA,1,2),c=c(NA,NA,NA))
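The ratio above becomes Inf when a column has no non-NA values, as with column c. As a hedged alternative, the share of missing values per column always stays between 0 and 1, and the raw counts can be kept side by side. The names `na_share` and `summary_tbl` are illustrative.

```r
# Data from the answer above.
df <- data.frame(a = c(NA, NA, 4), b = c(NA, 1, 2), c = c(NA, NA, NA))

# The mean of a logical column is the share of TRUE values,
# so colMeans(is.na(df)) gives the fraction of NAs per column.
na_share <- colMeans(is.na(df))

# NA and non-NA counts side by side, one row per column of df.
summary_tbl <- data.frame(
  n_na     = colSums(is.na(df)),
  n_non_na = colSums(!is.na(df))
)
```

Keeping both counts in one table lets the reader compute whichever ratio is needed without dividing by zero up front.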
Create new column based on counting non-NA values across multiple columns
df$column_non_NA <- rowSums(!is.na(df[-1]))
df
Q1 Q1a Q1b Q1c column_non_NA
1 Yes AAA BBB <NA> 2
2 No <NA> <NA> <NA> 0
3 Yes AAA <NA> <NA> 1
4 No <NA> <NA> <NA> 0
5 Yes ABC BCD EFG 3
6 Yes DDD <NA> <NA> 1
7 Yes EEE AAA AAA 3
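A self-contained sketch of the same row-wise count follows, with the data rebuilt from the printed output above. It selects the columns by name rather than by position; this is a design choice of the sketch, since with df[-1] a second run would also count the freshly added column_non_NA column. The name `q_cols` is illustrative.

```r
# Data rebuilt from the printed output above.
df <- data.frame(
  Q1  = c("Yes", "No", "Yes", "No", "Yes", "Yes", "Yes"),
  Q1a = c("AAA", NA, "AAA", NA, "ABC", "DDD", "EEE"),
  Q1b = c("BBB", NA, NA, NA, "BCD", NA, "AAA"),
  Q1c = c(NA, NA, NA, NA, "EFG", NA, "AAA")
)

# Select the columns to count by name, so the result does not change
# if columns are reordered or this line is run more than once.
q_cols <- c("Q1a", "Q1b", "Q1c")

# rowSums() over the negated NA mask counts non-NA cells in each row.
df$column_non_NA <- rowSums(!is.na(df[q_cols]))
```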
Count number of non-NA values by group
Or if you wanted to use data.table:
library(data.table)
dt[,sum(!is.na(X2)),by=.(Color)]
Color V1
1: Red 2
2: Blue 0
3: Green 1
It's also easy enough to use ifelse() in your data.table call to get an NA for Blue instead of 0:
# return NA when a group has no non-NA values in X2
dt[,ifelse(sum(!is.na(X2))==0,NA_integer_,sum(!is.na(X2))),by=.(Color)]
Color V1
1: Red 2
2: Blue NA
3: Green 1
Data:
# fread() already returns a data.table
dt <- fread("Color X1 X2 X3 X4
Red 1 1 0 2
Blue 0 NA 4 1
Red 3 4 3 1
Green 2 2 1 0")