Simple Method of Counting Non-NAs in Column of Data String

For a data.frame, you can count the non-NA values in each column using colSums and is.na:

set.seed(45)
df <- data.frame(matrix(sample(c(NA,1:5), 50, replace=TRUE), ncol=5))
# X1 X2 X3 X4 X5
# 1 3 2 NA 2 NA
# 2 1 5 1 1 4
# 3 1 1 3 2 3
# 4 2 2 3 5 3
# 5 2 2 5 2 2
# 6 1 2 NA 3 3
# 7 1 5 5 5 2
# 8 3 NA 4 1 5
# 9 1 2 3 NA 1
# 10 NA 1 1 2 2

colSums(!is.na(df))
# X1 X2 X3 X4 X5
# 9 9 8 9 9
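If only one column is of interest, the same idea reduces to summing a logical vector. A minimal sketch, using a small made-up data frame (the name `d` and its values are assumptions for illustration, not part of the original answer):

```r
# Hypothetical example data: one numeric and one character column
d <- data.frame(x = c(1, NA, 3, NA), y = c("a", "b", NA, "d"))

# is.na() is TRUE for missing entries; negating and summing counts the rest
n_x <- sum(!is.na(d$x))  # non-NA count in column x
n_y <- sum(!is.na(d$y))  # works the same way for character columns

n_x
n_y
```

Because TRUE counts as 1 and FALSE as 0, `sum()` on the negated `is.na()` result is the non-NA count.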

Count number of non-NA values for every column in a dataframe

You can also call is.na on the entire data frame (which returns a logical matrix), negate the result with !, and call colSums on it:

# make sample data
set.seed(47)
df <- as.data.frame(matrix(sample(c(0:1, NA), 100*5, TRUE), 100))

str(df)
#> 'data.frame': 100 obs. of 5 variables:
#> $ V1: int NA 1 NA NA 1 NA 1 1 1 NA ...
#> $ V2: int NA NA NA 1 NA 1 0 1 0 NA ...
#> $ V3: int 1 1 0 1 1 NA NA 1 NA NA ...
#> $ V4: int NA 0 NA 0 0 NA 1 1 NA NA ...
#> $ V5: int NA NA NA 0 0 0 0 0 NA NA ...

colSums(!is.na(df))
#> V1 V2 V3 V4 V5
#> 69 55 62 60 70
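To see why this works, it can help to look at the intermediate object. A small sketch, assuming a made-up data frame `d` (not from the original answer):

```r
# Hypothetical example data with a numeric and a character column
d <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "z"))

m <- is.na(d)          # logical matrix with one column per column of d
is.matrix(m)           # TRUE
counts <- colSums(!m)  # named vector: non-NA count per column
counts
```

Since `is.na()` on a data frame already yields a matrix, `colSums()` can be applied directly without any explicit conversion.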

Counting non NAs in a data frame; getting answer as a vector

Try this:

# define "demo" dataset
ZZZ <- data.frame(n=c(1,2,NA),m=c(6,NA,NA),o=c(7,8,8))
# apply the counting function per columns
apply(ZZZ, 2, function(x) length(which(!is.na(x))))

Running it gives:

> apply(ZZZ, 2, function(x) length(which(!is.na(x))))
n m o
2 1 3

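The result of apply is already a named vector. If you want a plain vector without the names, you can use as.vector, e.g. by defining this function:

nonNAs <- function(x) {
  # count non-NA values per column, then drop the column names
  as.vector(apply(x, 2, function(col) length(which(!is.na(col)))))
}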

You can then simply run nonNAs(ZZZ):

> nonNAs(ZZZ)
[1] 2 1 3
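As an alternative sketch, `vapply` avoids the matrix conversion that `apply` performs on a data frame and checks that every column yields a single integer. The variable names `counts_named` and `counts_plain` are just illustrative:

```r
ZZZ <- data.frame(n = c(1, 2, NA), m = c(6, NA, NA), o = c(7, 8, 8))

# vapply iterates over the columns and enforces one integer per column
counts_named <- vapply(ZZZ, function(col) sum(!is.na(col)), integer(1))

# unname() drops the column names, leaving a bare integer vector
counts_plain <- unname(counts_named)

counts_named
counts_plain
```

Whether the names are useful depends on what happens next; keeping them usually makes the result easier to read.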

Count non-NA values by group

You can use dplyr: group by col_1, then sum the non-NA indicator for col_2 within each group:

library(dplyr)
mydf %>% group_by(col_1) %>% summarise(non_na_count = sum(!is.na(col_2)))

# A tibble: 2 x 2
col_1 non_na_count
<fctr> <int>
1 A 1
2 B 2
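For comparison, the same per-group count can be sketched in base R with `tapply`. The original `mydf` is not shown, so the data frame below is made up for illustration; only the column names col_1 and col_2 are taken from the answer.

```r
# Hypothetical data: two groups, one missing value in group A
mydf <- data.frame(
  col_1 = factor(c("A", "A", "B", "B")),
  col_2 = c(1, NA, 3, 4)
)

# Sum the non-NA indicator of col_2 within each level of col_1
counts <- tapply(!is.na(mydf$col_2), mydf$col_1, sum)
counts
```

The result is a named vector with one entry per group, rather than a tibble.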

From an R dataframe: count non-NA values by column, grouped by one of the columns

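We can use summarise with across (in dplyr 1.0.0 and later, across supersedes summarise_all, and funs is deprecated):

library(dplyr)
litmus %>%
  group_by(grouping) %>%
  summarise(across(everything(), ~ sum(!is.na(.x))))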
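A base R sketch of the same grouped, per-column count is possible with `rowsum`. The original `litmus` data is not shown, so the data frame below is invented for illustration; only the names `litmus` and `grouping` come from the answer.

```r
# Hypothetical data: a grouping column plus two columns with missing values
litmus <- data.frame(
  grouping = c("g1", "g1", "g2"),
  a = c(1, NA, 3),
  b = c(NA, NA, "x")
)

# Non-NA indicator for every column except the grouping column
non_na <- !is.na(litmus[-1])
storage.mode(non_na) <- "integer"  # rowsum() needs numeric input

# Sum the indicators within each group: one row per group, one column per variable
counts <- rowsum(non_na, litmus$grouping)
counts
```

This returns a matrix with the groups as row names, which may or may not be more convenient than a data frame.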

Efficient way to calculate non-na rows vs NA rows in a column

No special function is needed; in base R you can simply compute the ratio of NA rows to non-NA rows per column:

colSums(is.na(df))/colSums(!is.na(df))
# a b c
#2.0 0.5 Inf

For a particular set of columns, subset the data frame first:

cols <- c("a", "b")
colSums(is.na(df[cols]))/colSums(!is.na(df[cols]))  # also works with a single column, e.g. cols <- "a"

Data:

df <- data.frame(a=c(NA,NA,4),b=c(NA,1,2),c=c(NA,NA,NA))
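Note that the ratio is Inf for a column with no non-NA values. If a bounded quantity is preferred, one option (a suggestion, not part of the original answer) is the share of NA rows per column, which `colMeans` gives directly:

```r
df <- data.frame(a = c(NA, NA, 4), b = c(NA, 1, 2), c = c(NA, NA, NA))

# Mean of a logical matrix column = proportion of TRUE values, i.e. share of NAs
na_share <- colMeans(is.na(df))
na_share
```

The result always lies between 0 and 1, so an all-NA column shows up as 1 instead of Inf.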

Create new column based on counting non-NA values across multiple columns

# count the non-NA values in each row, across all columns except the first (Q1)
df$column_non_NA <- rowSums(!is.na(df[-1]))
df
Q1 Q1a Q1b Q1c column_non_NA
1 Yes AAA BBB <NA> 2
2 No <NA> <NA> <NA> 0
3 Yes AAA <NA> <NA> 1
4 No <NA> <NA> <NA> 0
5 Yes ABC BCD EFG 3
6 Yes DDD <NA> <NA> 1
7 Yes EEE AAA AAA 3
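For a self-contained version, the input can be rebuilt from the printed output above. This sketch selects the columns by name instead of using `df[-1]`, which is an assumption about what is more robust if the column order changes:

```r
# Data reconstructed from the printed result above
df <- data.frame(
  Q1  = c("Yes", "No", "Yes", "No", "Yes", "Yes", "Yes"),
  Q1a = c("AAA", NA, "AAA", NA, "ABC", "DDD", "EEE"),
  Q1b = c("BBB", NA, NA, NA, "BCD", NA, "AAA"),
  Q1c = c(NA, NA, NA, NA, "EFG", NA, "AAA")
)

# Count non-NA values per row across the named columns only
df$column_non_NA <- rowSums(!is.na(df[c("Q1a", "Q1b", "Q1c")]))
df
```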

Count number of non-NA values by group

Or if you wanted to use data.table:

library(data.table)

dt[,sum(!is.na(X2)),by=.(Color)]

Color V1
1: Red 2
2: Blue 0
3: Green 1

It's also easy enough to use an ifelse() in your data.table to get an NA for Blue instead of 0. Note that the comparison with 0 must apply to the sum, not to the values inside it:

dt[,ifelse(sum(!is.na(X2))==0,NA_integer_,sum(!is.na(X2))),by=.(Color)]

Color V1
1: Red 2
2: Blue NA
3: Green 1
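One thing to watch in a test like this is operator precedence: in R, `!` binds less tightly than `==`, so where the closing parenthesis of `sum()` sits changes the meaning. A small base R sketch with a made-up vector `x`:

```r
# Hypothetical group with one value present and one missing
x <- c(1, NA)

# Parsed as !(is.na(x) == 0), i.e. is.na(x): this counts the NAs
misplaced <- sum(!is.na(x) == 0)

# Compares the non-NA count with zero: TRUE only if every value is NA
intended <- sum(!is.na(x)) == 0

misplaced
intended
```

With the first form, a group containing both values and NAs would be reported as NA, since a non-zero count is treated as TRUE by `ifelse()`.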

Data:

dt <- fread("Color X1 X2 X3 X4
Red 1 1 0 2
Blue 0 NA 4 1
Red 3 4 3 1
Green 2 2 1 0")

