Counting Non Nas in a Data Frame; Getting Answer as a Vector

Try this:

# define "demo" dataset
ZZZ <- data.frame(n=c(1,2,NA),m=c(6,NA,NA),o=c(7,8,8))
# apply the counting function to each column
apply(ZZZ, 2, function(x) length(which(!is.na(x))))

Running it gives:

> apply(ZZZ, 2, function(x) length(which(!is.na(x))))
n m o 
2 1 3

The result above is already a vector, but a named one. If you really want a plain, unnamed vector, you can use as.vector, e.g. by defining this function:

nonNAs <- function(x) {
    as.vector(apply(x, 2, function(col) length(which(!is.na(col)))))
}

You can then simply run nonNAs(ZZZ):

> nonNAs(ZZZ)
[1] 2 1 3
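
As a possible alternative to the apply-based version above, here is a minimal sketch using vapply, which loops over the columns directly instead of converting the data frame to a matrix first. The names named_counts and plain_counts are only illustrative and do not come from the original answer.

```r
# Same demo data as above
ZZZ <- data.frame(n = c(1, 2, NA), m = c(6, NA, NA), o = c(7, 8, 8))

# vapply visits each column and requires every result to be a single integer
named_counts <- vapply(ZZZ, function(col) sum(!is.na(col)), integer(1))

# unname() drops the column names, leaving a plain vector
plain_counts <- unname(named_counts)

print(named_counts)
print(plain_counts)
```

Whether to keep the names is a matter of taste; the named version is often handier because each count stays attached to its column.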

Simple method of counting non-NAs in column of data String

For a data.frame you can get it using colSums and is.na:

set.seed(45)
df <- data.frame(matrix(sample(c(NA,1:5), 50, replace=TRUE), ncol=5))
#    X1 X2 X3 X4 X5
# 1   3  2 NA  2 NA
# 2   1  5  1  1  4
# 3   1  1  3  2  3
# 4   2  2  3  5  3
# 5   2  2  5  2  2
# 6   1  2 NA  3  3
# 7   1  5  5  5  2
# 8   3 NA  4  1  5
# 9   1  2  3 NA  1
# 10 NA  1  1  2  2

colSums(!is.na(df))
# X1 X2 X3 X4 X5 
#  9  9  8  9  9

Count number of non-NA values for every column in a dataframe

You can also call is.na on the entire data frame (which returns a logical matrix) and call colSums on the negated result:

# make sample data
set.seed(47)
df <- as.data.frame(matrix(sample(c(0:1, NA), 100*5, TRUE), 100))

str(df)
#> 'data.frame':    100 obs. of  5 variables:
#>  $ V1: int  NA 1 NA NA 1 NA 1 1 1 NA ...
#>  $ V2: int  NA NA NA 1 NA 1 0 1 0 NA ...
#>  $ V3: int  1 1 0 1 1 NA NA 1 NA NA ...
#>  $ V4: int  NA 0 NA 0 0 NA 1 1 NA NA ...
#>  $ V5: int  NA NA NA 0 0 0 0 0 NA NA ...

colSums(!is.na(df))
#> V1 V2 V3 V4 V5 
#> 69 55 62 60 70
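
To make the intermediate steps of colSums(!is.na(df)) visible, here is a small sketch on a hand-checkable data frame. The data frame df_small and the variable names are hypothetical and chosen only for illustration.

```r
# Small hypothetical data frame, easy to verify by eye
df_small <- data.frame(a = c(1, NA, 3), b = c(NA, NA, "x"), d = 1:3)

# Step 1: is.na() on a data frame returns a logical matrix
na_mask <- is.na(df_small)

# Step 2: negate it, so TRUE marks a cell that is not missing
present <- !na_mask

# Step 3: colSums() counts TRUE as 1 and FALSE as 0
non_na_counts <- colSums(present)

print(present)
print(non_na_counts)
```

Column a has two non-missing values, b has one, and d has three, which is what the column sums report.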

Count number of non-NA values by group

Or, if you want to use data.table:

library(data.table)

dt[,sum(!is.na(X2)),by=.(Color)]

   Color V1
1:   Red  2
2:  Blue  0
3: Green  1

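It's also easy enough to use ifelse() in your data.table call to get NA for Blue instead of 0. Note that the comparison with 0 must be applied to the count itself, so it goes outside the sum() call:

dt[, ifelse(sum(!is.na(X2)) == 0, NA_integer_, sum(!is.na(X2))), by = .(Color)]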

   Color V1
1:   Red  2
2:  Blue NA
3: Green  1

Data:

dt <- fread("Color    X1      X2    X3    X4
Red      1       1     0     2
Blue     0       NA    4     1 
Red      3       4     3     1
Green    2       2     1     0"))
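
When comparing a non-NA count with zero, parenthesis placement matters, because in R the `!` operator binds less tightly than `==`. The base R sketch below illustrates this on hypothetical vectors that mirror the Color and X2 columns. The helper count_or_na is an illustrative name, not part of data.table or of the original answer.

```r
# Hypothetical vectors mirroring the Color and X2 columns of the sample data
color <- c("Red", "Blue", "Red", "Green")
x2    <- c(1, NA, 4, 2)

# `!` has lower precedence than `==`, so this parses as !(is.na(x2) == 0),
# which is simply is.na(x2) again
same_as_is_na <- identical(!is.na(x2) == 0, is.na(x2))

# Compare the count itself with zero, and return NA for empty groups
count_or_na <- function(v) {
  n <- sum(!is.na(v))
  if (n == 0L) NA_integer_ else n
}

# Apply per group; tapply orders the groups alphabetically
counts <- tapply(x2, color, count_or_na)

print(same_as_is_na)
print(counts)
```

Computing the count once and testing it with a plain if also avoids evaluating sum(!is.na(...)) twice.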

Efficiently counting non-NA elements in data.table

Yes, the third option seems to be the best one. I've added a fourth, which is valid only if you are willing to change the key of your data.table from id to var; even so, option 3 is still the fastest on your data.

library(microbenchmark)
library(data.table)

dt<-data.table(id=(1:100)[sample(10,size=1e6,replace=T)],var=c(1,0,NA)[sample(3,size=1e6,replace=T)],key=c("var"))

dt1 <- copy(dt)
dt2 <- copy(dt)
dt3 <- copy(dt)
dt4 <- copy(dt)

microbenchmark(times=10L,
               dt1[!is.na(var),.N,by=id][,max(N,na.rm=T),by=id],
               dt2[,length(var[!is.na(var)]),by=id],
               dt3[,sum(!is.na(var)),by=id],
               dt4[.(c(1,0)),.N,id,nomatch=0L])
# Unit: milliseconds
#                                                         expr      min       lq      mean    median        uq       max neval
#  dt1[!is.na(var), .N, by = id][, max(N, na.rm = T), by = id] 95.14981 95.79291 105.18515 100.16742 112.02088 131.87403    10
#                     dt2[, length(var[!is.na(var)]), by = id] 83.17203 85.91365  88.54663  86.93693  89.56223 100.57788    10
#                             dt3[, sum(!is.na(var)), by = id] 45.99405 47.81774  50.65637  49.60966  51.77160  61.92701    10
#                        dt4[.(c(1, 0)), .N, id, nomatch = 0L] 78.50544 80.95087  89.09415  89.47084  96.22914 100.55434    10

Count non-NA values by group

You can use dplyr's group_by and summarise for this:

library(dplyr)

mydf %>% group_by(col_1) %>% summarise(non_na_count = sum(!is.na(col_2)))

# A tibble: 2 x 2
   col_1 non_na_count
  <fctr>        <int>
1      A            1
2      B            2
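
The original mydf is not shown, so here is a self-contained sketch with a made-up data frame that yields the same counts. It uses base R's tapply rather than dplyr, purely so that it runs without extra packages. The data values are an assumption and are not taken from the question.

```r
# Hypothetical data: group A has one non-NA value, group B has two
mydf <- data.frame(
  col_1 = c("A", "A", "B", "B", "B"),
  col_2 = c(1, NA, 2, NA, 3)
)

# Sum the logical "is not NA" flags within each group of col_1
non_na_count <- tapply(!is.na(mydf$col_2), mydf$col_1, sum)

print(non_na_count)
```

The idea is the same as in the dplyr pipeline: build a logical vector with !is.na and sum it within each group.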
