Count how many values in some cells of a row are not NA (in R)
You can apply !is.na() to the selected columns, then take rowSums() of the result:
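library(stringr)  # provides the fruit and words sample vectors
library(tibble)   # provides tibble(), which replaces the deprecated data_frame()

df <- tibble(
  id    = 1:10,
  name  = fruit[1:10],
  word1 = c(words[1:5], NA, words[7:10]),
  word2 = words[11:20],
  word3 = c(NA, NA, NA, words[25], NA, NA, words[32], NA, NA, words[65])
)

# Count the non-NA values in columns 3 to 5 (word1, word2, word3) of each row
df$n_words <- rowSums(!is.na(df[, 3:5]))
df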
      id name         word1    word2     word3   n_words
   <int> <chr>        <chr>    <chr>     <chr>     <dbl>
 1     1 apple        a        actual    <NA>          2
 2     2 apricot      able     add       <NA>          2
 3     3 avocado      about    address   <NA>          2
 4     4 banana       absolute admit     agree         3
 5     5 bell pepper  accept   advertise <NA>          2
 6     6 bilberry     <NA>     affect    <NA>          1
 7     7 blackberry   achieve  afford    alright       3
 8     8 blackcurrant across   after     <NA>          2
 9     9 blood orange act      afternoon <NA>          2
10    10 blueberry    active   again     awful         3
Edit: using dplyr, you could do this:
library(dplyr)

df %>%
  select(3:5) %>%
  is.na %>%
  `!` %>%
  rowSums
Count the number of non-NA numeric values of each row in dplyr
Use select + is.na + rowSums. Here select(., -id) returns the original data frame (.) with id excluded, and rowSums(!is.na(...)) then counts the non-NA values in each row:
df %>% mutate(var4 = rowSums(!is.na(select(., -id))))
# id var1 var2 var3 var4
#1 1 10 NA 4 2
#2 2 11 1 NA 2
#3 3 12 2 5 3
#4 4 13 2 NA 2
#5 5 14 1 NA 2
#6 6 15 1 NA 2
#7 7 16 1 5 3
#8 8 17 NA 4 2
#9 9 18 NA 4 2
#10 10 19 NA NA 1
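For readers without dplyr, the same count can be sketched in base R. The data frame below is rebuilt from the output printed above, and dropping id with setdiff() is just one way to mirror select(., -id); it is not the original answer's method.

```r
# Data rebuilt from the printed output above (id and var1 to var3 only)
df <- data.frame(
  id   = 1:10,
  var1 = 10:19,
  var2 = c(NA, 1, 2, 2, 1, 1, 1, NA, NA, NA),
  var3 = c(4, NA, 5, NA, NA, NA, 5, 4, 4, NA)
)

# Every column except id, selected by name
value_cols <- setdiff(names(df), "id")

# Count the non-NA values per row across those columns
df$var4 <- rowSums(!is.na(df[value_cols]))
df
```

Selecting by name rather than by position keeps the count correct if columns are later reordered.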
How to 'count' number of non-empty values in a single row across multiple columns in a dataframe
If you are talking about missing values in R, they are represented by NA in capital letters rather than na; otherwise, R will treat the value as a string, which is not empty.
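A short sketch of that distinction, using made-up values: is.na() only recognises the NA constant, so the string "na" counts as a present value unless it is converted on import, for example through the na.strings argument of read.table(). The names x and df_na are hypothetical.

```r
# Made-up values for illustration: a real NA next to the string "na"
x <- c("a", NA, "na")
is.na(x)
# FALSE TRUE FALSE: only the real NA is flagged

# Ask read.table() to treat "na" (as well as "NA") as missing on import
df_na <- read.table(
  header = TRUE,
  na.strings = c("NA", "na"),
  text = "Name Comp1 Comp2
A 0.5 na
B NA 0.4"
)
is.na(df_na$Comp2)
# TRUE FALSE: the lowercase "na" was read as a missing value
```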
Also, I have added an artificial Name column to your df so that each row represents one Name, and an artificial Comp5 column, which contains some NAs but is not included in the calculation.
rowSums(), as its name suggests, calculates the sum of each row. Passing it is.na(df[, 2:4]) makes it count only the NAs in columns 2 to 4 of df (Comp1 to Comp3).
df <- read.table(header = TRUE,
text =
"Name Comp1 Comp2 Comp3 Comp4 Comp5
A 0.5 0.4 NA 0.6 NA
B 0.6 NA NA 0.7 1
C NA 0.4 NA 1.1 NA")
# Count the NAs in columns 2 to 4 of each row
df$Count_NA <- rowSums(is.na(df[, 2:4]))
Output
  Name Comp1 Comp2 Comp3 Comp4 Comp5 Count_NA
1    A   0.5   0.4    NA   0.6    NA        1
2    B   0.6    NA    NA   0.7     1        2
3    C    NA   0.4    NA   1.1    NA        2
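Since the question asks for non-empty values, the complementary count may be closer to what is wanted. A minimal sketch on the same data, negating is.na(); the column name Count_nonNA and the choice of Comp1 to Comp4 are assumptions made for illustration.

```r
df <- read.table(header = TRUE, text = "
Name Comp1 Comp2 Comp3 Comp4 Comp5
A    0.5   0.4   NA    0.6   NA
B    0.6   NA    NA    0.7   1
C    NA    0.4   NA    1.1   NA")

# Select by name so that Comp5 stays out of the count
comp_cols <- paste0("Comp", 1:4)

# !is.na() marks the present values; rowSums() adds them up per row
df$Count_nonNA <- rowSums(!is.na(df[, comp_cols]))
df
# Count_nonNA per row: A = 3, B = 2, C = 2
```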
Count number of non-NA values for every column in a dataframe
You can also call is.na on the entire data frame (implicitly coercing it to a logical matrix) and call colSums on the negated result:
# make sample data
set.seed(47)
df <- as.data.frame(matrix(sample(c(0:1, NA), 100*5, TRUE), 100))
str(df)
#> 'data.frame': 100 obs. of 5 variables:
#> $ V1: int NA 1 NA NA 1 NA 1 1 1 NA ...
#> $ V2: int NA NA NA 1 NA 1 0 1 0 NA ...
#> $ V3: int 1 1 0 1 1 NA NA 1 NA NA ...
#> $ V4: int NA 0 NA 0 0 NA 1 1 NA NA ...
#> $ V5: int NA NA NA 0 0 0 0 0 NA NA ...
colSums(!is.na(df))
#> V1 V2 V3 V4 V5
#> 69 55 62 60 70
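Because the sample data above is random, the exact counts can vary with the R version and seed. A small deterministic sketch makes the coercion and the counts easy to check by hand; the data frame small and its column names are made up for illustration.

```r
# Made-up data for illustration
small <- data.frame(
  num  = c(1, NA, 3, NA),
  chr  = c("x", "y", NA, "z"),
  flag = c(NA, NA, NA, TRUE)
)

# is.na() on a data frame returns a logical matrix, whatever the column types
present <- !is.na(small)
is.matrix(present)   # TRUE

# Number of non-NA values in each column
colSums(present)     # num = 2, chr = 3, flag = 1

# Proportion of non-NA values in each column
colMeans(present)    # num = 0.5, chr = 0.75, flag = 0.25
```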
Count non-NA values by group
You can group by the first column and count the non-NA values of the second with sum(!is.na(...)):
mydf %>% group_by(col_1) %>% summarise(non_na_count = sum(!is.na(col_2)))
# A tibble: 2 x 2
   col_1 non_na_count
  <fctr>        <int>
1      A            1
2      B            2
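The answer does not show mydf, so the data below is hypothetical, chosen only to give counts of the same shape. The sketch uses base R's tapply() as an alternative to the dplyr pipeline, for cases where dplyr is not available.

```r
# Hypothetical data: mydf is not shown in the original answer
mydf <- data.frame(
  col_1 = factor(c("A", "A", "B", "B", "B")),
  col_2 = c(10, NA, 3, NA, 7)
)

# Sum the TRUE values of !is.na(col_2) within each level of col_1
non_na_count <- tapply(!is.na(mydf$col_2), mydf$col_1, sum)
non_na_count
# A = 1, B = 2
```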
Count number of NA's in a Row in Specified Columns R
df$na_count <- rowSums(is.na(df[c('first', 'last', 'address', 'phone', 'state')]))
df
   first m_initial     last         address    phone state customer na_count
1    Bob         L   Turner 123 Turner Lane 410-3141  Iowa     <NA>        0
2   Will         P Williams 456 Williams Rd 491-2359  <NA>        Y        1
3 Amanda         C    Jones    789 Haggerty     <NA>  <NA>        Y        2
4   Lisa      <NA>    Evans            <NA>     <NA>  <NA>        N        3
Create new column based on counting non-NA values across multiple columns
df$column_non_NA <- rowSums(!is.na(df[-1]))
df
   Q1  Q1a  Q1b  Q1c column_non_NA
1 Yes  AAA  BBB <NA>             2
2  No <NA> <NA> <NA>             0
3 Yes  AAA <NA> <NA>             1
4  No <NA> <NA> <NA>             0
5 Yes  ABC  BCD  EFG             3
6 Yes  DDD <NA> <NA>             1
7 Yes  EEE  AAA  AAA             3