ignore NA in dplyr row sum
You could use this:
library(dplyr)
data %>%
  # rowwise() makes the sum operate on each row separately
  rowwise() %>%
  # then a simple sum(..., na.rm = TRUE) gives the result you need
  mutate(sum = sum(a, b, c, na.rm = TRUE))
Output:
Source: local data frame [4 x 4]
Groups: <by row>
      a     b     c   sum
  (dbl) (dbl) (dbl) (dbl)
1     1     4     7    12
2     2    NA     8    10
3     3     5     9    17
4     4     6    NA    10
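For comparison, here is a vectorised base R sketch of the same calculation that avoids rowwise(). The data frame is reconstructed from the printed output above, so its values are an assumption about the original input.

```r
# Data reconstructed from the printed output above (an assumption about the input)
data <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(4, NA, 5, 6),
  c = c(7, 8, 9, NA)
)

# rowSums() handles all rows in one call; na.rm = TRUE drops the NAs in each row
data$sum <- rowSums(data[, c("a", "b", "c")], na.rm = TRUE)

# The row sums are 12, 10, 17 and 10, matching the dplyr output
data$sum
```

Selecting the columns by name keeps the call safe if other, non-numeric columns are added to the data frame later.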
Calculating Sum Column and ignoring NA
Add na.rm = TRUE to the call, i.e.
rowSums(df, na.rm = TRUE)
where df stands for the data frame (or the subset of columns) you want to sum.
Sum values from rows ignoring certain values in R
One way to do it in base R:
rowSums(dta[, 2:4] * (dta[, 2:4] < 7))
# [1] 0 4 2 2 NA 9
Explanation, following @tjebo's comment:
- dta[, 2:4] < 7 produces a matrix of logical values, where TRUE or FALSE marks whether each value is less than 7 or not. This takes a single line because the comparison is vectorized.
- Then you multiply the data frame of original values by that logical matrix. Under the hood, R converts logical values to numeric, so every FALSE becomes 0 and every TRUE becomes 1. Each original value is therefore multiplied by 1 if it is less than 7, and by 0 otherwise.
- Since NA < 7 produces NA, and multiplying by NA also produces NA, the original NAs are preserved.
- The last step is to call rowSums() on the result, which sums the values in each row. Values of 7 or more have been turned into 0, so they drop out of the sum.
- If you want a sum for every row where at least one value is not NA, add na.rm = TRUE to the rowSums() call. Note that rows containing only NAs will then give 0.
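The steps above can be traced on a small example. The data frame dta below is hypothetical (the original data is not shown in the answer); only the threshold 7 and the column selection 2:4 are taken from it.

```r
# Hypothetical data; the original dta is not shown in the answer
dta <- data.frame(
  id = 1:4,
  x  = c(1, 8, NA, NA),
  y  = c(2, 3, 5,  NA),
  z  = c(9, 4, 6,  NA)
)

vals <- dta[, 2:4]

# Step 1: logical mask; NA < 7 is NA
mask <- vals < 7

# Step 2: TRUE/FALSE act as 1/0, so values of 7 or more become 0 and NAs stay NA
masked <- vals * mask

# Step 3: without na.rm, any row containing an NA sums to NA
strict <- rowSums(masked)                  # 3, 7, NA, NA

# With na.rm = TRUE the NAs are skipped, and the all-NA row sums to 0
lenient <- rowSums(masked, na.rm = TRUE)   # 3, 7, 11, 0
```

The last row shows the caveat from the final bullet: with na.rm = TRUE, a row with no data at all is indistinguishable from a row whose values were all excluded.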
Ignoring NA when summing multiple columns with dplyr
The problem with your rowSums is the reference to DF (which is undefined). This works:
mutate(iris, sum2 = rowSums(cbind(Sepal.Length, Petal.Length), na.rm = T))
For difference, you could of course use a negative: rowSums(cbind(Sepal.Length, -Petal.Length), na.rm = T)
The general solution is to use ifelse or similar to set the missing values to 0 (or whatever else is appropriate):
mutate(iris, sum2 = Sepal.Length + ifelse(is.na(Petal.Length), 0, Petal.Length))
More efficient than ifelse would be an implementation of coalesce, see examples here. This uses @krlmlr's answer from the previous link (see bottom for the code or use the kimisc package).
mutate(iris, sum2 = Sepal.Length + coalesce.na(Petal.Length, 0))
To replace missing values data-set wide, there is replace_na in the tidyr package.
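If adding a package is not an option, the same data-set-wide replacement can be sketched in base R. The data frame df and its column names are hypothetical, and filling with 0 is only appropriate when a missing value should contribute nothing to the sum.

```r
# Hypothetical data frame with missing values in both columns
df <- data.frame(
  sepal = c(5.1, NA, 4.7),
  petal = c(1.4, 1.3, NA)
)

# is.na(df) is a logical matrix; assigning through it replaces every NA with 0
df0 <- df
df0[is.na(df0)] <- 0

# With the NAs gone, plain + behaves like a sum that ignores missing values
df0$sum2 <- df0$sepal + df0$petal   # 6.5, 1.3, 4.7
```

Working on a copy (df0) keeps the original NAs available for any later step that needs to know which values were missing.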
@krlmlr's coalesce.na, as found here:
coalesce.na <- function(x, ...) {
  x.len <- length(x)
  ly <- list(...)
  for (y in ly) {
    y.len <- length(y)
    if (y.len == 1) {
      # a single replacement value fills every NA
      x[is.na(x)] <- y
    } else {
      if (x.len %% y.len != 0)
        warning('object length is not a multiple of first object length')
      # recycle y and take the value at each NA position
      pos <- which(is.na(x))
      x[pos] <- y[(pos - 1) %% y.len + 1]
    }
  }
  x
}
Sum 2 columns, ignore NA, except when both are NA
I used the following. It gives sums even when there are NAs, but returns NA when all summed elements are NA.
rowSums(df, na.rm = TRUE) * NA ^ (rowSums(!is.na(df)) == 0)
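The multiplier works because R defines NA ^ 0 as 1 while NA ^ 1 is NA, so only rows in which every element is missing get multiplied by NA. A small sketch with a hypothetical data frame df:

```r
# Hypothetical two-column data frame; the last row is entirely NA
df <- data.frame(
  x = c(1, NA, 3, NA),
  y = c(2, 5, NA, NA)
)

# TRUE for rows that contain no non-missing value
all_na <- rowSums(!is.na(df)) == 0

# FALSE/TRUE act as 0/1 here: NA ^ 0 is 1, NA ^ 1 is NA
multiplier <- NA ^ all_na

# Partial rows keep their sum, the all-NA row becomes NA
result <- rowSums(df, na.rm = TRUE) * multiplier   # 3, 5, 3, NA
```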
Ignore NA in vector sum
You can try rowSums with na.rm = TRUE (as @akrun said in the comment), like below:
data$cat <- rowSums(data[-1] * c(0.05, 0.05, 0.05)[col(data[-1])], na.rm = TRUE)
which gives
> data
id v1 v2 v3 cat
1 1 1 78 101 9.00
2 1 2 85 NA 4.35
3 1 5 56 452 25.65
4 1 4 47 NA 2.55
5 1 58 12 NA 3.50
6 1 6 3 45 2.70
7 1 4 65 7 3.80
8 1 9 98 56 8.15
9 2 1 78 101 9.00
10 2 2 85 NA 4.35
11 2 5 56 452 25.65
12 2 4 47 NA 2.55
13 2 58 12 NA 3.50
14 2 6 3 45 2.70
15 2 4 65 7 3.80
16 2 9 98 56 8.15
17 3 1 78 101 9.00
18 3 2 85 NA 4.35
19 3 5 56 452 25.65
20 3 4 47 NA 2.55
21 3 58 12 NA 3.50
22 3 6 3 45 2.70
23 3 4 65 7 3.80
24 3 9 98 56 8.15
25 4 1 78 101 9.00
26 4 2 85 NA 4.35
27 4 5 56 452 25.65
28 4 4 47 NA 2.55
29 4 58 12 NA 3.50
30 4 6 3 45 2.70
31 4 4 65 7 3.80
32 4 9 98 56 8.15
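The indexing c(0.05, 0.05, 0.05)[col(data[-1])] expands the weight vector into one weight per cell, matched by column. The sketch below uses unequal, made-up weights so that the column matching is visible; the data values are the first two rows of the output above.

```r
# First two rows of the example data (id, v1, v2, v3)
data <- data.frame(id = c(1, 1), v1 = c(1, 2), v2 = c(78, 85), v3 = c(101, NA))

# Hypothetical weights, one per value column (the answer uses 0.05 for all three)
w <- c(0.1, 0.2, 0.3)

# col() gives the column index of every cell, so w[col(...)] repeats
# each weight down its own column
per_cell <- w[col(data[-1])]

data$cat <- rowSums(data[-1] * per_cell, na.rm = TRUE)
# row 1: 1 * 0.1 + 78 * 0.2 + 101 * 0.3 = 46.0
# row 2: 2 * 0.1 + 85 * 0.2 = 17.2 (v3 is NA and is skipped)
```

With identical weights, as in the answer, multiplying the row sum by 0.05 would give the same result; the col() form matters once the weights differ by column.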
How to exclude NA values from being counted in dplyr summarize()?
Comparing NA with == returns NA, and subsetting a vector with NA returns NA as well, hence the NA entries are counted by length.
Check this example :
x <- c(1:3, NA, 2:3, NA)
length(x)
#[1] 7
x == 3
#[1] FALSE FALSE TRUE NA FALSE TRUE NA
x[x == 3]
#[1] 3 NA 3 NA
length(x[x == 3])
#[1] 4
Here, you expected the output to be 2, but it gives 4 because of the NA values. Perhaps you can use:
length(na.omit(x[x == 3]))
#[1] 2
but that is very convoluted. Use sum on logical values instead:
sum(x == 3, na.rm = TRUE)
#[1] 2
So try :
library(dplyr)
t1 %>%
  group_by(year) %>%
  mutate(YES = sum(characteristic == "1", na.rm = TRUE),
         NO = sum(characteristic == "0", na.rm = TRUE))
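To check the counting logic without dplyr, here is a base R sketch on a hypothetical t1 that reuses the column names above (year, characteristic); the values are made up for illustration.

```r
# Hypothetical data with the column names used above
t1 <- data.frame(
  year = c(2000, 2000, 2000, 2001, 2001),
  characteristic = c("1", NA, "0", "1", "1")
)

# Comparing with == yields NA for the missing entry
is_yes <- t1$characteristic == "1"
is_no  <- t1$characteristic == "0"

# tapply() applies sum() within each year and forwards na.rm = TRUE to it
yes <- tapply(is_yes, t1$year, sum, na.rm = TRUE)
no  <- tapply(is_no,  t1$year, sum, na.rm = TRUE)
# 2000: 1 yes and 1 no (the NA is counted in neither); 2001: 2 yes and 0 no
```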
rowSums but keeping NA values
If you have a variable number of columns you could try this approach:
mm <- merge(dd1,dd2)
mm$m <- rowSums(mm, na.rm=TRUE) * ifelse(rowSums(is.na(mm)) == ncol(mm), NA, 1)
# or, as @JoshuaUlrich commented:
#mm$m <- ifelse(apply(is.na(mm),1,all),NA,rowSums(mm,na.rm=TRUE))
tail(mm, 10)
# dd1 dd2 m
#2013-08-02 NA NA NA
#2013-08-03 NA NA NA
#2013-08-04 NA NA NA
#2013-08-05 1.2542692 -1.2542692 0.000000
#2013-08-06 NA 1.3325804 1.332580
#2013-08-07 NA 0.7726740 0.772674
#2013-08-08 0.8158402 -0.8158402 0.000000
#2013-08-09 NA 1.2292919 1.229292
#2013-08-10 NA NA NA
#2013-08-11 NA 0.9334900 0.933490
Sum of two Columns of Data Frame with NA Values
dat$e <- rowSums(dat[,c("b", "c")], na.rm=TRUE)
dat
# a b c d e
# 1 1 2 3 4 5
# 2 5 NA 7 8 7