rowSums But Keeping NA Values

rowSums but keeping NA values

If you have a variable number of columns, you could try this approach:

mm <- merge(dd1,dd2)
# sum each row ignoring NAs, then turn rows that are entirely NA back into NA
mm$m <- rowSums(mm, na.rm=TRUE) * ifelse(rowSums(is.na(mm)) == ncol(mm), NA, 1)
# or, as @JoshuaUlrich commented:
#mm$m <- ifelse(apply(is.na(mm),1,all),NA,rowSums(mm,na.rm=TRUE))
tail(mm, 10)
# dd1 dd2 m
#2013-08-02 NA NA NA
#2013-08-03 NA NA NA
#2013-08-04 NA NA NA
#2013-08-05 1.2542692 -1.2542692 0.000000
#2013-08-06 NA 1.3325804 1.332580
#2013-08-07 NA 0.7726740 0.772674
#2013-08-08 0.8158402 -0.8158402 0.000000
#2013-08-09 NA 1.2292919 1.229292
#2013-08-10 NA NA NA
#2013-08-11 NA 0.9334900 0.933490
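The objects dd1 and dd2 are not shown in the answer. As a minimal sketch, the same formula can be tried on a plain numeric data frame; `toy` below is made-up data, and the check confirms that the commented @JoshuaUlrich variant gives the same result:

```r
# Hypothetical toy data: row 2 is entirely NA, row 4 sums to 0
toy <- data.frame(x = c(1, NA, NA, 2),
                  y = c(NA, NA, 3, -2))

# Sum ignoring NA, then multiply all-NA rows by NA instead of 1
m1 <- rowSums(toy, na.rm = TRUE) * ifelse(rowSums(is.na(toy)) == ncol(toy), NA, 1)

# Variant from the comment: flag all-NA rows with all(is.na(...))
m2 <- ifelse(apply(is.na(toy), 1, all), NA, rowSums(toy, na.rm = TRUE))

print(m1)
# [1]  1 NA  3  0
isTRUE(all.equal(m1, m2))
```

Note that a genuine zero sum (row 4) stays 0; only rows with no values at all become NA.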

rowSums with all NA

Here is one option:

rowSums(df, na.rm = TRUE) * NA ^ (rowSums(!is.na(df)) == 0)
# [1] 2 2 NA 1 3 1

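This relies on the fact that in R anything raised to the power 0 equals 1, even NA: NA^FALSE is 1, while NA^TRUE is NA. Rows with at least one non-NA value are therefore multiplied by 1, and rows that are entirely NA are multiplied by NA.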
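A small sketch of the power rule and the full expression; the data frame `df` here is hypothetical, with an all-NA third row:

```r
# Exponent 0 (or FALSE) gives 1 even for NA; exponent 1 (or TRUE) keeps NA
NA^0       # 1
NA^FALSE   # 1
NA^TRUE    # NA

# Hypothetical data: row 3 has no values at all
df <- data.frame(a = c(1, NA, NA), b = c(1, 2, NA))
all_na <- rowSums(!is.na(df)) == 0   # TRUE only for rows with no values
res <- rowSums(df, na.rm = TRUE) * NA^all_na
res
# [1]  2  2 NA
```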

Error in calculating rowSums of columns having NA values

To sum only specific columns, use rowSums on a select() of those columns:

library(dplyr)

# x1 is NA only when both T_1_1 and S_2_1 are NA; otherwise NAs are ignored
df %>% mutate(x1 = ifelse(is.na(T_1_1) & is.na(S_2_1), NA,
                          rowSums(select(., c(T_1_1, S_2_1)), na.rm = TRUE)))

# T_1_1 T_1_2 T_1_3 S_2_1 S_2_2 S_2_3 T_1_0 x1
#1 68 26 93 69 87 150 79 137
#2 NA NA 32 67 67 0 0 67
#3 0 0 NA 94 NA NA 0 94
#4 105 73 103 0 120 121 NA 105
#5 NA NA NA NA NA NA 98 NA
#6 0 97 0 136 122 78 NA 136
#7 135 46 147 NA 0 109 15 135
#8 NA NA NA 92 NA NA NA 92
#9 24 0 139 73 79 0 2 97
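A base R sketch of the same computation, without dplyr, combines column selection by name with the NA^ trick shown earlier. The data frame below contains only the two summed columns, with values copied from the output above; `cols` and `sub` are illustrative names:

```r
# Only the two summed columns, values copied from the printed output
df <- data.frame(T_1_1 = c(68, NA, 0, 105, NA, 0, 135, NA, 24),
                 S_2_1 = c(69, 67, 94, 0, NA, 136, NA, 92, 73))
cols <- c("T_1_1", "S_2_1")   # illustrative: the columns to add up
sub <- df[cols]

# Sum ignoring NA; NA^TRUE turns rows where every selected column is NA into NA
df$x1 <- rowSums(sub, na.rm = TRUE) * NA^(rowSums(!is.na(sub)) == 0)
df$x1
# [1] 137  67  94 105  NA 136 135  92  97
```

This scales to any number of selected columns without writing one is.na() test per column.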

Mutate row sum but only if NA count is 2 or less

In base R, we can use rowSums twice: once to count the NA values in each row and once to sum the remaining values.

ifelse(rowSums(is.na(df[-1])) <= 2, rowSums(df[-1], na.rm = TRUE), NA)
#[1] NA 17 29 NA 3 NA

Using dplyr row-wise, you can do this as:

library(dplyr)
df %>%
  rowwise() %>%
  mutate(col = ifelse(sum(is.na(c_across(v2:v6))) <= 2,
                      sum(c_across(v2:v6), na.rm = TRUE), NA))

# A tibble: 6 x 7
# v1 v2 v3 v4 v5 v6 col
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 A 4 7 NA NA NA NA
#2 B NA 8 3 3 3 17
#3 C 5 9 5 5 5 29
#4 D 6 NA NA NA NA NA
#5 E NA NA 1 1 1 3
#6 F NA NA 4 NA 4 NA

Shortened the code using the ifelse suggestion from @rpolicastro.
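To check the base R version, the input can be rebuilt from the printed tibble above (the values are copied from that output; the data.frame() call itself is an assumption about how `df` was created). `v1` is the character column that `df[-1]` drops:

```r
df <- data.frame(
  v1 = c("A", "B", "C", "D", "E", "F"),
  v2 = c(4, NA, 5, 6, NA, NA),
  v3 = c(7, 8, 9, NA, NA, NA),
  v4 = c(NA, 3, 5, NA, 1, 4),
  v5 = c(NA, 3, 5, NA, 1, NA),
  v6 = c(NA, 3, 5, NA, 1, 4)
)

# Number of NAs per row, ignoring the character column v1
na_count <- rowSums(is.na(df[-1]))
# Sum only rows with at most 2 NAs; other rows get NA
total <- ifelse(na_count <= 2, rowSums(df[-1], na.rm = TRUE), NA)
total
# [1] NA 17 29 NA  3 NA
```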

summing across rows, leaving NAs in R

Use this to get the row totals, then cbind them with your data frame:

apply(df, 1, function(x) {
  # a row with only NAs stays NA instead of becoming 0
  if (sum(is.na(x)) == length(x)) {
    return(NA)
  } else {
    sum(x, na.rm = TRUE)
  }
})
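A usage sketch with a hypothetical all-numeric data frame, binding the totals back as a new column. apply() converts a data frame to a matrix first, so this only behaves as intended when every column is numeric; a character column would turn the whole matrix into character and make sum() fail.

```r
# Hypothetical data: row 2 is entirely NA
df <- data.frame(x = c(1, NA, 3), y = c(2, NA, NA))

total <- apply(df, 1, function(x) {
  if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
})

df <- cbind(df, total)
df$total
# [1]  3 NA  3
```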

ignore NA in dplyr row sum

You could use this:

library(dplyr)
data %>%
  #rowwise will make sure the sum operation will occur on each row
  rowwise() %>%
  #then a simple sum(..., na.rm=TRUE) is enough to result in what you need
  mutate(sum = sum(a,b,c, na.rm=TRUE))

Output:

Source: local data frame [4 x 4]
Groups: <by row>

a b c sum
(dbl) (dbl) (dbl) (dbl)
1 1 4 7 12
2 2 NA 8 10
3 3 5 9 17
4 4 6 NA 10
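rowwise() evaluates the expression once per row. For comparison, a sketch of the same totals computed in one vectorised base R call; the data frame is rebuilt from the printed output above:

```r
# Values copied from the output above
data <- data.frame(a = c(1, 2, 3, 4),
                   b = c(4, NA, 5, 6),
                   c = c(7, 8, 9, NA))

# One call over all rows; na.rm = TRUE skips the NAs
data$sum <- rowSums(data[c("a", "b", "c")], na.rm = TRUE)
data$sum
# [1] 12 10 17 10
```

Like the rowwise() version, a row made only of NAs would get 0 here, not NA.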

RowSums NA + NA gives 0

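One option with rowSums is to compute the row sums with na.rm = TRUE and multiply them by NA raised to the negated (!) count of non-NA values in each row. That count is the rowSums of the negated (!) is.na() matrix; negating it gives FALSE for rows that have any value and TRUE for rows that are all NA, so NA^ turns them into 1 and NA respectively.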

rowSums(df, na.rm=TRUE) *NA^!rowSums(!is.na(df))
#[1] 2 NA 10
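The original `df` is not shown, so the data below is hypothetical, chosen to reproduce the printed result. Evaluating the expression step by step shows what each piece contributes:

```r
df <- data.frame(x = c(1, NA, 4), y = c(1, NA, 6))

n_obs <- rowSums(!is.na(df))   # non-NA count per row: 2 0 2
!n_obs                         # FALSE TRUE FALSE: TRUE only where the count is 0
NA^!n_obs                      # 1 NA 1
res <- rowSums(df, na.rm = TRUE) * NA^!n_obs
res
# [1]  2 NA 10
```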

Sum of two Columns of Data Frame with NA Values

dat$e <- rowSums(dat[,c("b", "c")], na.rm=TRUE)
dat
# a b c d e
# 1 1 2 3 4 5
# 2 5 NA 7 8 7
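With na.rm = TRUE, a row where both b and c are NA gets 0 rather than NA. A sketch with a hypothetical third row added to the data shown above, keeping such rows NA with the NA^ trick from the earlier answers:

```r
# First two rows as above; row 3 is hypothetical, with b and c both NA
dat <- data.frame(a = c(1, 5, 9), b = c(2, NA, NA),
                  c = c(3, 7, NA), d = c(4, 8, 12))
bc <- dat[, c("b", "c")]

rowSums(bc, na.rm = TRUE)   # 5 7 0: the all-NA row becomes 0

# Multiply by NA where no value in b or c is present
dat$e <- rowSums(bc, na.rm = TRUE) * NA^(rowSums(!is.na(bc)) == 0)
dat$e
# [1]  5  7 NA
```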

Filter data.frame with all columns NA but keep when some are NA

We can use base R:

teste[rowSums(!is.na(teste)) >0,]
# a b c
#1 1 NA 1
#3 3 3 3
#4 NA 4 4

Or using apply and any:

teste[apply(!is.na(teste), 1, any),]

The rowSums condition can also be used within dplyr's filter:

library(dplyr)
teste %>%
  filter(rowSums(!is.na(.)) > 0)

Or, using c_across from dplyr, we can directly remove the rows that are all NA:

library(dplyr)
teste %>%
  rowwise() %>%
  filter(!all(is.na(c_across(everything()))))
# A tibble: 3 x 3
# Rowwise:
# a b c
# <dbl> <dbl> <dbl>
#1 1 NA 1
#2 3 3 3
#3 NA 4 4

NOTE: filter_all() has been superseded in dplyr, so it is not used here.
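The input `teste` is not shown in the question. The sketch below rebuilds it from the outputs above, assuming the dropped row 2 is entirely NA, and checks that the two base R filters keep the same rows:

```r
# Assumed input: rows 1, 3 and 4 as printed above; row 2 is assumed all NA
teste <- data.frame(a = c(1, NA, 3, NA),
                    b = c(NA, NA, 3, 4),
                    c = c(1, NA, 3, 4))

# Keep rows with at least one non-NA value
keep_rowsums <- teste[rowSums(!is.na(teste)) > 0, ]
keep_apply   <- teste[apply(!is.na(teste), 1, any), ]

identical(keep_rowsums, keep_apply)
rownames(keep_rowsums)
# [1] "1" "3" "4"
```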

How to keep only the max value of a row and convert other values to NA?

We can use apply to loop over the rows (MARGIN = 1), replace the values that are not equal to the row maximum with NA, and assign the transpose back to the original object (apply returns the per-row results as columns, hence the t()).

df[] <- t(apply(df, 1, function(x) replace(x, x != max(x, na.rm = TRUE), NA)))
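A quick sketch of the apply approach on a hypothetical data frame; the names col1 to col3 are made up to match the starts_with('col') selector used in the dplyr variant. Existing NAs stay NA because comparing them with the maximum yields NA, and replace() then assigns NA at those positions anyway:

```r
# Hypothetical data with some NAs already present
df <- data.frame(col1 = c(1, 5, NA),
                 col2 = c(3, 2, 7),
                 col3 = c(2, NA, 4))

# Keep only each row's maximum; everything else becomes NA
df[] <- t(apply(df, 1, function(x) replace(x, x != max(x, na.rm = TRUE), NA)))
df
#   col1 col2 col3
# 1   NA    3   NA
# 2    5   NA   NA
# 3   NA    7   NA
```

A row consisting only of NAs would make max(..., na.rm = TRUE) return -Inf with a warning, which is why the rowMaxs variant restricts itself to rows with at least one value.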

Or with rowMaxs from matrixStats:

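library(matrixStats)
# rows that contain at least one non-NA value (all-NA rows are left untouched)
i1 <- !!rowSums(!is.na(df))
# expand the row maxima with row() so each cell is compared with its own row's max
mx <- rowMaxs(as.matrix(df[i1,]), na.rm = TRUE)[row(df[i1,])]
df[i1,] <- replace(df[i1,], df[i1,] != mx, NA)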

Or using dplyr and purrr:

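library(dplyr)
library(purrr)
df %>%
  # row-wise maximum across all columns, ignoring NA
  mutate(new = reduce(., pmax, na.rm = TRUE)) %>%
  # keep only cells equal to that maximum; the helper column new is dropped
  transmute_at(vars(starts_with('col')), ~ replace(., . != new, NA))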

