rowSums but keeping NA values
If you have a variable number of columns you could try this approach:
mm <- merge(dd1,dd2)
mm$m <- rowSums(mm, na.rm=TRUE) * ifelse(rowSums(is.na(mm)) == ncol(mm), NA, 1)
# or, as @JoshuaUlrich commented:
#mm$m <- ifelse(apply(is.na(mm),1,all),NA,rowSums(mm,na.rm=TRUE))
tail(mm, 10)
# dd1 dd2 m
#2013-08-02 NA NA NA
#2013-08-03 NA NA NA
#2013-08-04 NA NA NA
#2013-08-05 1.2542692 -1.2542692 0.000000
#2013-08-06 NA 1.3325804 1.332580
#2013-08-07 NA 0.7726740 0.772674
#2013-08-08 0.8158402 -0.8158402 0.000000
#2013-08-09 NA 1.2292919 1.229292
#2013-08-10 NA NA NA
#2013-08-11 NA 0.9334900 0.933490
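Since dd1 and dd2 are not defined in this answer, here is a minimal base R sketch of the same idea on a small made-up data frame. The values and the column names x and y are assumptions for illustration only. It shows that only rows where every column is NA end up as NA.

```r
# Hypothetical data: row 2 is entirely NA, row 3 is partially NA
mm <- data.frame(x = c(1, NA, NA), y = c(2, NA, 5))

# Flag rows in which every column is NA (computed before adding m)
all_na <- rowSums(is.na(mm)) == ncol(mm)

# Sum ignoring NAs, then put NA back where the whole row was missing
mm$m <- unname(ifelse(all_na, NA, rowSums(mm, na.rm = TRUE)))
mm$m
# [1]  3 NA  5
```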
rowSums with all NA
Here is one option:
rowSums(df, na.rm = TRUE) * NA ^ (rowSums(!is.na(df)) == 0)
# [1] 2 2 NA 1 3 1
This relies on the fact that anything raised to the power 0 equals 1 in R, including NA: NA ^ 0 is 1, while NA ^ 1 is NA.
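The power rule the answer relies on can be checked directly at the console. This is only a small illustration; the variable name all_missing and the numbers are assumptions.

```r
NA ^ 0      # 1: anything to the power 0 is 1, even NA
NA ^ 1      # NA
NA ^ FALSE  # 1, because FALSE is coerced to 0
NA ^ TRUE   # NA, because TRUE is coerced to 1

# Applied to a logical vector flagging all-NA rows (hypothetical flags)
all_missing <- c(FALSE, TRUE, FALSE)
masked <- c(2, 0, 5) * NA ^ all_missing
masked
# [1]  2 NA  5
```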
error in calculating rowsum of column having NA values
To sum only specific columns, use select inside rowSums:
library(dplyr)
df %>% mutate(x1 = ifelse(is.na(T_1_1) & is.na(S_2_1),NA,
rowSums(select(., c(T_1_1,S_2_1)),na.rm = TRUE)))
# T_1_1 T_1_2 T_1_3 S_2_1 S_2_2 S_2_3 T_1_0 x1
#1 68 26 93 69 87 150 79 137
#2 NA NA 32 67 67 0 0 67
#3 0 0 NA 94 NA NA 0 94
#4 105 73 103 0 120 121 NA 105
#5 NA NA NA NA NA NA 98 NA
#6 0 97 0 136 122 78 NA 136
#7 135 46 147 NA 0 109 15 135
#8 NA NA NA 92 NA NA NA 92
#9 24 0 139 73 79 0 2 97
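For comparison, here is a base R sketch of the same calculation. It uses only the T_1_1 and S_2_1 values shown in the output above; the other columns are left out for brevity, and the helper names cols and both_na are assumptions.

```r
df <- data.frame(
  T_1_1 = c(68, NA, 0, 105, NA, 0, 135, NA, 24),
  S_2_1 = c(69, 67, 94, 0, NA, 136, NA, 92, 73)
)
cols <- c("T_1_1", "S_2_1")

# NA only when both selected columns are NA, otherwise the NA-ignoring sum
both_na <- rowSums(is.na(df[cols])) == length(cols)
df$x1 <- unname(ifelse(both_na, NA, rowSums(df[cols], na.rm = TRUE)))
df$x1
# [1] 137  67  94 105  NA 136 135  92  97
```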
Mutate row sum but only if NA count is 2 or less
In base R, we can use rowSums twice: first to sum the values in each row, and second to count the number of NAs in each row.
ifelse(rowSums(is.na(df[-1])) <= 2, rowSums(df[-1], na.rm = TRUE), NA)
#[1] NA 17 29 NA 3 NA
Using dplyr row-wise, you can do this as:
library(dplyr)
df %>%
rowwise() %>%
mutate(col = ifelse(sum(is.na(c_across(v2:v6))) <= 2,
sum(c_across(v2:v6), na.rm = TRUE), NA))
# A tibble: 6 x 7
# v1 v2 v3 v4 v5 v6 col
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 A 4 7 NA NA NA NA
#2 B NA 8 3 3 3 17
#3 C 5 9 5 5 5 29
#4 D 6 NA NA NA NA NA
#5 E NA NA 1 1 1 3
#6 F NA NA 4 NA 4 NA
Shortened the code using the ifelse suggestion from @rpolicastro.
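The base R version can also be wrapped in a small helper when the threshold varies. The function name sum_if_few_na and its max_na argument are hypothetical; the sample data is copied from the tibble printed above.

```r
# Hypothetical helper: row sums, but NA when a row has more than max_na NAs
sum_if_few_na <- function(x, max_na = 2) {
  n_na <- rowSums(is.na(x))            # NAs per row
  totals <- rowSums(x, na.rm = TRUE)   # sums ignoring NAs
  totals[n_na > max_na] <- NA
  unname(totals)
}

df <- data.frame(
  v1 = c("A", "B", "C", "D", "E", "F"),
  v2 = c(4, NA, 5, 6, NA, NA),
  v3 = c(7, 8, 9, NA, NA, NA),
  v4 = c(NA, 3, 5, NA, 1, 4),
  v5 = c(NA, 3, 5, NA, 1, NA),
  v6 = c(NA, 3, 5, NA, 1, 4)
)

# Drop the character column v1 before summing
sum_if_few_na(df[-1])
# [1] NA 17 29 NA  3 NA
```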
summing across rows, leaving NAs in R
Use this to get the row totals, then cbind the result to your data frame.
apply(df, 1, function(x) {
  # return NA only when every value in the row is NA
  if (sum(is.na(x)) == length(x)) {
    return(NA)
  } else {
    sum(x, na.rm = TRUE)
  }
})
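As a sketch of the full workflow, the snippet below stores an equivalent function under the hypothetical name sum_keep_all_na, applies it to made-up data, and binds the result on with cbind as described.

```r
# Hypothetical data; row 2 is entirely NA
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 4))

# NA when the whole row is missing, otherwise the NA-ignoring sum
sum_keep_all_na <- function(x) {
  if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
}

total <- unname(apply(df, 1, sum_keep_all_na))
df <- cbind(df, total)
df$total
# [1]  1 NA  7
```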
ignore NA in dplyr row sum
You could use this:
library(dplyr)
data %>%
#rowwise will make sure the sum operation will occur on each row
rowwise() %>%
#then a simple sum(..., na.rm=TRUE) is enough to result in what you need
mutate(sum = sum(a,b,c, na.rm=TRUE))
Output:
Source: local data frame [4 x 4]
Groups: <by row>
a b c sum
(dbl) (dbl) (dbl) (dbl)
1 1 4 7 12
2 2 NA 8 10
3 3 5 9 17
4 4 6 NA 10
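If rowwise is not needed, a base R sketch gives the same totals. The data frame below is rebuilt from the output shown above.

```r
data <- data.frame(a = c(1, 2, 3, 4), b = c(4, NA, 5, 6), c = c(7, 8, 9, NA))

# Vectorised over rows; NAs are skipped rather than propagated
data$sum <- unname(rowSums(data[c("a", "b", "c")], na.rm = TRUE))
data$sum
# [1] 12 10 17 10
```

Note that with na.rm = TRUE a row consisting only of NAs sums to 0, not NA.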
RowSums NA + NA gives 0
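One option with rowSums is to take the rowSums with na.rm=TRUE and multiply it by NA^!rowSums(!is.na(df)). The inner rowSums counts the non-NA values in each row; negating that count (!) gives TRUE only for rows that are all NA, and NA^TRUE is NA while NA^FALSE is 1. The multiplication therefore turns the sums of all-NA rows into NA and leaves the other sums unchanged.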
rowSums(df, na.rm=TRUE) *NA^!rowSums(!is.na(df))
#[1] 2 NA 10
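To see what each piece contributes, the expression can be evaluated step by step. The data frame here is made up, chosen so that it reproduces the output shown above; it is not the original poster's data, and the intermediate names are assumptions.

```r
df <- data.frame(x = c(2, NA, 4), y = c(NA, NA, 6))

n_present <- rowSums(!is.na(df))  # non-NA values per row: 1 0 2
all_na    <- !n_present           # TRUE where the count is 0: FALSE TRUE FALSE
mask      <- NA ^ all_na          # 1 NA 1

out <- unname(rowSums(df, na.rm = TRUE) * mask)
out
# [1]  2 NA 10
```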
Sum of two Columns of Data Frame with NA Values
dat$e <- rowSums(dat[,c("b", "c")], na.rm=TRUE)
dat
# a b c d e
# 1 1 2 3 4 5
# 2 5 NA 7 8 7
Filter data.frame with all columns NA but keep when some are NA
We can use base R
teste[rowSums(!is.na(teste)) >0,]
# a b c
#1 1 NA 1
#3 3 3 3
#4 NA 4 4
Or using apply and any
teste[apply(!is.na(teste), 1, any),]
The rowSums condition can also be used within filter
teste %>%
filter(rowSums(!is.na(.)) >0)
Or using c_across from dplyr, we can directly remove the rows in which all values are NA
library(dplyr)
teste %>%
rowwise %>%
filter(!all(is.na(c_across(everything()))))
# A tibble: 3 x 3
# Rowwise:
# a b c
# <dbl> <dbl> <dbl>
#1 1 NA 1
#2 3 3 3
#3 NA 4 4
NOTE: filter_all has been superseded in recent versions of dplyr
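A self-contained base R sketch of the rowSums filter follows. The teste data frame is reconstructed from the output above, assuming the dropped row 2 was entirely NA.

```r
teste <- data.frame(
  a = c(1, NA, 3, NA),
  b = c(NA, NA, 3, 4),
  c = c(1, NA, 3, 4)
)

# Keep rows with at least one non-NA value
kept <- teste[rowSums(!is.na(teste)) > 0, ]
rownames(kept)
# [1] "1" "3" "4"

# complete.cases() is stricter: it drops any row containing an NA
nrow(teste[complete.cases(teste), ])
# [1] 1
```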
How to keep only max value of row and convert other value to NA?
We can use apply to loop over the rows (MARGIN = 1), replace the values that are not equal to the row max with NA, and assign the transpose back to the original object
df[] <- t(apply(df, 1, function(x) replace(x, x != max(x, na.rm = TRUE), NA)))
Or with rowMaxs from matrixStats, applied only to rows that have at least one non-NA value
library(matrixStats)
i1 <- !!rowSums(!is.na(df))
# index the row maxima by row() so each cell is compared with its own row's maximum
df[i1,] <- replace(df[i1,], df[i1,] != rowMaxs(as.matrix(df[i1,]),
na.rm = TRUE)[row(df[i1,])], NA)
Or using dplyr and purrr
library(dplyr)
library(purrr)
df %>%
mutate(new = reduce(., pmax, na.rm = TRUE)) %>%
transmute_at(vars(starts_with('col')), ~ replace(., .!= new, NA))
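A vectorised base R sketch of the same idea uses pmax for the row maxima. The data and the column names col1 to col3 are made up for illustration.

```r
df <- data.frame(col1 = c(1, 5, NA), col2 = c(3, 2, NA), col3 = c(2, 7, NA))

# Row-wise maximum, ignoring NAs (an all-NA row stays NA)
row_max <- do.call(pmax, c(df, na.rm = TRUE))

m <- as.matrix(df)
# row_max is recycled down each column, so it lines up with the rows;
# which() drops the NA comparisons
m[which(m != row_max)] <- NA
df[] <- m
df
```

Each row now keeps only its maximum (3 in col2 for row 1, 7 in col3 for row 2), and the all-NA row is left untouched.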