Using `:=` in data.table to sum the values of two columns in R, ignoring NAs
It's not a lack of understanding of data.table but rather one regarding vectorized functions in R. You can define a dyadic operator that will behave differently than the "+" operator with regard to missing values:
`%+na%` <- function(x,y) {ifelse( is.na(x), y, ifelse( is.na(y), x, x+y) )}
mat[ , col3:= col1 %+na% col2]
#-------------------------------
col1 col2 col3
1: NA 0.003745 0.003745
2: 0.000000 0.007463 0.007463
3: -0.015038 -0.007407 -0.022445
4: 0.003817 -0.003731 0.000086
5: -0.011407 -0.007491 -0.018898
You can use mrdwad's comment to do it with sum(..., na.rm=TRUE):
mat[ , col4 := sum(col1, col2, na.rm=TRUE), by=1:NROW(mat)]
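The two approaches above differ when both inputs are missing. The following is a minimal base R sketch on plain vectors; x and y are made-up values, not the asker's data, and mapply() stands in for the row-wise by= grouping.

```r
# Same operator as above: fall back to the other operand when one side is NA.
`%+na%` <- function(x, y) ifelse(is.na(x), y, ifelse(is.na(y), x, x + y))

# Hypothetical inputs; position 3 is NA on both sides.
x <- c(NA, 1, NA, 2)
y <- c(3, NA, NA, 4)

# The operator keeps NA when both values are missing.
op_result <- x %+na% y

# Element-wise sum(..., na.rm = TRUE) returns 0 in that case instead.
sum_result <- mapply(function(a, b) sum(a, b, na.rm = TRUE), x, y)

print(op_result)
print(sum_result)
```

Which behaviour is preferable depends on whether an all-missing row should count as zero or stay missing.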
Sum 2 columns, ignore NA, except when both are NA
I used the following. It gives sums even when there are NAs, but returns NA when all summed elements are NA.
rowSums(df, na.rm = TRUE) * NA ^ (rowSums(!is.na(df)) == 0)
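The one-liner relies on NA^0 being 1 and NA^1 being NA in R, so the second factor acts as a mask. A small sketch that splits it into steps, using a made-up data frame df:

```r
# Hypothetical data: the third row is NA in every column.
df <- data.frame(a = c(1, NA, NA), b = c(2, 3, NA))

# TRUE only for rows with no non-NA value at all.
all_na <- rowSums(!is.na(df)) == 0

# NA^FALSE is NA^0 = 1, NA^TRUE is NA^1 = NA.
mask <- NA ^ all_na

# Multiplying by the mask turns the all-NA rows back into NA.
res <- rowSums(df, na.rm = TRUE) * mask
print(res)
```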
Sum of two Columns of Data Frame with NA Values
dat$e <- rowSums(dat[,c("b", "c")], na.rm=TRUE)
dat
# a b c d e
# 1 1 2 3 4 5
# 2 5 NA 7 8 7
Skip NAs when using Reduce() in data.table
Consider this example:
library(data.table)
dt <- data.table(a = 1:5, b = c(3, NA, 1, 2, 4), c = c(NA, 1, NA, 3, 4))
dt
# a b c
#1: 1 3 NA
#2: 2 NA 1
#3: 3 1 NA
#4: 4 2 3
#5: 5 4 4
If you want to carry the previous cumulative value forward over NA values, you can use:
dt[, names(dt) := lapply(.SD, function(x) cumsum(replace(x, is.na(x), 0))),
.SDcols = names(dt)]
dt
# a b c
#1: 1 3 0
#2: 3 3 1
#3: 6 4 1
#4: 10 6 4
#5: 15 10 8
If you want to keep NA as NA:
dt[, names(dt) := lapply(.SD, function(x) {
x1 <- cumsum(replace(x, is.na(x), 0))
x1[is.na(x)] <- NA
x1
}), .SDcols = names(dt)]
dt
# a b c
#1: 1 3 NA
#2: 3 NA 1
#3: 6 4 NA
#4: 10 6 4
#5: 15 10 8
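Both variants can be checked on a single vector without data.table. This sketch uses the values of column b from the example; the names filled and kept are only illustrative.

```r
# Column b from the example above.
x <- c(3, NA, 1, 2, 4)

# Treat NA as 0 so the running total carries across the gap.
filled <- cumsum(replace(x, is.na(x), 0))

# Same running total, but put NA back where the input was NA.
kept <- filled
kept[is.na(x)] <- NA

print(filled)
print(kept)
```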
Summing across rows of a data.table for specific columns with NA
We can have several options for this, i.e. either do the rowSums first and then replace the rows where all are NA, or create an index in i to do the sum only for those rows with at least one non-NA.
library(data.table)
TEST[, SumAbundance := replace(rowSums(.SD, na.rm = TRUE),
Reduce(`&`, lapply(.SD, is.na)), NA), .SDcols = 4:6]
Or a slightly more compact option:
TEST[, SumAbundance := (NA^!rowSums(!is.na(.SD))) *
rowSums(.SD, na.rm = TRUE), .SDcols = 4:6]
Or construct a function and reuse it:
rowSums_new <- function(dat) {
fifelse(rowSums(is.na(dat)) != ncol(dat), rowSums(dat, na.rm = TRUE), NA_real_)
}
TEST[, SumAbundance := rowSums_new(.SD), .SDcols = 4:6]
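The other option mentioned at the start, building an index in i, could look roughly like the sketch below. The TEST table here is a small made-up stand-in with columns x and y, not the asker's data, and has_value is an illustrative name.

```r
library(data.table)

# Hypothetical data; the columns to sum are x and y.
TEST <- data.table(id = 1:3, x = c(1, NA, NA), y = c(2, 5, NA))
cols <- c("x", "y")

# TRUE for rows that have at least one non-NA value in the chosen columns.
has_value <- TEST[, !Reduce(`&`, lapply(.SD, is.na)), .SDcols = cols]

# Assign only on those rows; rows excluded by i keep NA in the new column.
TEST[has_value, SumAbundance := rowSums(.SD, na.rm = TRUE), .SDcols = cols]
print(TEST)
```

Because := only fills the rows selected in i, the all-NA rows need no separate replacement step.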
Summing many columns with data.table in R, remove NA
First, create the object variables for the names in use:
colsToSum <- names(dt1) # or whatever you need
summedNms <- paste0( "y", seq_along(colsToSum) )
If you'd like to copy it to a new data.table
dt2 <- dt1[, lapply(.SD, sum, na.rm=TRUE), .SDcols=colsToSum]
setnames(dt2, summedNms)
Alternatively, if you'd like to append the columns to the original:
dt1[, c(summedNms) := lapply(.SD, sum, na.rm=TRUE), .SDcols=colsToSum]
As far as a general na.rm process, there is not one specific to data.table, but have a look at ?na.omit and ?na.exclude
Sum values from rows ignoring certain values in R
One way to do it in base:
rowSums(dta[, 2:4] * (dta[, 2:4] < 7))
# [1] 0 4 2 2 NA 9
Adding explanation, according to @tjebo's comment:
- With dta[, 2:4] < 7 you produce a data frame populated with logical values, where TRUE or FALSE indicates whether each value is less than 7 or not. It is possible to do this in one line, since the operation is vectorized;
- Then, you multiply the logical data frame above by a data frame populated with your original values. Under the hood, R converts logical types into numeric types, so all FALSEs and TRUEs from your logical dataset are converted to 0s and 1s. This means that you multiply your original values by 1 if they are less than 7, and by 0 otherwise;
- Since NA < 7 produces NA, and the following multiplication by NA will produce NAs as well, you preserve the original NAs;
- The last step is to call rowSums() on the resulting data frame, which will sum up the values for each particular row. Since the values that are not less than 7 are turned into 0s, you exclude them from the resulting sum;
- If you want to get a sum for the rows where at least one value is not NA, you can add the na.rm = TRUE argument to your rowSums() call. However, in this case, for the rows with NAs only you will get 0.
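The steps above can be traced on a tiny example. The data frame below is made up for illustration (it is not the dta from the question), and vals and keep are illustrative names.

```r
# Hypothetical data; column 1 is an id, columns 2:4 hold the values.
dta <- data.frame(id = 1:3,
                  a = c(1, 8, NA),
                  b = c(3, 2, 4),
                  c = c(9, 2, 5))
vals <- dta[, 2:4]

# Logical mask: TRUE where a value is below 7, NA where the value is NA.
keep <- vals < 7

# Multiplying zeroes out values that are not below 7 and keeps NA as NA.
masked <- vals * keep

strict  <- rowSums(masked)                # NA if any value in the row is NA
lenient <- rowSums(masked, na.rm = TRUE)  # NA values are skipped
print(strict)
print(lenient)
```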
How to sum values from two adjacent columns in a data.frame in R but keep 0s as such?
Here's a pretty simple way. We do a cumulative sum by row, and multiply by the original data frame: multiplying by 0 zeros out the 0 entries, and multiplying by 1 keeps the summed entries as-is. Since you have quotes around your numbers making them character class, we start by converting all your columns to numeric:
df[] = lapply(df, as.numeric)
result = t(apply(df, 1, cumsum)) * df
result
# Year1 Year2 Year3 Year4 Year5 Year6
# 1 1 2 3 0 0 0
# 2 0 1 2 3 0 0
# 3 0 1 2 3 4 0
# 4 0 0 1 2 3 0
# 5 0 0 1 2 3 0
# 6 0 0 0 1 2 0
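To see the intermediate step, here is a minimal sketch on a two-row data frame. The values are made up and stored as character to mirror the situation described; running is an illustrative name.

```r
# Hypothetical 0/1 indicators stored as character strings.
df <- data.frame(Year1 = c("1", "0"),
                 Year2 = c("1", "1"),
                 Year3 = c("1", "1"),
                 Year4 = c("0", "0"))

# Convert every column to numeric while keeping the data frame shape.
df[] <- lapply(df, as.numeric)

# apply() returns one column per input row, so transpose back to row layout.
running <- t(apply(df, 1, cumsum))

# Multiplying by the original 0/1 values resets the 0 entries to 0.
result <- running * df
print(running)
print(result)
```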