Summing Across Rows of a Data.Table for Specific Columns

[Edited 2020-02-15 to reflect the current state of data.table] In recent versions of data.table, rowSums(Abundance[, 4:6]) works as the OP originally expected. Here are some alternatives:

Abundance[, SumAbundance := rowSums(.SD), .SDcols = 4:6]

I haven't benchmarked it, but I suspect the following will be faster, since it does not convert .SD to a matrix the way rowSums does:

Abundance[, SumAbundance := Reduce(`+`, .SD), .SDcols = 4:6]
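To see that the two alternatives agree, here is a minimal sketch on a made-up table. The column names and values are hypothetical stand-ins, not the OP's data; only the layout (three leading columns, then the numeric columns at positions 4 to 6) is taken from the snippets above.

```r
library(data.table)

# Hypothetical data: three descriptive columns followed by three numeric ones
Abundance <- data.table(
  Site    = c("A", "B", "C"),
  Year    = c(2019L, 2019L, 2020L),
  Species = c("x", "y", "z"),
  n1 = c(1, 2, 3),
  n2 = c(10, 20, 30),
  n3 = c(100, 200, 300)
)

# rowSums() converts .SD to a matrix before summing
Abundance[, SumRowSums := rowSums(.SD), .SDcols = 4:6]

# Reduce() adds the columns to each other as plain vectors
Abundance[, SumReduce := Reduce(`+`, .SD), .SDcols = 4:6]

print(Abundance)
```

Both new columns should hold the same values. Note that Reduce with `+` propagates NA, so it only matches rowSums(..., na.rm = TRUE) when there are no missing values.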

Summing across rows of a data.table for specific columns with NA

There are several options here: either compute rowSums first and then replace the result with NA for rows where every value is NA, or build an index in i so that the sum is computed only for rows with at least one non-NA value.

library(data.table)
TEST[, SumAbundance := replace(rowSums(.SD, na.rm = TRUE),
    Reduce(`&`, lapply(.SD, is.na)), NA), .SDcols = 4:6]

Or a slightly more compact option:

TEST[, SumAbundance :=  (NA^!rowSums(!is.na(.SD))) * 
rowSums(.SD, na.rm = TRUE), .SDcols = 4:6]
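The compact version relies on the fact that in R NA^0 is 1 while NA^1 is NA, so raising NA to an "all values missing" flag yields a multiplier that is NA for all-NA rows and 1 otherwise. A small sketch that spells the steps out, using a hypothetical TEST table (the names and values are assumptions for illustration only):

```r
library(data.table)

# Hypothetical data; row 3 is NA in every summed column
TEST <- data.table(
  id  = 1:3,
  grp = c("a", "b", "c"),
  yr  = c(2001L, 2002L, 2003L),
  v1  = c(1, NA, NA),
  v2  = c(2, 5, NA),
  v3  = c(NA, 7, NA)
)

vals   <- as.matrix(TEST[, 4:6])
all_na <- rowSums(!is.na(vals)) == 0  # TRUE only where every value is NA
mask   <- NA^all_na                   # NA where all_na is TRUE, 1 elsewhere

# Multiplying by the mask turns the na.rm sum of an all-NA row (0) into NA
TEST[, SumAbundance := mask * rowSums(.SD, na.rm = TRUE), .SDcols = 4:6]
print(TEST)
```

Rows with at least one observed value keep their na.rm sum; only the all-NA row becomes NA instead of 0.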

Or wrap the logic in a function and reuse it:

rowSums_new <- function(dat) {
  # NA when every value in the row is NA, otherwise the sum of the non-NA values
  fifelse(rowSums(is.na(dat)) != ncol(dat), rowSums(dat, na.rm = TRUE), NA_real_)
}
TEST[, SumAbundance := rowSums_new(.SD), .SDcols = 4:6]

Row sums over columns with a certain pattern in their name

You can also use Reduce:

DT[, Sum := Reduce(`+`, .SD), .SDcols = listCol][]
#    ref nb       i1       i2       i3       i4      Sum
# 1:   3 12 0.000031 0.000183 0.000824 0.044495 0.045533
# 2:   3 13 0.044495 0.155732 0.533939 0.822440 1.556606
# 3:   3 14 0.822440 0.873416 0.838542 0.322291 2.856689
# 4:   3 15 0.322291 0.648545 0.990648 0.393595 2.355079

NOTE: If there are NA values, they should be replaced with 0 before calling Reduce, i.e.

DT[, Sum := Reduce(`+`, lapply(.SD, function(x) replace(x,
    which(is.na(x)), 0))), .SDcols = listCol][]

Another solution, using rowSums:

 DT[, Sum := rowSums(.SD, na.rm = TRUE), .SDcols = grep("i", names(DT))] 
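One caveat with a bare "i" pattern is that it matches any column whose name contains the letter i. The sketch below, on a hypothetical table that also has an id column, shows how an anchored regular expression narrows the selection (the table and the exact pattern are assumptions for illustration):

```r
library(data.table)

# Hypothetical table: "id" contains an "i" but should not be summed
DT <- data.table(
  id = c(101, 102),
  nb = c(12, 13),
  i1 = c(1, NA),
  i2 = c(2, 4),
  i3 = c(3, 5)
)

loose  <- grep("i", names(DT), value = TRUE)          # also picks up "id"
strict <- grep("^i[0-9]+$", names(DT), value = TRUE)  # only i1, i2, i3

DT[, Sum := rowSums(.SD, na.rm = TRUE), .SDcols = strict]
print(DT)
```

Passing column names (value = TRUE) rather than positions also keeps the selection stable if columns are later reordered.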

data.table sum of all columns by group

I think the code you're looking for is likely:

TestData[, .(a = sum(.SD)), by = .(id, year), .SDcols = Kattegori_Henter("Medicine")]
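As a rough illustration of what this call returns, here is a sketch on invented data. The column names are hypothetical, and med_cols is a stand-in for whatever character vector Kattegori_Henter("Medicine") returns in the asker's code:

```r
library(data.table)

# Hypothetical data: two "medicine" columns plus one unrelated column
TestData <- data.table(
  id    = c(1, 1, 2, 2),
  year  = c(2020, 2020, 2020, 2021),
  med_a = c(1, 2, 5, 4),
  med_b = c(10, 20, 30, 40),
  other = c(100, 100, 100, 100)
)

# Stand-in for Kattegori_Henter("Medicine"), assumed to return column names
med_cols <- c("med_a", "med_b")

# sum(.SD) adds up every cell of the selected columns within each group
res <- TestData[, .(a = sum(.SD)), by = .(id, year), .SDcols = med_cols]
print(res)
```

Each output row holds a single total per (id, year) group, taken over all selected columns and all rows of that group; the other column is ignored because it is not in .SDcols.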

R data.table calculate sum of other rows

cols = c('high', 'low')
lapply(
  seq_len(nrow(df)),
  \(i) matrix(c(unlist(df[i, cols]), colSums(df[-i, cols])), nrow = 2, byrow = TRUE)
)

[[1]]
     [,1] [,2]
[1,]   73   77
[2,]  200  218

[[2]]
     [,1] [,2]
[1,]  113  155
[2,]  160  140

[[3]]
     [,1] [,2]
[1,]   87   63
[2,]  186  232

Data

df = data.frame(genotypes =  c('A|A', 'A|G', 'G|G'), high = c(73, 113, 87), low = c(77, 155, 63))
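If only the "sum of the other rows" part is needed, a possible alternative (not the approach used in the answer above) is to subtract each row from the column totals, which avoids looping over rows. A minimal sketch using the same data:

```r
df <- data.frame(genotypes = c('A|A', 'A|G', 'G|G'),
                 high = c(73, 113, 87),
                 low  = c(77, 155, 63))
cols <- c('high', 'low')

# Column totals over all rows
totals <- colSums(df[, cols])

# Repeat the totals on every row, then subtract the row's own values
others <- matrix(totals, nrow = nrow(df), ncol = length(cols), byrow = TRUE) -
  as.matrix(df[, cols])

print(others)
```

Row i of others holds the sums over all rows except row i, which corresponds to the second row of each matrix in the list output above.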

Sum rows in columns with column names ending with specific character string (R)

You can use select to pick the columns whose names end with "zscore" and then apply rowSums:

library(dplyr)
df1 %>%
  group_by(a) %>%
  mutate(across(b:d, list(zscore = ~as.numeric(scale(.))))) %>%
  ungroup %>%
  mutate(total = rowSums(select(., ends_with('zscore'))))

# A tibble: 30 x 8
#    a         b     c     d b_zscore c_zscore d_zscore  total
#    <chr> <dbl> <dbl> <dbl>    <dbl>    <dbl>    <dbl>  <dbl>
#  1 a      7.17 14.8   8.45    0.697   0.101    0.0179  0.816
#  2 a      7.42 19.7   3.97    0.841   1.17    -1.14    0.865
#  3 a      5.78 19.2   9.66   -0.108   1.05     0.332   1.28
#  4 a      5.09 17.7  12.8    -0.508   0.732    1.14    1.36
#  5 a      7.21 12.9   6.24    0.721  -0.329   -0.555  -0.163
#  6 a      2.36 13.7   2.50   -2.09   -0.146   -1.52   -3.76
#  7 a      7.26 10.9  10.7     0.749  -0.774    0.593   0.567
#  8 a      5.45  6.18 12.8    -0.302  -1.80     1.14   -0.965
#  9 b      5.43 18.2   9.55   -0.445   1.12     1.34    2.02
# 10 b      4.16 12.1   4.11   -1.06    0.0776  -1.02   -2.01
# … with 20 more rows
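For comparison, the same column selection can be sketched in base R with endsWith. The data frame below is hypothetical and assumes the zscore columns already exist:

```r
# Hypothetical data with two columns whose names end in "zscore"
df1 <- data.frame(
  a        = c("a", "a", "b"),
  b_zscore = c(0.5, -0.5, 1),
  c_zscore = c(1, 2, 3),
  d        = c(10, 20, 30)
)

# Logical vector marking the columns whose names end with "zscore"
z_cols <- endsWith(names(df1), "zscore")

# drop = FALSE keeps a data frame even if only one column matches
df1$total <- rowSums(df1[, z_cols, drop = FALSE])
print(df1)
```

Only b_zscore and c_zscore contribute to total here; d is excluded because its name does not end with the suffix.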

