Row Sums Over Columns with a Certain Pattern in Their Name


You may also try with Reduce

 DT[, Sum := Reduce(`+`, .SD), .SDcols=listCol][]
# ref nb i1 i2 i3 i4 Sum
#1: 3 12 0.000031 0.000183 0.000824 0.044495 0.045533
#2: 3 13 0.044495 0.155732 0.533939 0.822440 1.556606
#3: 3 14 0.822440 0.873416 0.838542 0.322291 2.856689
#4: 3 15 0.322291 0.648545 0.990648 0.393595 2.355079

NOTE: If there are NA values, they should be replaced with 0 before calling Reduce, i.e.

 DT[, Sum := Reduce(`+`, lapply(.SD, function(x) replace(x, 
which(is.na(x)), 0))), .SDcols=listCol][]

**Another solution:** using rowSums

 DT[, Sum := rowSums(.SD, na.rm = TRUE), .SDcols = grep("i", names(DT))] 
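
To see how the three variants above differ when a value is missing, here is a minimal sketch on a made-up table. The data are hypothetical (they only imitate the i1..i4 naming), and `listCol` is assumed to be a character vector holding the names of the columns to add up.

```r
library(data.table)

# Hypothetical toy table; only the i* columns should be summed.
DT <- data.table(ref = c(3, 3), nb = c(12, 13),
                 i1 = c(1, NA), i2 = c(2, 5), i3 = c(NA, 1))

# Assumption: listCol holds the names of the columns to add up.
listCol <- grep("^i[0-9]+$", names(DT), value = TRUE)

# Plain Reduce: a single NA makes the whole row sum NA.
DT[, Sum_raw := Reduce(`+`, .SD), .SDcols = listCol]

# NA replaced by 0 before Reduce.
DT[, Sum_na0 := Reduce(`+`, lapply(.SD, function(x) replace(x, is.na(x), 0))),
   .SDcols = listCol]

# rowSums with na.rm = TRUE skips the missing values instead.
DT[, Sum_rs := rowSums(.SD, na.rm = TRUE), .SDcols = listCol]

print(DT)
```

On this toy table the last two columns should agree, while `Sum_raw` is NA in every row that contains a missing value.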

Sum rows in columns with column names ending with specific character string (R)

You can use select to pick the columns whose names end with "zscore" and then apply rowSums:

library(dplyr)
df1 %>%
group_by(a) %>%
mutate(across(b:d, list(zscore = ~as.numeric(scale(.))))) %>%
ungroup %>%
mutate(total = rowSums(select(., ends_with('zscore'))))

# A tibble: 30 x 8
# a b c d b_zscore c_zscore d_zscore total
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 a 7.17 14.8 8.45 0.697 0.101 0.0179 0.816
# 2 a 7.42 19.7 3.97 0.841 1.17 -1.14 0.865
# 3 a 5.78 19.2 9.66 -0.108 1.05 0.332 1.28
# 4 a 5.09 17.7 12.8 -0.508 0.732 1.14 1.36
# 5 a 7.21 12.9 6.24 0.721 -0.329 -0.555 -0.163
# 6 a 2.36 13.7 2.50 -2.09 -0.146 -1.52 -3.76
# 7 a 7.26 10.9 10.7 0.749 -0.774 0.593 0.567
# 8 a 5.45 6.18 12.8 -0.302 -1.80 1.14 -0.965
# 9 b 5.43 18.2 9.55 -0.445 1.12 1.34 2.02
#10 b 4.16 12.1 4.11 -1.06 0.0776 -1.02 -2.01
# … with 20 more rows

Row-wise sum for columns with certain names

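We can find the columns whose names contain 'a' with grep, subset those columns, and apply rowSums; the same is then done for the 'b' columns. The first column name is excluded from the match, so 1 is added to the positions that grep returns.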

 rowSums(df1[grep('a', names(df1)[-1])+1])
rowSums(df1[grep('b', names(df1)[-1])+1])
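
The offset in the two calls above is easy to misread, so here is a small sketch of why it is there. The data frame is hypothetical; its first column is an identifier whose name happens to contain the letter being matched.

```r
# Hypothetical data: the first column is an identifier whose name also contains "a".
df1 <- data.frame(name = c("x", "y"),
                  a1 = c(1, 2), a2 = c(3, 4),
                  b1 = c(5, 6), b2 = c(7, 8))

# Matching against all names would also pick up the non-numeric "name" column.
grep("a", names(df1))            # positions 1, 2, 3

# Dropping the first name before matching shifts every position down by one,
# so 1 is added back to index into the full data frame.
a_idx <- grep("a", names(df1)[-1]) + 1
b_idx <- grep("b", names(df1)[-1]) + 1

sum_a <- rowSums(df1[a_idx])
sum_b <- rowSums(df1[b_idx])
```

An anchored pattern such as `grep("^a", names(df1))` would avoid the offset here, provided the columns of interest are the only ones whose names start with that letter.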

Build rowSums in dplyr based on columns containing pattern in their names

As you asked for a dplyr solution, you can do:

library(dplyr)

df %>%
mutate(SUM = rowSums(select(., starts_with("COUNT"))))

USER OBSERVATION COUNT.1 COUNT.2 COUNT.3 SUM
1 A 1 0 1 1 2
2 A 2 1 1 2 4
3 A 3 3 0 0 3
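
For comparison, a base R sketch that should give the same SUM column without dplyr. The data frame is reconstructed from the printed output above; `count_cols` is just an illustrative helper name.

```r
# Data reconstructed from the printed output above.
df <- data.frame(USER = c("A", "A", "A"),
                 OBSERVATION = c(1, 2, 3),
                 COUNT.1 = c(0, 1, 3),
                 COUNT.2 = c(1, 1, 0),
                 COUNT.3 = c(1, 2, 0))

# startsWith() returns a logical vector, used here to pick the COUNT columns.
count_cols <- startsWith(names(df), "COUNT")
df$SUM <- rowSums(df[count_cols])
print(df)
```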

Sum column with similar name

You could use

library(dplyr)
df %>%
mutate(across(starts_with("AB"),
~.x + df[[gsub("AB", "XB", cur_column())]],
.names = "sum_{.col}"))

This returns

# A tibble: 1 x 9
AB1 AB3 AB4 XB1 XB3 XB4 sum_AB1 sum_AB3 sum_AB4
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 12 34 0 5 3 7 17 37 7
  • We use across and mutate in this approach.
  • First we select all columns starting with AB. The desired sums are always ABn + XBn, so we can use this pattern.
  • Next we replace AB in the name of the current selected column with XB and sum those two columns. These sums are stored in a new column prefixed with sum_.
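
The pairing logic described in the list above can also be sketched in base R with an explicit loop. The data are reconstructed from the printed output; `ab_cols` and `xb_cols` are illustrative names, and the sketch assumes every ABn column has an XBn partner.

```r
# Data reconstructed from the printed output above.
df <- data.frame(AB1 = 12, AB3 = 34, AB4 = 0,
                 XB1 = 5,  XB3 = 3,  XB4 = 7)

ab_cols <- grep("^AB", names(df), value = TRUE)   # "AB1" "AB3" "AB4"
xb_cols <- sub("^AB", "XB", ab_cols)              # partners "XB1" "XB3" "XB4"

# Add each AB column to its XB partner, one pair at a time.
for (i in seq_along(ab_cols)) {
  df[[paste0("sum_", ab_cols[i])]] <- df[[ab_cols[i]]] + df[[xb_cols[i]]]
}
print(df)
```

If some AB column had no XB partner, the lookup would return NULL and the assignment would fail, so that case would need separate handling.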

sum across multiple columns of a data frame based on multiple patterns R

We can reshape to 'long' format with pivot_longer, strip the numeric suffix from the column names, and sum the values (via values_fn) while reshaping back to 'wide':

library(dplyr)
library(tidyr)
library(stringr)
df %>%
pivot_longer(cols = starts_with("X"), names_to = "name1") %>%
mutate(name1 = str_remove(name1, "\\.\\d+$")) %>%
pivot_wider(names_from = name1, values_from = value,
values_fn = ~ sum(.x, na.rm = TRUE))

Output:

# A tibble: 4 × 3
name X1990 X1991
<chr> <dbl> <dbl>
1 name1 22 11
2 name2 37 35
3 name3 22 12
4 name4 20 15

Or, in base R, use split.default to split the data into a list of data frames based on the column name pattern, get the rowSums of each, and cbind the result with the first column:

cbind(df[1], sapply(split.default(df[-1], 
trimws(names(df)[-1], whitespace = "\\.\\d+")), rowSums, na.rm = TRUE))
name X1990 X1991
1 name1 22 11
2 name2 37 35
3 name3 22 12
4 name4 20 15
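
The base R one-liner above packs several steps together. Here is the same idea unrolled on a small made-up data frame; the data and the intermediate names (`groups`, `pieces`, `totals`) are hypothetical, and `sub()` is used in place of `trimws()` to strip the suffix.

```r
# Hypothetical data: repeated years carry a numeric suffix such as ".1".
df <- data.frame(name = c("name1", "name2"),
                 X1990 = c(1, 2), X1990.1 = c(10, NA),
                 X1991 = c(3, 4), X1991.1 = c(5, 6))

# Strip the suffix so that related columns share one group label.
groups <- sub("\\.[0-9]+$", "", names(df)[-1])

# split.default treats the data frame as a list of columns and
# returns one smaller data frame per group label.
pieces <- split.default(df[-1], groups)

# Row sums within each piece, then reattach the identifier column.
totals <- sapply(pieces, rowSums, na.rm = TRUE)
result <- cbind(df[1], totals)
print(result)
```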

R - Summing over a row for specific columns using a list

We can use rowSums on the columns named in genelist; intersect keeps only the names that actually exist in df:

df$sum_genelist <- rowSums(df[intersect(genelist, names(df))], na.rm = TRUE)
df
# names wb01 wb02 wb03 wb04 wb05 wb06 sum_genelist
#a a 1 0 0 1 1 1 1
#b b 1 0 0 1 0 1 1
#c c 0 0 1 0 1 1 2
#d d 1 0 1 1 0 1 2
#e e 1 1 1 1 0 1 3
#f f 0 1 1 1 1 1 3

where

genelist <- c('wb02', 'wb03', 'wb06')

data

df <- structure(list(names = c("a", "b", "c", "d", "e", "f"), wb01 = c(1, 
1, 0, 1, 1, 0), wb02 = c(0, 0, 0, 0, 1, 1), wb03 = c(0, 0, 1,
1, 1, 1), wb04 = c(1, 1, 0, 1, 1, 1), wb05 = c(1, 0, 1, 0, 0,
1), wb06 = c(1, 1, 1, 1, 1, 1)), row.names = c("a", "b", "c",
"d", "e", "f"), class = "data.frame")
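
The `intersect` call in the answer matters when the list contains a name that is not a column of the data frame. A minimal sketch, using a smaller hypothetical data frame and a gene list with one deliberately missing name (`wb99`):

```r
# Hypothetical data and gene list; "wb99" is deliberately absent from df.
df <- data.frame(names = c("a", "b", "c"),
                 wb01 = c(1, 1, 0), wb02 = c(0, 0, 1), wb03 = c(0, 1, 1))
genelist <- c("wb02", "wb03", "wb99")

# Indexing with a name that does not exist raises an error.
direct_ok <- tryCatch({ df[genelist]; TRUE }, error = function(e) FALSE)

# intersect() drops the unknown name, so the subset is safe.
keep <- intersect(genelist, names(df))
df$sum_genelist <- rowSums(df[keep], na.rm = TRUE)
print(df)
```

Note that the unknown name is dropped silently; comparing `keep` with `genelist` is one way to check whether anything was left out.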

Sum all columns whose names start with a pattern, by group

You can store the patterns in a vector and loop through them. With your example you can use something like this:

patterns <- unique(substr(names(DT), 1, 3))  # store patterns in a vector
new <- sapply(patterns, function(xx) rowSums(DT[,grep(xx, names(DT)), drop=FALSE])) # loop through
# a01 a02 a03
#[1,] 20 30 50
#[2,] 50 20 0
#[3,] 20 20 20
#[4,] 10 20 0

You can adjust the names like this:

colnames(new) <- paste0(colnames(new), "tot")  # rename
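
A self-contained sketch of the same loop, for readers who want to run it. It assumes `DT` is a plain data.frame (not a data.table) and uses made-up columns; `startsWith()` is swapped in for `grep()` so that the prefix is matched literally and only at the start of each name.

```r
# Hypothetical data frame; the first three characters of each name form the group.
DT <- data.frame(a01_x = c(10, 20), a01_y = c(10, 30),
                 a02_x = c(5, 5),   a02_y = c(1, 2))

patterns <- unique(substr(names(DT), 1, 3))   # "a01" "a02"

# One column of row sums per prefix.
new <- sapply(patterns, function(p) {
  rowSums(DT[, startsWith(names(DT), p), drop = FALSE])
})
colnames(new) <- paste0(colnames(new), "tot")
print(new)
```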

