Row sums over columns with a certain pattern in their name
You can also use Reduce:
DT[, Sum := Reduce(`+`, .SD), .SDcols=listCol][]
# ref nb i1 i2 i3 i4 Sum
#1: 3 12 0.000031 0.000183 0.000824 0.044495 0.045533
#2: 3 13 0.044495 0.155732 0.533939 0.822440 1.556606
#3: 3 14 0.822440 0.873416 0.838542 0.322291 2.856689
#4: 3 15 0.322291 0.648545 0.990648 0.393595 2.355079
NOTE: If there are NA values, they should be replaced with 0 before calling Reduce, i.e.
DT[, Sum := Reduce(`+`, lapply(.SD, function(x) replace(x,
which(is.na(x)), 0))), .SDcols=listCol][]
**Another solution:** using rowSums
DT[, Sum := rowSums(.SD, na.rm = TRUE), .SDcols = grep("^i", names(DT))]
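A small base R sketch of why the NA replacement matters. The toy data frame and its column names are assumptions made up for illustration, not the data from the question. Reduce adds the columns one after another, so a single NA propagates into the row total, whereas rowSums with na.rm = TRUE skips it.

```r
# Hypothetical toy data; the columns i1..i3 are invented for this sketch
df <- data.frame(ref = c(3, 3), i1 = c(1, NA), i2 = c(2, 5), i3 = c(3, 6))
cols <- grep("^i[0-9]+$", names(df), value = TRUE)

# Plain Reduce: the NA in row 2 makes that row's total NA
sum_reduce <- Reduce(`+`, df[cols])

# Replace NA with 0 first, then add the columns
sum_reduce0 <- Reduce(`+`, lapply(df[cols], function(x) replace(x, is.na(x), 0)))

# rowSums with na.rm = TRUE should give the same totals
sum_rowsums <- rowSums(df[cols], na.rm = TRUE)
```

If the two NA-safe variants disagree on real data, it is worth checking whether a non-numeric column slipped into the selection.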
Sum rows in columns with column names ending with specific character string (R)
You can use select to pick the columns that end with "zscore" and then apply rowSums:
library(dplyr)
df1 %>%
group_by(a) %>%
mutate(across(b:d, list(zscore = ~as.numeric(scale(.))))) %>%
ungroup %>%
mutate(total = rowSums(select(., ends_with('zscore'))))
# A tibble: 30 x 8
# a b c d b_zscore c_zscore d_zscore total
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 a 7.17 14.8 8.45 0.697 0.101 0.0179 0.816
# 2 a 7.42 19.7 3.97 0.841 1.17 -1.14 0.865
# 3 a 5.78 19.2 9.66 -0.108 1.05 0.332 1.28
# 4 a 5.09 17.7 12.8 -0.508 0.732 1.14 1.36
# 5 a 7.21 12.9 6.24 0.721 -0.329 -0.555 -0.163
# 6 a 2.36 13.7 2.50 -2.09 -0.146 -1.52 -3.76
# 7 a 7.26 10.9 10.7 0.749 -0.774 0.593 0.567
# 8 a 5.45 6.18 12.8 -0.302 -1.80 1.14 -0.965
# 9 b 5.43 18.2 9.55 -0.445 1.12 1.34 2.02
#10 b 4.16 12.1 4.11 -1.06 0.0776 -1.02 -2.01
# … with 20 more rows
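For comparison, roughly the same result can be sketched in base R with ave and endsWith. The data below are hypothetical, and the grouping column is assumed to be named a as in the pipeline above. This is an illustration, not a drop-in replacement for the dplyr version.

```r
# Hypothetical data: grouping column 'a' and two numeric columns
df1 <- data.frame(a = rep(c("a", "b"), each = 3),
                  b = c(1, 2, 3, 4, 6, 8),
                  c = c(2, 4, 6, 1, 1, 4))

# Per-group z-scores for each numeric column
z <- lapply(df1[c("b", "c")], function(x) {
  ave(x, df1$a, FUN = function(v) as.numeric(scale(v)))
})
df1[paste0(names(z), "_zscore")] <- z

# Row total over every column whose name ends with "zscore"
df1$total <- rowSums(df1[endsWith(names(df1), "zscore")])
```

Because each z-score column has mean zero within a group, the totals should also sum to roughly zero per group, which makes a convenient sanity check.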
Row-wise sum for columns with certain names
We can select the columns that have 'a' in their names with grep, subset those columns and apply rowSums, and then do the same for the 'b' columns.
rowSums(df1[grep('a', names(df1)[-1])+1])
rowSums(df1[grep('b', names(df1)[-1])+1])
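The + 1 in the calls above compensates for dropping the first column name before matching. A minimal sketch, assuming a hypothetical data frame whose first column is called name (which itself contains an 'a'), shows the offset and an anchored pattern as an alternative:

```r
# Hypothetical data: the first column's name also contains the letter 'a'
df1 <- data.frame(name = c("x", "y", "z"),
                  a1 = 1:3, a2 = 4:6, b1 = 7:9, b2 = 10:12)

# Matching against all names would also pick up 'name'
idx_naive <- grep("a", names(df1))

# Dropping the first name shifts positions by one, hence the + 1
idx_offset <- grep("a", names(df1)[-1]) + 1

# Anchoring the pattern at the start avoids the offset altogether
idx_anchor <- grep("^a", names(df1))

sum_a <- rowSums(df1[idx_anchor])
sum_b <- rowSums(df1[grep("^b", names(df1))])
```

Whether anchoring is appropriate depends on the real column names; it only helps when the target columns share a common prefix.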
Build rowSums in dplyr based on columns containing pattern in their names
As you asked for a dplyr solution, you can do:
library(dplyr)
df %>%
mutate(SUM = rowSums(select(., starts_with("COUNT"))))
USER OBSERVATION COUNT.1 COUNT.2 COUNT.3 SUM
1 A 1 0 1 1 2
2 A 2 1 1 2 4
3 A 3 3 0 0 3
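If dplyr is not available, the same sum can be sketched in base R with startsWith. The data frame below is rebuilt by hand from the printed output, so treat it as an assumption about the original input.

```r
# Data reconstructed from the output shown above
df <- data.frame(USER = "A", OBSERVATION = 1:3,
                 COUNT.1 = c(0, 1, 3),
                 COUNT.2 = c(1, 1, 0),
                 COUNT.3 = c(1, 2, 0))

# Logical column selection: TRUE for names beginning with "COUNT"
df$SUM <- rowSums(df[startsWith(names(df), "COUNT")])
```

startsWith does a literal prefix comparison, so the dot in COUNT.1 needs no escaping.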
Sum column with similar name
You could use
library(dplyr)
df %>%
mutate(across(starts_with("AB"),
~.x + df[[gsub("AB", "XB", cur_column())]],
.names = "sum_{.col}"))
This returns
# A tibble: 1 x 9
AB1 AB3 AB4 XB1 XB3 XB4 sum_AB1 sum_AB3 sum_AB4
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 12 34 0 5 3 7 17 37 7
- We use across and mutate in this approach.
- First we select all columns starting with AB. The desired sums are always ABn + XBn, so we can use this pattern.
- Next we replace AB in the name of the currently selected column with XB and sum those two columns. These sums are stored in new columns prefixed with sum_.
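The name-pairing idea can also be sketched in base R. The one-row data frame is rebuilt from the printed output, and the check that every XB partner exists is an added safeguard rather than part of the original answer.

```r
# Data reconstructed from the output shown above
df <- data.frame(AB1 = 12, AB3 = 34, AB4 = 0, XB1 = 5, XB3 = 3, XB4 = 7)

# Derive each XB partner name from the matching AB name
ab_cols <- grep("^AB", names(df), value = TRUE)
xb_cols <- sub("^AB", "XB", ab_cols)

# Added safeguard: stop early if any partner column is missing
stopifnot(all(xb_cols %in% names(df)))

# Element-wise sum of the paired columns, stored under a sum_ prefix
df[paste0("sum_", ab_cols)] <- df[ab_cols] + df[xb_cols]
```

The two column vectors are aligned by position, so ab_cols and xb_cols must stay in the same order.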
sum across multiple columns of a data frame based on multiple patterns R
We can reshape to 'long' format with pivot_longer, and get the sum while reshaping back to 'wide'
library(dplyr)
library(tidyr)
library(stringr)
df %>%
pivot_longer(cols = starts_with("X"), names_to = "name1") %>%
mutate(name1 = str_remove(name1, "\\.\\d+$")) %>%
pivot_wider(names_from = name1, values_from = value,
values_fn = ~ sum(.x, na.rm = TRUE))
-output
# A tibble: 4 × 3
name X1990 X1991
<chr> <dbl> <dbl>
1 name1 22 11
2 name2 37 35
3 name3 22 12
4 name4 20 15
Or in base R, use split.default to split the data into a list of datasets based on the column name pattern, get the rowSums, and cbind with the first column
cbind(df[1], sapply(split.default(df[-1],
trimws(names(df)[-1], whitespace = "\\.\\d+")), rowSums, na.rm = TRUE))
name X1990 X1991
1 name1 22 11
2 name2 37 35
3 name3 22 12
4 name4 20 15
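To make the grouping step easier to follow, here is a sketch that builds the grouping key explicitly. The input is hypothetical (the question's data are not shown here), with column names assumed to follow a year plus numeric suffix pattern such as X1990.1.

```r
# Hypothetical input: repeated year columns distinguished by a numeric suffix
df <- data.frame(name = c("name1", "name2"),
                 X1990.1 = c(1, 2), X1990.2 = c(3, NA), X1991.1 = c(5, 6))

# Strip the trailing ".<digits>" so repeated years share one key
key <- sub("\\.[0-9]+$", "", names(df)[-1])

# split.default splits the columns (not the rows) by that key
parts <- split.default(df[-1], key)

# One row sum per group of columns, then reattach the id column
totals <- sapply(parts, rowSums, na.rm = TRUE)
res <- cbind(df[1], totals)
```

Inspecting key and names(parts) before summing is a quick way to confirm that the pattern groups the columns as intended.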
R - Summing over a row for specific columns using a list
We can use rowSums
df$sum_genelist <- rowSums(df[intersect(genelist, names(df))], na.rm = TRUE)
df
# names wb01 wb02 wb03 wb04 wb05 wb06 sum_genelist
#a a 1 0 0 1 1 1 1
#b b 1 0 0 1 0 1 1
#c c 0 0 1 0 1 1 2
#d d 1 0 1 1 0 1 2
#e e 1 1 1 1 0 1 3
#f f 0 1 1 1 1 1 3
where
genelist <- c('wb02', 'wb03', 'wb06')
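The intersect call matters when the list contains names that are not columns of the data frame. A minimal sketch with made-up data, where wb99 is a hypothetical gene absent from df:

```r
# Made-up data; 'wb99' is a hypothetical name that is not a column of df
df <- data.frame(wb01 = c(1, 1, 0), wb02 = c(0, 0, 1), wb03 = c(0, 1, 1))
genelist <- c("wb02", "wb03", "wb99")

# Direct indexing fails on the unknown column; capture the error as NULL
direct <- tryCatch(rowSums(df[genelist]), error = function(e) NULL)

# intersect keeps only the names actually present in df
safe <- rowSums(df[intersect(genelist, names(df))], na.rm = TRUE)
```

Silently dropping unknown names may hide typos, so comparing setdiff(genelist, names(df)) against an empty vector can be a useful extra check.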
data
df <- structure(list(names = c("a", "b", "c", "d", "e", "f"), wb01 = c(1,
1, 0, 1, 1, 0), wb02 = c(0, 0, 0, 0, 1, 1), wb03 = c(0, 0, 1,
1, 1, 1), wb04 = c(1, 1, 0, 1, 1, 1), wb05 = c(1, 0, 1, 0, 0,
1), wb06 = c(1, 1, 1, 1, 1, 1)), row.names = c("a", "b", "c",
"d", "e", "f"), class = "data.frame")
Sum all columns whose names start with a pattern, by group
You can store the patterns in a vector and loop through them. With your example you can use something like this:
patterns <- unique(substr(names(DT), 1, 3)) # store patterns in a vector
new <- sapply(patterns, function(xx) rowSums(DT[,grep(xx, names(DT)), drop=FALSE])) # loop through
# a01 a02 a03
#[1,] 20 30 50
#[2,] 50 20 0
#[3,] 20 20 20
#[4,] 10 20 0
You can adjust the names like this:
colnames(new) <- paste0(colnames(new), "tot") # rename
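A variation on the loop above, sketched with startsWith instead of grep so that a prefix only matches at the beginning of a column name. The data frame is hypothetical, and it is a plain data.frame rather than a data.table.

```r
# Hypothetical input: columns share a three-character prefix
DT <- data.frame(a01x = c(10, 20), a01y = c(10, 30), a02x = c(30, 20))

# One prefix per group of columns
patterns <- unique(substr(names(DT), 1, 3))

# Row sums per prefix; sapply names the result columns after the prefixes
new <- sapply(patterns, function(p) rowSums(DT[startsWith(names(DT), p)]))

# Rename the totals as in the answer above
colnames(new) <- paste0(colnames(new), "tot")
```

With grep, a prefix such as a01 would also match a name like xa01, which is why a literal prefix test can be the safer choice when names are less regular.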