Sum Columns Row-Wise with Similar Names

Row-wise sum for columns with certain names

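We can select the columns whose names contain 'a' with grep, subset the data frame to those columns, and apply rowSums; the same works for the 'b' columns. The first column is excluded from the search with names(df1)[-1], so 1 is added to the matched positions to index the original data frame.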

rowSums(df1[grep('a', names(df1)[-1])+1])
rowSums(df1[grep('b', names(df1)[-1])+1])
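
For comparison, the same idea can be sketched in pandas. This is only an illustration under assumptions: the data below is hypothetical (the original df1 is not shown), and the first column is assumed to be an identifier that should not take part in the name match.

```python
import pandas as pd

# Hypothetical data: an id column followed by 'a' and 'b' columns.
df1 = pd.DataFrame({
    "id": [1, 2],
    "a1": [1, 2],
    "a2": [3, 4],
    "b1": [10, 20],
    "b2": [30, 40],
})

# Leave the first column out of the name search, as names(df1)[-1] does in R.
values = df1.drop(columns="id")

# filter(like=...) keeps the columns whose name contains the substring.
a_sum = values.filter(like="a").sum(axis=1)
b_sum = values.filter(like="b").sum(axis=1)

print(a_sum.tolist(), b_sum.tolist())
```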

Sum column with similar name

You could use

library(dplyr)
df %>%
  mutate(across(starts_with("AB"),
                ~ .x + df[[gsub("AB", "XB", cur_column())]],
                .names = "sum_{.col}"))

This returns

# A tibble: 1 x 9
    AB1   AB3   AB4   XB1   XB3   XB4 sum_AB1 sum_AB3 sum_AB4
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>   <dbl>   <dbl>
1    12    34     0     5     3     7      17      37       7
  • We use across and mutate in this approach.
  • First we select all columns starting with AB. The desired sums are always ABn + XBn, so we can use this pattern.
  • Next we replace AB in the name of the current selected column with XB and sum those two columns. These sums are stored in a new column prefixed with sum_.
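
The pairing described in the bullets above can also be sketched in pandas. The one-row frame is rebuilt from the output shown earlier; the loop and the variable names are illustrative, not part of the original answer.

```python
import pandas as pd

# One-row frame mirroring the tibble output shown above.
df = pd.DataFrame({
    "AB1": [12], "AB3": [34], "AB4": [0],
    "XB1": [5], "XB3": [3], "XB4": [7],
})

result = df.copy()
for col in df.columns[df.columns.str.startswith("AB")]:
    # Derive the partner column name: AB1 -> XB1, AB3 -> XB3, ...
    partner = col.replace("AB", "XB", 1)
    result[f"sum_{col}"] = df[col] + df[partner]

print(result)
```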

Row-wise sum of values grouped by columns with same name

We can transpose dat, calculate rowsum per group (the column names of the original dat), then transpose the result back to the original structure.

t(rowsum(t(dat), group = colnames(dat), na.rm = TRUE))
#   A C G T
# 1 1 0 1 0
# 2 4 0 6 0
# 3 0 1 0 1
# 4 2 0 1 0
# 5 1 0 1 0
# 6 0 1 0 1
# 7 0 1 0 1
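
The same transpose, group, transpose-back pattern can be sketched in pandas. The input here is hypothetical (the original dat is not shown); missing values are skipped by sum, which plays the role of na.rm = TRUE.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a repeated column name and one missing value.
dat = pd.DataFrame(
    [[1, 0, np.nan, 1],
     [2, 1, 2, 0]],
    columns=["A", "C", "A", "G"],
)

# Transpose so the column names become the index, sum per name,
# then transpose back to the original orientation.
summed = dat.T.groupby(level=0).sum().T

print(summed)
```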

Sum row-wise values that are grouped by column name but keep all columns in R?

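You can try ave as shown below, using col and row to build the grouping: each cell is grouped by its column name and its row index, so cells that share a name within a row receive the same sum.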

> ave(myMat, colnames(myMat)[col(myMat)], row(myMat), FUN = sum)
     x  y x  y
[1,] 1  3 1  3
[2,] 5  9 5  9
[3,] 4 13 4 13
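
A rough pandas counterpart is a grouped transform, which broadcasts each group's sum back to every member and so keeps all columns. The input values are hypothetical, chosen so that the first two rows match the output above.

```python
import pandas as pd

# Hypothetical input with repeated column names.
myMat = pd.DataFrame(
    [[0, 1, 1, 2],
     [2, 4, 3, 5]],
    columns=["x", "y", "x", "y"],
)

# Group the transposed frame by column name and broadcast each sum back,
# so the result has the same shape and column labels as the input.
out = myMat.T.groupby(level=0).transform("sum").T

print(out)
```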

rowwise() sum with vector of column names in dplyr

I think you are looking for rlang::syms, which converts a character vector of column names into a list of symbols that !!! can then splice into the call:

library(dplyr)
library(rlang)
df %>%
  rowwise() %>%
  mutate(
    sum = sum(!!!syms(to_sum))
  )
#     foo   bar foobar   sum
#   <dbl> <dbl>  <dbl> <dbl>
# 1     1     1      0     2
# 2     0     1      1     1
# 3     1     1      1     2
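
In pandas, selecting columns from a list of names needs no extra machinery. A minimal sketch follows, with the data taken from the output above; the contents of to_sum are an assumption inferred from the sums shown, since the original vector is not given.

```python
import pandas as pd

df = pd.DataFrame({
    "foo": [1, 0, 1],
    "bar": [1, 1, 1],
    "foobar": [0, 1, 1],
})

# Assumed contents of to_sum (not shown in the original answer).
to_sum = ["foo", "bar"]

# Select the named columns and sum across each row.
df["sum"] = df[to_sum].sum(axis=1)

print(df)
```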

Sum if column name is higher than row value

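We can use np.greater_equal.outer to compare every column date with every row date. The resulting boolean mask is True where the column date is on or after the row's date; where replaces the remaining cells with NaN so that they are ignored in the column sums.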

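import numpy as np
import pandas as pd

# Row dates, and a mask that is True where column date >= row date
s = pd.to_datetime(df.date).values
m = np.greater_equal.outer(pd.to_datetime(df.columns[:-1]).values, s).T
# df.append was removed in pandas 2.0, so concatenate the Total row instead
df = pd.concat([df, df.iloc[:, :-1].where(m).sum().to_frame('Total').T])
df
Out[381]:
       01-01-2020  01-01-2021  01-01-2022        date
1             1.0         3.0         6.0  01-01-2020
2             4.0         4.0         2.0  01-10-2021
3             5.0         1.0         9.0  01-12-2021
Total         1.0         3.0        17.0         NaN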
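
To make the mask easier to inspect, here is a self-contained sketch that rebuilds the frame from the output above and forms the same comparison with plain broadcasting instead of the ufunc's outer method. The month-day-year date format is an assumption; the source does not state it.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "01-01-2020": [1, 4, 5],
        "01-01-2021": [3, 4, 1],
        "01-01-2022": [6, 2, 9],
        "date": ["01-01-2020", "01-10-2021", "01-12-2021"],
    },
    index=[1, 2, 3],
)

# Assumed format: month-day-year.
fmt = "%m-%d-%Y"
col_dates = pd.to_datetime(df.columns[:-1], format=fmt).values
row_dates = pd.to_datetime(df["date"], format=fmt).values

# mask[i, j] is True when column j's date is on or after row i's date.
mask = col_dates[None, :] >= row_dates[:, None]

# Cells outside the mask become NaN and are skipped by sum().
totals = df.iloc[:, :-1].where(mask).sum()

print(mask)
print(totals)
```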

How to sum same columns (differentiated by suffix) in pandas?

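We need an Index of column labels in which each pair of related columns shares the same name; then we can group by those labels along axis=1 and sum. (DataFrame.groupby with axis=1 is deprecated as of pandas 2.1; on newer versions, df.T.groupby(cols).sum().T gives the same result.)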

cols = pd.Index(['total_customers', 'total_customers',
                 'total_purchases', 'total_purchases'])

result_df = df.groupby(cols, axis=1).sum()

With the example shown, we can use str.replace to swap an optional s, followed by an underscore and a date (four digits-two digits-two digits) at the end of the name, for a single s. The pattern may need to be modified depending on the actual column names:

cols = df.columns.str.replace(r's?_\d{4}-\d{2}-\d{2}$', 's', regex=True)
result_df = df.groupby(cols, axis=1).sum()

result_df:

   total_customers  total_purchases
0               11               10
1               17                5

Setup and imports:

import pandas as pd

df = pd.DataFrame({
    'total_customers': [1, 3],
    'total_customer_2021-03-31': [10, 14],
    'total_purchases': [4, 3],
    'total_purchases_2021-03-31': [6, 2]
})
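
Putting the pieces together, the following sketch shows what the regex does to each column name and then sums the matching columns. It groups the transposed frame rather than passing axis=1, which is a substitution made here to avoid the deprecated argument, not the original answer's code.

```python
import pandas as pd

df = pd.DataFrame({
    "total_customers": [1, 3],
    "total_customer_2021-03-31": [10, 14],
    "total_purchases": [4, 3],
    "total_purchases_2021-03-31": [6, 2],
})

# Strip the optional "s" plus "_YYYY-MM-DD" suffix and put a single "s" back,
# so both spellings collapse to the same label.
cols = df.columns.str.replace(r"s?_\d{4}-\d{2}-\d{2}$", "s", regex=True)
print(list(cols))

# Group rows of the transposed frame by the normalized labels, then flip back.
result_df = df.T.groupby(cols).sum().T
print(result_df)
```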

