Row-wise sum for columns with certain names
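We can select the columns whose names contain 'a' with grep, subset those columns, and apply rowSums, then do the same for the 'b' columns. Because grep runs on names(df1)[-1] (the names without the first column), we add 1 to map the matches back to positions in df1.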
rowSums(df1[grep('a', names(df1)[-1])+1])
rowSums(df1[grep('b', names(df1)[-1])+1])
Sum column with similar name
You could use
library(dplyr)
df %>%
  mutate(across(starts_with("AB"),
                ~ .x + df[[gsub("AB", "XB", cur_column())]],
                .names = "sum_{.col}"))
This returns
# A tibble: 1 x 9
    AB1   AB3   AB4   XB1   XB3   XB4 sum_AB1 sum_AB3 sum_AB4
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>   <dbl>   <dbl>
1    12    34     0     5     3     7      17      37       7
- We use across and mutate in this approach.
- First we select all columns starting with AB. The desired sums are always ABn + XBn, so we can use this pattern.
- Next we replace AB in the name of the currently selected column with XB and sum those two columns. These sums are stored in new columns prefixed with sum_.
Row-wise sum of values grouped by columns with same name
We can transpose dat, calculate rowsum per group (the colnames of the original dat), then transpose the result back to the original structure.
t(rowsum(t(dat), group = colnames(dat), na.rm = TRUE))
#  A C G T
#1 1 0 1 0
#2 4 0 6 0
#3 0 1 0 1
#4 2 0 1 0
#5 1 0 1 0
#6 0 1 0 1
#7 0 1 0 1
Sum row-wise values that are grouped by column name but keep all columns in R?
You can try ave as shown below, using col and row to build the grouping factors:
> ave(myMat,colnames(myMat)[col(myMat)], row(myMat), FUN = sum)
     x  y x  y
[1,] 1  3 1  3
[2,] 5  9 5  9
[3,] 4 13 4 13
rowwise() sum with vector of column names in dplyr
I think you are looking for rlang::syms, which converts strings to symbols that can then be spliced into the call with !!!:
library(dplyr)
library(rlang)
df %>%
  rowwise() %>%
  mutate(
    sum = sum(!!!syms(to_sum))
  )
#     foo   bar foobar   sum
#   <dbl> <dbl>  <dbl> <dbl>
# 1     1     1      0     2
# 2     0     1      1     1
# 3     1     1      1     2
Sum if column name is higher than row value
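We can build a boolean mask with np.greater_equal.outer that marks, for every row, the columns whose date is on or after the row's date. We then slice off the date column, use where to turn the unwanted cells into NaN, and sum what is left: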
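import numpy as np
import pandas as pd
s = pd.to_datetime(df.date).values
m = np.greater_equal.outer(pd.to_datetime(df.columns[:-1]).values, s).T
# DataFrame.append was removed in pandas 2.0, so the Total row is added with pd.concat
df = pd.concat([df, df.iloc[:, :-1].where(m).sum().to_frame('Total').T])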
df
Out[381]:
       01-01-2020  01-01-2021  01-01-2022        date
1             1.0         3.0         6.0  01-01-2020
2             4.0         4.0         2.0  01-10-2021
3             5.0         1.0         9.0  01-12-2021
Total         1.0         3.0        17.0         NaN
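To make the mask logic easier to follow, here is a minimal plain-Python sketch of the same comparison, without NumPy or pandas. The values are copied from the output above; the parse helper and the month-first date format are assumptions made for illustration, not part of the original answer.

```python
from datetime import datetime

# Column labels and (row date, row values) pairs copied from the output above.
columns = ["01-01-2020", "01-01-2021", "01-01-2022"]
rows = [
    ("01-01-2020", [1, 3, 6]),
    ("01-10-2021", [4, 4, 2]),
    ("01-12-2021", [5, 1, 9]),
]

def parse(text):
    # Hypothetical helper: assumes month-first dates (MM-DD-YYYY).
    return datetime.strptime(text, "%m-%d-%Y")

col_dates = [parse(c) for c in columns]

# mask[i][j] is True when column j's date is on or after row i's date.
mask = [[col >= parse(row_date) for col in col_dates] for row_date, _ in rows]

# Sum each column over the rows where the mask is True.
totals = [
    sum(values[j] for (_, values), keep in zip(rows, mask) if keep[j])
    for j in range(len(columns))
]
print(totals)  # [1, 3, 17]
```

The nested list comprehension plays the role of np.greater_equal.outer followed by the transpose, and the filtered sum corresponds to where(m).sum().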
How to sum same columns (differentiated by suffix) in pandas?
Somehow we need to get an Index of columns in which each pair of columns shares the same name; then we can groupby sum on axis=1:
cols = pd.Index(['total_customers', 'total_customers',
                 'total_purchases', 'total_purchases'])
result_df = df.groupby(cols, axis=1).sum()
With the example shown, we can use str.replace to swap an optional s, followed by an underscore, followed by the date format (four digits-two digits-two digits), for a single s. This pattern may need to be modified depending on the actual column names:
cols = df.columns.str.replace(r's?_\d{4}-\d{2}-\d{2}$', 's', regex=True)
result_df = df.groupby(cols, axis=1).sum()
result_df:
   total_customers  total_purchases
0               11               10
1               17                5
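To check what the pattern does to each column name before grouping, it can be tried on its own with Python's re module. This is only a sketch: the names list simply repeats the column names from the example, and re.sub is used here as a stand-in for the pandas str.replace call.

```python
import re

# Optional "s", an underscore, then a YYYY-MM-DD date at the end of the name.
pattern = r"s?_\d{4}-\d{2}-\d{2}$"

names = [
    "total_customers",
    "total_customer_2021-03-31",
    "total_purchases",
    "total_purchases_2021-03-31",
]

# Names without a date suffix are left unchanged; dated names collapse
# onto their undated counterpart.
normalized = [re.sub(pattern, "s", name) for name in names]
print(normalized)
# ['total_customers', 'total_customers', 'total_purchases', 'total_purchases']
```

If the real column names use a different suffix, this is the quickest place to adjust the pattern and confirm that each pair ends up with an identical name.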
Setup and imports:
import pandas as pd
df = pd.DataFrame({
    'total_customers': [1, 3],
    'total_customer_2021-03-31': [10, 14],
    'total_purchases': [4, 3],
    'total_purchases_2021-03-31': [6, 2]
})
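One caveat for newer pandas versions: groupby(..., axis=1) has been deprecated since pandas 2.1, and the deprecation warning suggests grouping the transposed frame instead. The following is a sketch of that variant using the same setup and pattern; it is an alternative spelling, not the original answer's code.

```python
import pandas as pd

df = pd.DataFrame({
    'total_customers': [1, 3],
    'total_customer_2021-03-31': [10, 14],
    'total_purchases': [4, 3],
    'total_purchases_2021-03-31': [6, 2]
})

# Normalize the column names so each pair shares one label.
cols = df.columns.str.replace(r's?_\d{4}-\d{2}-\d{2}$', 's', regex=True)

# Transpose so the columns become rows, group those rows by the
# normalized labels, sum, and transpose back.
result_df = df.T.groupby(cols).sum().T
print(result_df)
```

This should give the same totals as the axis=1 version above, at the cost of two transposes, which is usually negligible for frames of this shape.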