Row-Wise Sum of Values Grouped by Columns with Same Name

Row-wise sum of values grouped by columns with same name

We can transpose dat, calculate rowsum per group (the column names of the original dat), then transpose the result back to the original structure.

t(rowsum(t(dat), group = colnames(dat), na.rm = TRUE))
# A C G T
#1 1 0 1 0
#2 4 0 6 0
#3 0 1 0 1
#4 2 0 1 0
#5 1 0 1 0
#6 0 1 0 1
#7 0 1 0 1
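For comparison, the same transpose-then-aggregate trick can be sketched in pandas (the frame `dat` and its values here are made up for illustration, not taken from the question):

```python
import pandas as pd

# Illustrative data with duplicated column names
dat = pd.DataFrame([[1, 0, 1, 0],
                    [2, 2, 3, 3]],
                   columns=["A", "G", "A", "G"])

# Transpose, group the rows by their (duplicated) labels, sum, transpose back
summed = dat.T.groupby(level=0).sum().T
print(summed)
#    A  G
# 0  2  0
# 1  5  5
```

Note that `.sum()` skips NaN by default, which matches the `na.rm = TRUE` behavior in the R call.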

Sum row-wise values that are grouped by column name but keep all columns in R?

You can try ave as below (with the help of col() and row() to build the groupings)

> ave(myMat, colnames(myMat)[col(myMat)], row(myMat), FUN = sum)
x y x y
[1,] 1 3 1 3
[2,] 5 9 5 9
[3,] 4 13 4 13
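An analogous keep-all-columns version can be sketched in pandas (illustrative data mirroring the matrix above): `transform("sum")` broadcasts each group's sum back onto every member column, so the duplicated columns all survive.

```python
import pandas as pd

# Hypothetical matrix with duplicated column names
myMat = pd.DataFrame([[0, 1, 1, 2],
                      [2, 4, 3, 5],
                      [1, 6, 3, 7]],
                     columns=["x", "y", "x", "y"])

# Group the transposed rows by label and broadcast the group sums back
out = myMat.T.groupby(level=0).transform("sum").T
print(out)
# row values: [1, 3, 1, 3], [5, 9, 5, 9], [4, 13, 4, 13]
```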

How do I sum all observations row-wise for all columns that have the same name?

We can split the dataset by its column names with split.default, then apply rowSums to each subset in the resulting list of datasets with the same name.

do.call(cbind, lapply(split.default(dfN, names(dfN)), rowSums, na.rm = TRUE))
# BRANT HALDIMAND.NORFOLK HALTON HAMILTON MUSKOKA NIAGARA PEEL TORONTO YORK
#[1,] 2943 2167 11874 11155 951 8598 32593 60779 23852
#[2,] 2822 2053 11817 11850 1040 8745 34110 67638 24678
#[3,] 2943 2179 11579 12051 1051 8853 34387 70146 24566

Or, as @thelatemail mentioned, if we need a data.frame output, wrap the list output in data.frame:

data.frame(lapply(split.default(dfN, names(dfN)), rowSums, na.rm = TRUE)) 

Or using tidyverse

library(tidyverse)
dfN %>%
  split.default(names(.)) %>%
  map_df(reduce, `+`)
# A tibble: 3 x 9
# BRANT HALDIMAND.NORFOLK HALTON HAMILTON MUSKOKA NIAGARA PEEL TORONTO YORK
# <int> <int> <int> <int> <int> <int> <int> <int> <int>
#1 2943 2167 11874 11155 951 8598 32593 60779 23852
#2 2822 2053 11817 11850 1040 8745 34110 67638 24678
#3 2943 2179 11579 12051 1051 8853 34387 70146 24566

data

dfN <- structure(list(TORONTO = c(20855L, 23281L, 24130L), HALTON = c(4011L, 
3997L, 3900L), PEEL = c(11178L, 11770L, 11810L), YORK = c(8138L,
8417L, 8306L), BRANT = c(996L, 961L, 972L), HALDIMAND.NORFOLK = c(739L,
684L, 732L), HAMILTON = c(3835L, 4095L, 4168L), MUSKOKA = c(305L,
343L, 334L), NIAGARA = c(2923L, 2970L, 2985L), TORONTO = c(39924L,
44357L, 46016L), HALTON = c(7863L, 7820L, 7679L), PEEL = c(21415L,
22340L, 22577L), YORK = c(15714L, 16261L, 16260L), BRANT = c(1947L,
1861L, 1971L), HALDIMAND.NORFOLK = c(1428L, 1369L, 1447L), HAMILTON = c(7320L,
7755L, 7883L), MUSKOKA = c(646L, 697L, 717L), NIAGARA = c(5675L,
5775L, 5868L)), class = "data.frame", row.names = c(NA, -3L))

How to perform Row wise sum based on column condition and add Class Wise specific value as Column?

Try with groupby:

df["Cluster_Sum"] = df.groupby("Cluster")["Value"].transform("sum")

>>> df
    Cluster  Class  Value  Cluster_Sum
0         0     10      1            9
1         0     11      1            9
2         0     14      3            9
3         0     18      1            9
4         0     26      1            9
5         0     29      1            9
6         0     30      1            9
7         1      0      2           12
8         1     19      1           12
9         1     20      1           12
10        1     21      2           12
11        1     36      1           12
12        1     26      1           12
13        1     27      1           12
14        1     37      2           12
15        1     33      1           12
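A minimal, self-contained version of the same idea (with made-up data, not the question's):

```python
import pandas as pd

df = pd.DataFrame({
    "Cluster": [0, 0, 0, 1, 1],
    "Class":   [1, 3, 1, 2, 1],
    "Value":   [1, 3, 1, 2, 1],
})

# transform("sum") returns a Series aligned to df's index,
# so every row receives its own cluster's total
df["Cluster_Sum"] = df.groupby("Cluster")["Value"].transform("sum")
print(df["Cluster_Sum"].tolist())  # [5, 5, 5, 3, 3]
```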

R sum row values based on column name

Here's a for loop approach. I use stringr but we could just as easily use base regex functions to keep it dependency-free.

library(stringr)
name_stems = unique(str_replace(names(data)[-1], "_.*", ""))
result = data[, "date", drop = FALSE]
for (i in seq_along(name_stems)) {
  result[[name_stems[i]]] =
    rowSums(data[
      str_detect(
        names(data),
        pattern = paste0(name_stems[i], "_")
      )
    ])
}

result
# # A tibble: 3 × 3
# date x1 x2
# <chr> <dbl> <dbl>
# 1 1/1/2018 9 12
# 2 2/1/2018 399 132
# 3 3/1/2018 999 932
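The same stem-extraction idea can be sketched in pandas (a hypothetical frame mirroring the question's shape; the `_a`/`_b` suffixes are made up):

```python
import pandas as pd

data = pd.DataFrame({
    "date": ["1/1/2018", "2/1/2018", "3/1/2018"],
    "x1_a": [4, 199, 499], "x1_b": [5, 200, 500],
    "x2_a": [6, 66, 466],  "x2_b": [6, 66, 466],
})

# Strip everything from the first underscore onward to get the stems,
# then sum the value columns grouped by stem
vals = data.drop(columns="date")
stems = vals.columns.str.replace(r"_.*", "", regex=True)
result = pd.concat([data[["date"]], vals.T.groupby(list(stems)).sum().T], axis=1)
print(result)
```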

Group by month, sum rows based in column, and keep the other columns

IIUC, you need a single groupby. You also need to convert your "revenue" column to numeric.

df['date'] = pd.to_datetime(df['date'], dayfirst=True)

group = df['date'].dt.strftime('%b')

(df.assign(revenue=pd.to_numeric(df['revenue'].str.replace(',', '.')))
.groupby([group, 'name', 'type'])
.agg('sum')
.reset_index()
)

Output:

   date name      type      size  revenue
0   Apr    A     Basic   6908746      0.1
1   Dec    A  Standard    248753      0.4
2   Dec    B   Premium     82346      0.5
3   Feb    A     Basic   3356943      0.1
4   Feb    D   Premium  12049667      2.1
5   Feb    E  Standard       486      0.9
6   Jan    C     Basic   3589749      0.4
7   Mar    A     Basic     28586      0.2
8   Mar    B     Basic   8734684      0.1
9   Mar    D   Premium       192      0.7
10  Oct    F     Basic     23847      0.3

Note that the above is aggregating months of different years into the same group. If you want to keep years separate, use a period:

group = df['date'].dt.to_period('M')

Output:

      date name      type      size  revenue
0  2021-01    C     Basic   3589749      0.4
1  2021-02    A     Basic   3356943      0.1
2  2021-04    A     Basic   6908746      0.1
3  2021-10    F     Basic     23847      0.3
4  2021-12    A  Standard    248753      0.4
5  2021-12    B   Premium     82346      0.5
6  2022-02    D   Premium  12049667      2.1
7  2022-02    E  Standard       486      0.9
8  2022-03    A     Basic     28586      0.2
9  2022-03    B     Basic   8734684      0.1
10 2022-03    D   Premium       192      0.7
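A small illustration of the difference between the two groupers (made-up data):

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2021-02-01", "2022-02-15", "2021-12-05"]),
    "revenue": [1.0, 2.0, 3.0],
})

# Month names collapse the same month across years...
by_name = df.groupby(df["date"].dt.strftime("%b"))["revenue"].sum()
# ...while periods keep each year-month separate
by_period = df.groupby(df["date"].dt.to_period("M"))["revenue"].sum()

print(by_name.to_dict())  # Feb 2021 and Feb 2022 are merged
print(len(by_period))     # 3 groups: 2021-02, 2021-12, 2022-02
```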

Row-wise sum of paired values in 2 identically labelled lists

  1. Since you say that they should be added "by the shared Time value", we cannot use bind_cols, which would require every Time value to align perfectly and exist in both lists. (If that is always the case, then just ignore this point ... and pray that your assumption always holds.) I suggest a merge/join operation.

    For discussion on merge/join operations, see How to join (merge) data frames (inner, outer, left, right) and What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?.

  2. Further, joining on floating-point values runs into a fundamental problem in programming (not just R): testing for perfect equality of floating-point numbers. To work around this, we need to decide on a precision level at which two numbers are considered effectively the same. (R often gets floating-point comparisons intuitively right, but there is no clear indication when it does not.)

    For more discussion of floating-point equality, test (0.10 + 0.05) == 0.15 (it returns FALSE), then read Why are these numbers not equal?, Is floating point math broken?, and https://en.wikipedia.org/wiki/IEEE_754.

  3. Doing a merge/join based on common columns will be much simpler if we first pivot the data from the current wide format to a long format. (This relies on all of them being the same class, numeric in this case. If there is mixed-class, then this does not work.)

    For discussion on pivoting, see Reshaping data.frame from wide to long format (wide-to-long) and the reverse, Reshape multiple value columns to wide format (long to wide).
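Point 2 is easy to reproduce in any language; here is a quick Python illustration of the pitfall and of the fixed-precision workaround this answer uses via sprintf("%0.03f", ...):

```python
import math

# The classic floating-point pitfall: the sum is not exactly 0.15
print((0.10 + 0.05) == 0.15)   # False
print(0.10 + 0.05)             # 0.15000000000000002

# Compare at a chosen precision instead, via a tolerance...
print(math.isclose(0.10 + 0.05, 0.15, abs_tol=1e-3))  # True

# ...or by formatting to a fixed number of decimals, the same idea
# as sprintf("%0.03f", Time) in the R code
print(f"{0.10 + 0.05:.3f}" == f"{0.15:.3f}")          # True
```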

I'll create a similarly-shaped Sis2, changing the numbers, and changing one of the Time values just a little to show what happens.

set.seed(42)
Sis2 <- Sis1 %>%
  mutate(across(starts_with("Nuc"), ~ runif(n(), 50, 100)))
Sis2$Time[3] <- Sis2$Time[3] + 1e-5
Sis2
# Time Nuc_11 Nuc_15 Nuc_16
# 1 835.20 95.74030 86.82942 96.73361
# 2 839.84 96.85377 56.73333 62.77144
# 3 844.48 64.30698 82.84961 73.11464
# 4 849.12 91.52238 85.25324 97.00073
# 5 853.76 82.08728 72.88709 98.91132
# 6 858.40 75.95480 85.95561 55.87437

Although they look identical when printed, one of the Time values is in fact different and will not join properly.

Sis1$Time
# [1] 835.20 839.84 844.48 849.12 853.76 858.40
Sis2$Time
# [1] 835.20 839.84 844.48 849.12 853.76 858.40

Sis1$Time == Sis2$Time
# [1] TRUE TRUE FALSE TRUE TRUE TRUE

Assuming that 0.001 is sufficient precision to treat two Time values as effectively identical, here is code that works:

library(dplyr)
# library(tidyr) # pivot_*
full_join(
  Sis1 %>%
    mutate(Time_chr = sprintf("%0.03f", Time)) %>%
    tidyr::pivot_longer(-c(Time, Time_chr)),
  Sis2 %>%
    mutate(Time_chr = sprintf("%0.03f", Time)) %>%
    tidyr::pivot_longer(-c(Time, Time_chr)),
  by = c("Time_chr", "name")
) %>%
  transmute(
    Time = coalesce(Time.x, Time.y),
    name,
    values = rowSums(cbind(value.x, value.y), na.rm = TRUE)
  ) %>%
  tidyr::pivot_wider(id_cols = Time, names_from = "name", values_from = "values") %>%
  as.data.frame()
# Time Nuc_11 Nuc_15 Nuc_16
# 1 835.20 166.7470 187.9607 186.2512
# 2 839.84 200.3492 203.0143 140.2053
# 3 844.48 207.0985 168.8299 156.4784
# 4 849.12 174.0169 175.3819 166.4995
# 5 853.76 145.7154 150.8000 203.3399
# 6 858.40 141.9953 196.4379 164.0018

(The use of %>% as.data.frame() is purely to show all digits and rows without tibble's abbreviated view; it is not required. Also ... I'm generally not a fan of nesting %>% pipes inside function calls like that, but I think it shows the steps clearly; feel free to use temporary variables for the pivoted data if you prefer.)

Because we're doing a full-join, if there were any mismatches then we would see NA values. Such as this, without the use of converting Time to a string:

full_join(
  tidyr::pivot_longer(Sis1, -Time),
  tidyr::pivot_longer(Sis2, -Time),
  by = c("Time", "name")
) %>%
  transmute(
    Time,
    name,
    values = rowSums(cbind(value.x, value.y), na.rm = TRUE)
  ) %>%
  tidyr::pivot_wider(id_cols = Time, names_from = "name", values_from = "values") %>%
  as.data.frame()
# Time Nuc_11 Nuc_15 Nuc_16
# 1 835.20 166.74700 187.96075 186.25124
# 2 839.84 200.34921 203.01432 140.20529
# 3 844.48 142.79152 85.98028 83.36374
# 4 849.12 174.01691 175.38195 166.49954
# 5 853.76 145.71543 150.79999 203.33989
# 6 858.40 141.99531 196.43786 164.00176
# 7 844.48 64.30698 82.84961 73.11464

Notice that 844.48 in row 3 and its similar-looking "844.48" in row 7 have not been added, due to the 1e-5 difference I added. You can get R to show them with more precision (see ?options for the 'digits' and 'scipen' arguments).
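For reference, the same rounded-key join can be sketched in pandas (hypothetical two-row frames; `s1`, `s2`, and the values are illustrative): build a fixed-precision string key, outer-merge on it, then sum the paired columns NaN-safely.

```python
import pandas as pd

s1 = pd.DataFrame({"Time": [835.20, 844.48], "Nuc_11": [100.0, 80.0]})
s2 = pd.DataFrame({"Time": [835.20, 844.48 + 1e-5], "Nuc_11": [50.0, 60.0]})

# Fixed-precision key: a 1e-5 wobble disappears at 3 decimals
for s in (s1, s2):
    s["Time_chr"] = s["Time"].map(lambda t: f"{t:.3f}")

merged = s1.merge(s2, on="Time_chr", how="outer", suffixes=("_1", "_2"))
# sum(axis=1) skips NaN, like rowSums(..., na.rm = TRUE)
merged["Nuc_11"] = merged[["Nuc_11_1", "Nuc_11_2"]].sum(axis=1)
print(merged[["Time_chr", "Nuc_11"]])
```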


