Sum Pairwise Rows with R

Sum pairwise rows with R?

  1. Create all the row-index combinations you need with combn(). t() transposes the result so that each row holds one pair of indices.
  2. Use apply() to iterate over the index pairs created in step 1. Note the negative indexing (-1), which drops the first column (Row) so it is not included in the sum.
  3. Bind the two results together.

ind <- t(combn(nrow(df1),2))
out <- apply(ind, 1, function(x) sum(df1[x[1], -1] * df1[x[2], -1]))
cbind(ind, out)

out
[1,] 1 2 6.00
[2,] 1 3 7.00
[3,] 1 4 12.65
.....
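
The three steps can be tried on a small example. This is only a sketch: the data frame below is hypothetical (the original df1 is not shown), with a label column named Row followed by two numeric columns.

```r
# Hypothetical input: a label column plus two numeric columns.
df1 <- data.frame(Row = c("a", "b", "c"),
                  x = c(1, 2, 3),
                  y = c(4, 5, 6))

# Step 1: one row per pair of row indices (1-2, 1-3, 2-3).
ind <- t(combn(nrow(df1), 2))

# Step 2: for each pair, multiply the numeric columns element-wise and sum.
out <- apply(ind, 1, function(p) sum(df1[p[1], -1] * df1[p[2], -1]))

# Step 3: bind the index pairs and the results together.
res <- cbind(ind, out)
res
```

For this input the pair (1, 2) gives 1*2 + 4*5 = 22.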

sum all rows pairwise in two data frames and save to matrix

Convert both data frames to matrices and add them.

Adding the two data frames directly also works.

df1 = data.frame(colA = c(30, 3, 15), colB = c(2, 100, 9))
df2 = data.frame(colA = c(10, 0, 55), colB = c(200, 10, 1))
as.matrix(df1)+ as.matrix(df2)
df1+df2

> as.matrix(df1)+ as.matrix(df2)
colA colB
[1,] 40 202
[2,] 3 110
[3,] 70 10

> df1+df2
colA colB
1 40 202
2 3 110
3 70 10
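
If the result has to end up as a matrix either way, a quick check along these lines shows that both routes give the same values. The object names m_from_matrices and m_from_frames are made up for this sketch.

```r
df1 <- data.frame(colA = c(30, 3, 15), colB = c(2, 100, 9))
df2 <- data.frame(colA = c(10, 0, 55), colB = c(200, 10, 1))

# Route 1: convert first, then add.
m_from_matrices <- as.matrix(df1) + as.matrix(df2)

# Route 2: add the data frames, then convert the sum.
m_from_frames <- as.matrix(df1 + df2)

# Both routes should hold the same numbers.
same <- all(m_from_matrices == m_from_frames)
same
```

The matrix addition works element by element by position, so both inputs need the same dimensions and the same column order.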

Sum of all pairwise row products as a two way matrix

If speed is an important factor (e.g. if you're processing a huge matrix), you might find an Rcpp implementation helpful. This only fills the upper triangular portion of the matrix.

library(Rcpp)
cppFunction(
"NumericMatrix josilberRcpp(NumericMatrix x) {
  const int nr = x.nrow();
  const int nc = x.ncol();
  NumericMatrix y(nr, nr);
  for (int col = 0; col < nc; ++col) {
    for (int i = 0; i < nr; ++i) {
      for (int j = i; j < nr; ++j) {  // j starts at i: upper triangle only
        y(i, j) += x(i, col) * x(j, col);
      }
    }
  }
  return y;
}")
josilberRcpp(as.matrix(Dataset))
# [,1] [,2] [,3] [,4]
# [1,] 21 9 24 26
# [2,] 0 11 15 13
# [3,] 0 0 57 28
# [4,] 0 0 0 33

Benchmarking is provided in my other answer. Note that the benchmarking does not include the compile time using cppFunction, which can be quite significant. Therefore this implementation is probably only useful for very large inputs or when you need to use this function many times.
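
For small inputs, or to check the compiled function, the same upper-triangular result can be sketched in base R with tcrossprod(). This is a different technique from the Rcpp loop above, and the matrix m is hypothetical (Dataset is not shown here).

```r
# Hypothetical 2 x 3 input matrix.
m <- matrix(c(1, 2, 3,
              4, 5, 6), nrow = 2, byrow = TRUE)

# full[i, j] is the sum over columns of m[i, ] * m[j, ].
full <- tcrossprod(m)

# Keep the diagonal and upper triangle, zero out the rest.
upper <- full
upper[lower.tri(upper)] <- 0
upper
```

This computes the full symmetric matrix first, so it does roughly twice the arithmetic of the loop, but it needs no compilation.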

get pairwise sums of multiple columns in dataframe

You can use rowSums on a column subset.

As a data frame:

data.frame(ab = rowSums(x[c("a", "b")]), cd = rowSums(x[c("c", "d")]))
# ab cd
# 1 11 17
# 2 10 16
# 3 9 15
# 4 8 14
# 5 7 13

As a matrix:

cbind(ab = rowSums(x[1:2]), cd = rowSums(x[3:4]))

For a wider data frame, you can also use sapply over a list of column subsets.

sapply(list(1:2, 3:4), function(y) rowSums(x[y]))

For all pairwise column combinations:

y <- combn(ncol(x), 2L, function(y) rowSums(x[y]))
colnames(y) <- combn(names(x), 2L, paste, collapse = "")
y
# ab ac ad bc bd cd
# [1,] 11 13 16 12 15 17
# [2,] 10 13 15 11 13 16
# [3,] 9 13 14 10 11 15
# [4,] 8 13 13 9 9 14
# [5,] 7 13 12 8 7 13
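
The data frame x is not shown above. As a sketch, the x below was worked out backwards from the printed output, so treat it as an assumption rather than the original data. It lets the combn() approach run end to end.

```r
# Assumed input, inferred from the printed sums (not the original data).
x <- data.frame(a = 6, b = 5:1, c = 7, d = 10:6)

# One column of row sums per pair of columns.
y <- combn(ncol(x), 2L, function(i) rowSums(x[i]))

# Label each result column with the two column names it combines.
colnames(y) <- combn(names(x), 2L, paste, collapse = "")
y
```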

Aggregate sum of column values for all pairwise groupings of other columns in a dataframe in R

You can use combn() to get the possible combinations of indices and then lapply() over that.

library(tidyverse)

data |>
  seq_along() |>
  combn(2, simplify = FALSE) |>
  lapply(\(i) aggregate(data$cost ~ ., data[c(i[1], i[2])], sum))
#> [[1]]
#> team height data$cost
#> 1 A short 4
#> 2 B short 5
#> 3 A tall 9
#> 4 B tall 5
#> 5 C tall 4
#>
#> [[2]]
#> team size data$cost
#> 1 A big 9
#> 2 C big 4
#> 3 A small 4
#> 4 B small 10
#>
#> [[3]]
#> team data$cost
#> 1 A 13
#> 2 B 10
#> 3 C 4
#>
#> [[4]]
#> height size data$cost
#> 1 short big 4
#> 2 tall big 9
#> 3 short small 5
#> 4 tall small 9
#>
#> [[5]]
#> height data$cost
#> 1 short 9
#> 2 tall 18
#>
#> [[6]]
#> size data$cost
#> 1 big 13
#> 2 small 14

Created on 2022-03-30 by the reprex package (v2.0.1)
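
The output above also contains pairings that include the cost column itself (elements 3, 5 and 6). If only pairs of grouping columns are wanted, a variant along these lines may be closer. It is a sketch on hypothetical data, and the names group_cols, pairs and res are made up for illustration.

```r
# Hypothetical data with three grouping columns and one value column.
data <- data.frame(
  team   = c("A", "A", "B", "B", "A"),
  height = c("short", "tall", "short", "tall", "short"),
  size   = c("big", "small", "small", "big", "big"),
  cost   = c(1, 2, 3, 4, 5)
)

# Build pairs from the grouping columns only, leaving cost out.
group_cols <- setdiff(names(data), "cost")
pairs <- combn(group_cols, 2, simplify = FALSE)

# Sum cost within each pair of grouping columns.
res <- lapply(pairs, function(p) {
  aggregate(data["cost"], by = data[p], FUN = sum)
})
names(res) <- vapply(pairs, paste, character(1), collapse = "_")
res
```

Passing data["cost"] rather than data$cost keeps the result column named cost instead of data$cost.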

R: How to sum pairs in a Matrix by row?

testM[, c(TRUE, FALSE)] + testM[, c(FALSE, TRUE)]
## [,1] [,2]
## [1,] 12 52
## [2,] 14 54
## [3,] 16 56
## [4,] 18 58
## [5,] 20 60
## [6,] 22 62
## [7,] 24 64
## [8,] 26 66
## [9,] 28 68
## [10,] 30 70
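
The one-liner relies on logical index recycling: c(TRUE, FALSE) is repeated across the columns, so it selects columns 1, 3, ... and c(FALSE, TRUE) selects columns 2, 4, .... A sketch, assuming testM is matrix(1:40, nrow = 10), which is inferred from the printed output rather than taken from the question:

```r
# Assumed input: 10 rows, 4 columns (inferred from the output above).
testM <- matrix(1:40, nrow = 10)

# The logical vectors are recycled across the four columns.
odd_cols  <- testM[, c(TRUE, FALSE)]   # columns 1 and 3
even_cols <- testM[, c(FALSE, TRUE)]   # columns 2 and 4

# Adding them sums each adjacent pair of columns, row by row.
pair_sums <- odd_cols + even_cols
pair_sums
```

This assumes an even number of columns; with an odd count the two pieces would have different shapes.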

Create combinations by group and sum

You can create pairwise indices using combn() and expand the data frame with these using slice(). Then group by these row pairs and summarise. I'm assuming you want pairwise combinations, but this can be adapted for larger sets if needed. Some code to handle groups with fewer than two rows is included; it can be removed if no such groups exist in your data.

library(dplyr)
library(purrr)

df1 %>%
  group_by(id) %>%
  slice(c(combn(seq(n()), min(n(), 2)))) %>%
  mutate(id2 = (row_number() - 1) %/% 2) %>%
  group_by(id, id2) %>%
  summarise(name = toString(name),
            across(where(is.numeric), sum), .groups = "drop") %>%
  select(-id2) %>%
  bind_rows(df1 %>%
              group_by(id) %>%
              filter(n() > 1), .) %>%
  arrange(id) %>%
  ungroup()

# A tibble: 6 × 4
id name number value
<chr> <chr> <int> <int>
1 a bob 1 1
2 a jane 2 2
3 a bob, jane 3 3
4 b mark 1 1
5 b brittney 2 2
6 b mark, brittney 3 3
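
To see what the slice() and mutate() steps do, here is the indexing on its own in base R for a hypothetical group of three rows. The names n, idx and pair_id are made up for this sketch.

```r
n <- 3  # hypothetical group size

# combn() returns one pair per column; c() flattens column by column,
# so consecutive positions hold the two members of each pair.
idx <- c(combn(seq(n), 2))

# Integer division by 2 gives both members of a pair the same id.
pair_id <- (seq_along(idx) - 1) %/% 2

rbind(idx, pair_id)
```

Slicing by idx therefore repeats rows so that each pair sits in adjacent positions, and pair_id is what the second group_by() uses to sum them.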

Edit:

To adapt for all possible combinations you can iterate over the values up to the max group size. Using edited data which has a couple of rows added to the first group:

map_df(seq(max(table(df2$id))), ~
  df2 %>%
    group_by(id) %>%
    slice(c(combn(seq(n()), .x * (.x <= n())))) %>%
    mutate(id2 = (row_number() - 1) %/% .x) %>%
    group_by(id, id2) %>%
    summarise(name = toString(name),
              across(where(is.numeric), sum), .groups = "drop")
) %>%
  select(-id2) %>%
  arrange(id)

# A tibble: 18 × 4
id name number value
<chr> <chr> <int> <int>
1 a bob 1 1
2 a jane 2 2
3 a sophie 1 1
4 a jeremy 2 2
5 a bob, jane 3 3
6 a bob, sophie 2 2
7 a bob, jeremy 3 3
8 a jane, sophie 3 3
9 a jane, jeremy 4 4
10 a sophie, jeremy 3 3
11 a bob, jane, sophie 4 4
12 a bob, jane, jeremy 5 5
13 a bob, sophie, jeremy 4 4
14 a jane, sophie, jeremy 5 5
15 a bob, jane, sophie, jeremy 6 6
16 b mark 3 5
17 b brittney 4 6
18 b mark, brittney 7 11

Data for df2:

df2 <- structure(list(id = c("a", "a", "a", "a", "b", "b"), name = c("bob", 
"jane", "sophie", "jeremy", "mark", "brittney"), number = c(1L,
2L, 1L, 2L, 3L, 4L), value = c(1L, 2L, 1L, 2L, 5L, 6L)), class = "data.frame", row.names = c(NA,
-6L))
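
For a single group, the same idea (every subset size from 1 up to the group size) can be sketched in base R by passing a function to combn(). This is an illustration rather than a replacement for the dplyr pipeline, and vals is a hypothetical named vector holding the number values of bob, jane and sophie.

```r
# Hypothetical single group: names and their values.
vals <- c(bob = 1, jane = 2, sophie = 1)

# For each subset size k, sum every combination of k values and
# label it with the comma-separated member names.
sums <- unlist(lapply(seq_along(vals), function(k) {
  s <- combn(vals, k, FUN = sum)
  names(s) <- combn(names(vals), k, FUN = toString)
  s
}))
sums
```

Three members give 3 + 3 + 1 = 7 combinations in total.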

How to calculate all pairwise abs differences among many variables in R

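What probably irritated you is that outer() stopped working once you removed the sum (I'm sure you tried that). Without the sum, each call returns a whole vector instead of a single number, so the Vectorize() result cannot be simplified into a matrix (the default behaviour). Setting SIMPLIFY = FALSE keeps each result as a list element instead.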

r <- outer(seq_along(df), seq_along(df),
FUN=Vectorize(function(i, j) abs(df[[i]] - df[[j]]), SIMPLIFY=FALSE))

Result

matrix(unlist(r), nrow(df))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
# [1,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
# [2,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
# [3,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
# [4,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
# [5,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
# [6,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
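
In the wide result each column is one (i, j) pair of variables, which is hard to read without labels. A sketch on a small hypothetical df (three columns, two rows) that adds pair names as column names:

```r
# Hypothetical data: three variables, two observations.
df <- data.frame(a = c(1, 2), b = c(4, 8), c = c(10, 3))

# List-matrix of absolute differences, one vector per (i, j) pair.
r <- outer(seq_along(df), seq_along(df),
           FUN = Vectorize(function(i, j) abs(df[[i]] - df[[j]]),
                           SIMPLIFY = FALSE))

# One column per pair, one row per observation.
res <- matrix(unlist(r), nrow(df))

# Label columns in the same (i varies fastest) order that unlist() uses.
colnames(res) <- c(outer(names(df), names(df), paste, sep = "-"))
res
```

Because the differences are absolute, the "b-a" and "a-b" columns are identical; keeping only pairs with i < j would halve the output.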

