Sum pairwise rows with R?
- Create all the combinations you need with combn(). t() is used to transpose the matrix into the layout you expect, with one pair of row indices per row.
- Use apply() to iterate over the indices created in step 1. Note that we use negative indexing so we don't try to sum the Row column.
- Bind the two results together.
ind <- t(combn(nrow(df1),2))
out <- apply(ind, 1, function(x) sum(df1[x[1], -1] * df1[x[2], -1]))
cbind(ind, out)
out
[1,] 1 2 6.00
[2,] 1 3 7.00
[3,] 1 4 12.65
.....
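As a self-contained sketch of the steps above, the snippet below uses a made-up df1 (the asker's data is not shown here) with a Row label column and two numeric columns. It also swaps the apply() loop for a vectorized rowSums() over the two index columns; that is an alternative formulation, not the answer's own code, and it should give the same numbers.

```r
# Hypothetical data: a label column followed by numeric columns.
df1 <- data.frame(Row = c("a", "b", "c"), x = c(1, 2, 3), y = c(4, 5, 6))

# Drop the label column (negative indexing) and work on a numeric matrix.
m <- as.matrix(df1[-1])

# One row per pair of row indices: (1,2), (1,3), (2,3).
ind <- t(combn(nrow(m), 2))

# Sum of element-wise products for each pair of rows.
out <- rowSums(m[ind[, 1], , drop = FALSE] * m[ind[, 2], , drop = FALSE])

res <- cbind(ind, out)
print(res)
```

For pair (1, 2) this computes 1*2 + 4*5 = 22; the other pairs follow the same pattern.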
sum all rows pairwise in two data frames and save to matrix
Convert both data frames to matrices and add them.
Adding the two data frames directly
also works.
df1 = data.frame(colA = c(30, 3, 15), colB = c(2, 100, 9))
df2 = data.frame(colA = c(10, 0, 55), colB = c(200, 10, 1))
as.matrix(df1)+ as.matrix(df2)
df1+df2
> as.matrix(df1)+ as.matrix(df2)
colA colB
[1,] 40 202
[2,] 3 110
[3,] 70 10
> df1+df2
colA colB
1 40 202
2 3 110
3 70 10
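A quick way to convince yourself that the two routes agree is to compare them directly. The sketch below reuses the df1 and df2 defined above; the helper names via_matrix, via_df and same are introduced here for illustration only. Both routes assume the two data frames have the same dimensions.

```r
df1 <- data.frame(colA = c(30, 3, 15), colB = c(2, 100, 9))
df2 <- data.frame(colA = c(10, 0, 55), colB = c(200, 10, 1))

# Route 1: convert first, then add.
via_matrix <- as.matrix(df1) + as.matrix(df2)

# Route 2: add the data frames, then convert the result.
via_df <- as.matrix(df1 + df2)

# Compare values only, ignoring attributes such as dimnames.
same <- isTRUE(all.equal(via_matrix, via_df, check.attributes = FALSE))
print(same)
```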
Sum of all pairwise row products as a two way matrix
If speed is an important factor (e.g. if you're processing a huge matrix), you might find an Rcpp implementation helpful. This only fills the upper triangular portion of the matrix.
library(Rcpp)
cppFunction(
"NumericMatrix josilberRcpp(NumericMatrix x) {
const int nr = x.nrow();
const int nc = x.ncol();
NumericMatrix y(nr, nr);
for (int col=0; col < nc; ++col) {
for (int i=0; i < nr; ++i) {
for (int j=i; j < nr; ++j) {
y(i, j) += x(i, col) * x(j, col);
}
}
}
return y;
}")
josilberRcpp(as.matrix(Dataset))
# [,1] [,2] [,3] [,4]
# [1,] 21 9 24 26
# [2,] 0 11 15 13
# [3,] 0 0 57 28
# [4,] 0 0 0 33
Benchmarking is provided in my other answer. Note that the benchmarking does not include the compile time of cppFunction, which can be quite significant. This implementation is therefore probably only useful for very large inputs or when you need to call the function many times.
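If you want to sanity-check the compiled function without Rcpp, the same upper-triangular result can be sketched in base R with tcrossprod(). This is a cross-check under the assumption that the input is a plain numeric matrix; the small matrix x below is made up, and the sketch says nothing about which approach is faster.

```r
# Hypothetical input: 3 rows, 2 columns.
x <- matrix(1:6, nrow = 3)

# tcrossprod(x) is x %*% t(x): entry (i, j) is the sum of products of rows i and j.
y <- tcrossprod(x)

# Zero out the lower triangle to mirror the Rcpp version, which only fills j >= i.
y[lower.tri(y)] <- 0
print(y)
```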
get pairwise sums of multiple columns in dataframe
You can use rowSums on a column subset.
As a data frame:
data.frame(ab = rowSums(x[c("a", "b")]), cd = rowSums(x[c("c", "d")]))
# ab cd
# 1 11 17
# 2 10 16
# 3 9 15
# 4 8 14
# 5 7 13
As a matrix:
cbind(ab = rowSums(x[1:2]), cd = rowSums(x[3:4]))
For a wider data frame, you can also use sapply over a list of column subsets.
sapply(list(1:2, 3:4), function(y) rowSums(x[y]))
For all pairwise column combinations:
y <- combn(ncol(x), 2L, function(y) rowSums(x[y]))
colnames(y) <- combn(names(x), 2L, paste, collapse = "")
y
# ab ac ad bc bd cd
# [1,] 11 13 16 12 15 17
# [2,] 10 13 15 11 13 16
# [3,] 9 13 14 10 11 15
# [4,] 8 13 13 9 9 14
# [5,] 7 13 12 8 7 13
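To run the all-pairs version end to end you need some x. The data frame below is an assumption: it is not taken from the question, but it was chosen so that the pairwise sums match the output printed above. The inner argument is named idx here only to avoid reusing the name y.

```r
# Hypothetical data frame with four numeric columns.
x <- data.frame(a = rep(6, 5), b = 5:1, c = rep(7, 5), d = 10:6)

# For every pair of column positions, sum those two columns row by row.
y <- combn(ncol(x), 2L, function(idx) rowSums(x[idx]))

# Label each result column with the two source column names pasted together.
colnames(y) <- combn(names(x), 2L, paste, collapse = "")
print(y)
```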
Aggregate sum of column values for all pairwise groupings of other columns in a dataframe in R
You can use combn() to get the possible combinations of indices and then lapply() over that.
library(tidyverse)
data |>
seq_along() |>
combn(2, simplify = F) |>
lapply(\(i) aggregate(data$cost~., data[c(i[1], i[2])], sum))
#> [[1]]
#> team height data$cost
#> 1 A short 4
#> 2 B short 5
#> 3 A tall 9
#> 4 B tall 5
#> 5 C tall 4
#>
#> [[2]]
#> team size data$cost
#> 1 A big 9
#> 2 C big 4
#> 3 A small 4
#> 4 B small 10
#>
#> [[3]]
#> team data$cost
#> 1 A 13
#> 2 B 10
#> 3 C 4
#>
#> [[4]]
#> height size data$cost
#> 1 short big 4
#> 2 tall big 9
#> 3 short small 5
#> 4 tall small 9
#>
#> [[5]]
#> height data$cost
#> 1 short 9
#> 2 tall 18
#>
#> [[6]]
#> size data$cost
#> 1 big 13
#> 2 small 14
Created on 2022-03-30 by the reprex package (v2.0.1)
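A variation, sketched here with made-up data, restricts combn() to the grouping columns so that no pair includes the cost column itself (which is why some list elements above are grouped by a single column). The column names team, height, size and cost follow the output above; the values, and the names group_cols, pairs and res, are assumptions for illustration.

```r
# Hypothetical data with three grouping columns and one value column.
data <- data.frame(
  team   = c("A", "A", "B", "B"),
  height = c("short", "tall", "tall", "tall"),
  size   = c("big", "big", "small", "small"),
  cost   = c(1, 2, 3, 4)
)

# Only pair up the grouping columns, not the value column.
group_cols <- setdiff(names(data), "cost")
pairs <- combn(group_cols, 2, simplify = FALSE)

# Sum cost within each combination of the two grouping columns.
res <- lapply(pairs, function(p) aggregate(data["cost"], by = data[p], FUN = sum))
names(res) <- vapply(pairs, paste, character(1), collapse = "_")
print(res)
```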
R: How to sum pairs in a Matrix by row?
testM[,c(T,F)]+testM[,c(F,T)];
## [,1] [,2]
## [1,] 12 52
## [2,] 14 54
## [3,] 16 56
## [4,] 18 58
## [5,] 20 60
## [6,] 22 62
## [7,] 24 64
## [8,] 26 66
## [9,] 28 68
## [10,] 30 70
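The one-liner relies on logical index recycling: c(TRUE, FALSE) is recycled across the columns and so selects columns 1, 3, ..., while c(FALSE, TRUE) selects columns 2, 4, .... The sketch below spells this out. The matrix testM is an assumption (a 10 x 4 matrix of 1:40) chosen because it is consistent with the output shown above; the question's actual testM is not reproduced here.

```r
# Hypothetical input: 10 rows, 4 columns, filled column by column.
testM <- matrix(1:40, nrow = 10)

# Recycled logical indices pick alternating columns.
odd_cols  <- testM[, c(TRUE, FALSE)]   # columns 1 and 3
even_cols <- testM[, c(FALSE, TRUE)]   # columns 2 and 4

# Element-wise addition sums each adjacent column pair within every row.
pair_sums <- odd_cols + even_cols
print(pair_sums)
```

This assumes an even number of columns; with an odd count the two selections would have different widths.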
Create combinations by group and sum
You can create pairwise indices using combn() and expand the data frame with these using slice(). Then just group by these row pairs and summarise. I'm assuming you want pairwise combinations, but this can be adapted for larger sets if needed. Some code to handle groups with fewer than 2 rows is included but can be removed if these don't exist in your data.
library(dplyr)
library(purrr)
df1 %>%
group_by(id) %>%
slice(c(combn(seq(n()), min(n(), 2)))) %>%
mutate(id2 = (row_number()-1) %/% 2) %>%
group_by(id, id2) %>%
summarise(name = toString(name),
across(where(is.numeric), sum), .groups = "drop") %>%
select(-id2) %>%
bind_rows(df1 %>%
group_by(id) %>%
filter(n() > 1), .) %>%
arrange(id) %>%
ungroup()
# A tibble: 6 × 4
id name number value
<chr> <chr> <int> <int>
1 a bob 1 1
2 a jane 2 2
3 a bob, jane 3 3
4 b mark 1 1
5 b brittney 2 2
6 b mark, brittney 3 3
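The core of the pipeline is the pairing of c(combn(...)) with integer division. A base R sketch of just that trick, using a made-up group of three rows, may make it easier to follow; the names idx, pair_id, value and sums are illustrative only.

```r
# combn() returns one pair per column; c() flattens it column by column.
idx <- c(combn(3, 2))                    # 1 2 1 3 2 3

# Integer division by the pair size labels consecutive positions as one pair.
pair_id <- (seq_along(idx) - 1) %/% 2    # 0 0 1 1 2 2

# Hypothetical values for the three rows of the group.
value <- c(10, 20, 30)

# Repeat the rows according to idx, then sum within each pair label.
sums <- tapply(value[idx], pair_id, sum)
print(sums)
```

In the dplyr version, slice() plays the role of value[idx] and the second group_by() plus summarise() plays the role of tapply().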
Edit:
To adapt for all possible combinations you can iterate over the values up to the max group size. Using edited data which has a couple of rows added to the first group:
map_df(seq(max(table(df2$id))), ~
df2 %>%
group_by(id) %>%
slice(c(combn(seq(n()), .x * (.x <= n())))) %>%
mutate(id2 = (row_number() - 1) %/% .x) %>%
group_by(id, id2) %>%
summarise(name = toString(name),
across(where(is.numeric), sum), .groups = "drop")
) %>%
select(-id2) %>%
arrange(id)
# A tibble: 18 × 4
id name number value
<chr> <chr> <int> <int>
1 a bob 1 1
2 a jane 2 2
3 a sophie 1 1
4 a jeremy 2 2
5 a bob, jane 3 3
6 a bob, sophie 2 2
7 a bob, jeremy 3 3
8 a jane, sophie 3 3
9 a jane, jeremy 4 4
10 a sophie, jeremy 3 3
11 a bob, jane, sophie 4 4
12 a bob, jane, jeremy 5 5
13 a bob, sophie, jeremy 4 4
14 a jane, sophie, jeremy 5 5
15 a bob, jane, sophie, jeremy 6 6
16 b mark 3 5
17 b brittney 4 6
18 b mark, brittney 7 11
Data for df2:
df2 <- structure(list(id = c("a", "a", "a", "a", "b", "b"), name = c("bob",
"jane", "sophie", "jeremy", "mark", "brittney"), number = c(1L,
2L, 1L, 2L, 3L, 4L), value = c(1L, 2L, 1L, 2L, 5L, 6L)), class = "data.frame", row.names = c(NA,
-6L))
How to calculate all pairwise abs differences among many variables in R
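What probably irritated you is that outer did not work when you deleted the sum (I'm sure you tried that). That's because each call now returns a whole vector instead of a single number, so the simplified Vectorize result (simplifying is the default) no longer has the one-value-per-pair shape that outer needs. Setting SIMPLIFY to FALSE makes it return a list instead: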
r <- outer(seq_along(df), seq_along(df),
FUN=Vectorize(function(i, j) abs(df[[i]] - df[[j]]), SIMPLIFY=FALSE))
Result
matrix(unlist(r), nrow(df))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
# [1,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
# [2,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
# [3,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
# [4,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
# [5,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
# [6,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
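For a runnable version, the sketch below assumes df is a 6 x 6 data frame built from 1:36; that choice is a guess which happens to be consistent with the output shown above, not the asker's confirmed data. It also shows a combn() variant that keeps only the 15 unique column pairs, which is an alternative to the outer() approach rather than part of the original answer.

```r
# Hypothetical data: six columns whose values differ by multiples of 6.
df <- as.data.frame(matrix(1:36, nrow = 6))

# outer() over column positions; SIMPLIFY = FALSE keeps one vector per (i, j) cell.
r <- outer(seq_along(df), seq_along(df),
           FUN = Vectorize(function(i, j) abs(df[[i]] - df[[j]]), SIMPLIFY = FALSE))

# r is a 6 x 6 list-matrix; flatten it to one block of columns per (i, j) pair.
res <- matrix(unlist(r), nrow(df))

# Alternative: only the unique pairs i < j, one result column per pair.
uniq <- combn(seq_along(df), 2, function(p) abs(df[[p[1]]] - df[[p[2]]]))
colnames(uniq) <- combn(names(df), 2, paste, collapse = "-")
print(dim(res))
print(dim(uniq))
```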