Compute all pairwise differences within a vector in R
as.numeric(dist(v)) seems to work; it treats v as a column matrix and computes the Euclidean distance between rows, which in this case is sqrt((x-y)^2) = abs(x-y).
If we're golfing, then I'll offer c(dist(v)), which is equivalent and which I'm guessing will be unbeatable.
@AndreyShabalin makes the good point that using method="manhattan" will probably be slightly more efficient, since it avoids the squaring and square-rooting.
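As a quick check of the claims above, here is a minimal sketch on a made-up vector (the values in v are an assumption for illustration only). It shows that the Euclidean and Manhattan variants agree for a single vector, and that the results come back in the same pair order that combn uses. Note that these are absolute differences, so the sign is lost.

```r
# Hypothetical example vector (not from the original question)
v <- c(1, 4, 9, 16)

# Euclidean distance on a one-column matrix reduces to abs(x - y)
d_euclid <- as.numeric(dist(v))

# Manhattan distance gives the same numbers without squaring/square roots
d_manhattan <- c(dist(v, method = "manhattan"))

# The pairs are ordered (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
pairs <- t(combn(length(v), 2))
result <- cbind(i = pairs[, 1], j = pairs[, 2], abs_diff = d_euclid)
print(result)
```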
R function for doing all pairwise comparisons for two vectors
outer is probably the function you want. However, it returns a matrix, so we need to convert it to a vector. Here's one way of many:
a <- 1:3
b <- 2:4
as.vector(outer(a,b,">"))
[1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
(that's not the order you specified though; it is, however, a consistent order)
Also:
as.vector(t(outer(a,b,">")))
[1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
Now for differences:
as.vector(outer(a,b,"-"))
[1] -1 0 1 -2 -1 0 -3 -2 -1
I find that outer is very useful. I use it regularly.
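Since the flattened result no longer shows which pair each value belongs to, it can help to label it. The sketch below relies on expand.grid varying its first argument fastest, which is the same column-major order that as.vector(outer(a, b, ...)) produces. The column names diff and a_gt_b are made up for this illustration.

```r
a <- 1:3
b <- 2:4

# One row per (a, b) pair, in the same order as as.vector(outer(a, b, ...))
pairs <- expand.grid(a = a, b = b)

# Attach the flattened outer() results to their pairs
pairs$diff   <- as.vector(outer(a, b, "-"))
pairs$a_gt_b <- as.vector(outer(a, b, ">"))

print(pairs)
```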
How to calculate all pairwise abs differences among many variables in R
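What probably irritated you is that outer did not work when you deleted the sum (I'm sure you tried that). That's because each call then returns a whole vector instead of a single number, so the Vectorize result cannot be simplified into the one-value-per-pair shape that outer expects (simplifying is the default). We can set SIMPLIFY to FALSE so that a list is returned instead: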
r <- outer(seq_along(df), seq_along(df),
FUN=Vectorize(function(i, j) abs(df[[i]] - df[[j]]), SIMPLIFY=FALSE))
Result
matrix(unlist(r), nrow(df))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
# [1,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
# [2,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
# [3,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
# [4,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
# [5,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
# [6,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
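The wide result is easier to read once each column says which pair of variables it holds. The sketch below is self-contained, so it builds its own input: the df here is an assumption chosen to be consistent with the printed output (6 rows, 6 columns, neighbouring columns differing by 6), not the data from the original question. The column labels rely on unlist(r) walking the list-matrix in column-major order, which is the same order expand.grid produces.

```r
# Hypothetical input: 6 rows, 6 columns (V1..V6), neighbouring columns differ by 6
df <- as.data.frame(matrix(1:36, nrow = 6))

# List-matrix of per-row absolute differences for every pair of columns
r <- outer(seq_along(df), seq_along(df),
           FUN = Vectorize(function(i, j) abs(df[[i]] - df[[j]]),
                           SIMPLIFY = FALSE))

# One column per (i, j) pair, one row per row of df
m <- matrix(unlist(r), nrow(df))

# Label each column with the pair it represents (i varies fastest)
idx <- expand.grid(i = seq_along(df), j = seq_along(df))
colnames(m) <- paste(names(df)[idx$i], names(df)[idx$j], sep = "-")

print(m[1:2, 1:4])
```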
R: mean pairwise differences in string vectors
Your question is not terribly clear, but you appear to want a Levenshtein distance that counts substitutions only (for equal-length strings, this is the Hamming distance):
x = c("0010100101",
"1001011101",
"1111111010")
#switch off deletions and insertions:
d <- adist(x, costs = list(ins=Inf, del=Inf, sub=1))
# [,1] [,2] [,3]
#[1,] 0 6 8
#[2,] 6 0 6
#[3,] 8 6 0
mean(d[upper.tri(d)])
#[1] 6.666667
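As a cross-check that does not depend on adist, the same numbers can be obtained by comparing the strings character by character. This is only a sketch and assumes all strings have the same length; the helper name hamming is made up for the illustration.

```r
x <- c("0010100101",
       "1001011101",
       "1111111010")

# Split each string into single characters: one row per string
chars <- do.call(rbind, strsplit(x, ""))

# Hypothetical helper: number of positions at which strings i and j differ
hamming <- function(i, j) sum(chars[i, ] != chars[j, ])

# All unordered pairs of strings: (1,2), (1,3), (2,3)
pairs <- combn(length(x), 2)
d <- apply(pairs, 2, function(p) hamming(p[1], p[2]))

print(d)
print(mean(d))
```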
Simple Pairwise Difference of vector
Here is one method using combn:
# convert to a vector
vNew <- as.numeric(v[1,])
# calculate pair-wise differences
t(rbind(combn(vNew,2), combn(vNew, 2, FUN=dist)))
[,1] [,2] [,3]
[1,] 1 2 1
[2,] 1 3 2
[3,] 1 4 3
[4,] 2 3 1
[5,] 2 4 2
[6,] 3 4 1
Here, the first two columns are the elements of the vector and the third column is the distance.
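A variation on the same idea, sketched here with an assumed input (vNew is set to 1:4 as numerics to match the printed output, since the original v is not shown): call combn once and compute the absolute difference directly, which avoids going through dist for each pair. The column names are made up for the illustration.

```r
# Assumed values, chosen to match the output shown above
vNew <- c(1, 2, 3, 4)

# One row per unordered pair of elements
pairs <- t(combn(vNew, 2))

# Append the absolute difference as a third column
res <- cbind(pairs, abs(pairs[, 1] - pairs[, 2]))
colnames(res) <- c("x", "y", "abs_diff")

print(res)
```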
How to calculate all pairwise differences for multiple variables
We may use outer if we need a matrix:
outer(seq_along(df1), seq_along(df1), FUN =
Vectorize(function(i, j) sum(df1[[i]] - df1[[j]], na.rm = TRUE)))
-output
[,1] [,2] [,3]
[1,] 0.00 47.80 56.49
[2,] -47.80 0.00 8.69
[3,] -56.49 -8.69 0.00
Or, if we don't need the redundant comparisons, use combn:
combn(df1, 2, FUN = function(x) sum(x[[1]] - x[[2]], na.rm = TRUE))
-output
[1] 47.80 56.49 8.69
data
df1 <- structure(list(V1 = c(67.81, 65.33, 54.67, 53.2, 53.77, 52.66,
50.77, 47.84, 46.33, 44.15), V2 = c(57.68, 56.58, 52.61, 49.74,
49.28, 48.03, 46.15, 43.96, 42.76, 41.94), V3 = c(54.04, 54.34,
52.36, 49.34, 48.93, 48.06, 46.21, 43.51, 42.15, 41.1)),
class = "data.frame", row.names = c(NA,
-10L))
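The combn result above is an unnamed vector, so it is not obvious which value belongs to which pair of columns. A small sketch of one way to label it follows; the "V1-V2" naming scheme is an assumption for illustration, and the data is repeated so the block runs on its own.

```r
df1 <- data.frame(
  V1 = c(67.81, 65.33, 54.67, 53.2, 53.77, 52.66, 50.77, 47.84, 46.33, 44.15),
  V2 = c(57.68, 56.58, 52.61, 49.74, 49.28, 48.03, 46.15, 43.96, 42.76, 41.94),
  V3 = c(54.04, 54.34, 52.36, 49.34, 48.93, 48.06, 46.21, 43.51, 42.15, 41.1)
)

# Summed difference for each unordered pair of columns
diffs <- combn(df1, 2, FUN = function(x) sum(x[[1]] - x[[2]], na.rm = TRUE))

# Build matching labels from the column names, in the same pair order
labels <- combn(names(df1), 2, paste, collapse = "-")
res <- setNames(diffs, labels)

print(res)
```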