Compute All Pairwise Differences Within a Vector in R

Compute all pairwise differences within a vector in R

as.numeric(dist(v))

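seems to work: dist treats v as a one-column matrix and computes the Euclidean distance between its rows, which for single numbers is sqrt((x-y)^2) = abs(x-y). Note that this gives absolute differences, so the sign of each difference is lost.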

If we're golfing, then I'll offer c(dist(v)), which is equivalent and which I'm guessing will be unbeatable.

@AndreyShabalin makes the good point that using method="manhattan" will probably be slightly more efficient, since it avoids the squaring and the square root.
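As a quick check of the claims above, here is a minimal sketch using a made-up vector v (the values are only an illustration, not from the question). It confirms that the Euclidean and Manhattan methods agree for a plain vector and shows the order in which dist returns the pairs. The outer/lower.tri variant is one assumed way to keep the sign, which dist discards.

```r
# Hypothetical example vector (not from the original question)
v <- c(1, 4, 9, 16)

# Absolute pairwise differences, ordered (1,2), (1,3), (1,4), (2,3), (2,4), (3,4)
d_euclid    <- as.numeric(dist(v))
d_manhattan <- as.numeric(dist(v, method = "manhattan"))

# Signed differences v[i] - v[j] for i > j, in the same pair order
m <- outer(v, v, "-")
d_signed <- m[lower.tri(m)]

print(d_euclid)
print(d_signed)
```

Because v is increasing here, the signed and absolute results coincide; for unsorted input only the signed version keeps the direction of each difference.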

R function for doing all pairwise comparisons for two vectors

outer is probably the function you want. However, it returns a matrix, so we need to flatten it into a vector. Here's one way of many:

a <- 1:3
b <- 2:4
as.vector(outer(a, b, ">"))
# [1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE

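(that's not the order you specified, though; it is a consistent one: outer puts a along the rows and b along the columns, and as.vector reads the matrix column by column, so a varies fastest)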

Transposing first makes b vary fastest instead:

as.vector(t(outer(a, b, ">")))
# [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE

Now for differences:

as.vector(outer(a, b, "-"))
# [1] -1 0 1 -2 -1 0 -3 -2 -1

I find that outer is very useful. I use it regularly.
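
To see which pair each element of the flattened result belongs to, one option is expand.grid, which lists the combinations with its first argument varying fastest, the same order as as.vector(outer(...)). The sketch below reuses a and b from above; the column names a_gt_b and diff are only illustrative.

```r
a <- 1:3
b <- 2:4

# All combinations of a and b; a varies fastest, matching as.vector(outer(a, b, ...))
pairs <- expand.grid(a = a, b = b)

# Attach the comparison and the difference to the pair they came from
pairs$a_gt_b <- as.vector(outer(a, b, ">"))
pairs$diff   <- as.vector(outer(a, b, "-"))

print(pairs)
```

Keeping the pairs next to the results makes it easy to reorder or filter them afterwards, for instance with order(pairs$b, pairs$a).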

How to calculate all pairwise abs differences among many variables in R

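What probably irritated you is that outer stopped working once you removed the sum (I'm sure you tried that). Without sum, each call returns a whole vector rather than a single number, and Vectorize's default SIMPLIFY=TRUE then binds those vectors into a matrix whose length is not what outer expects. Setting SIMPLIFY=FALSE makes it return a list with one element per pair instead, which outer can shape into a list matrix: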

r <- outer(seq_along(df), seq_along(df),
           FUN=Vectorize(function(i, j) abs(df[[i]] - df[[j]]), SIMPLIFY=FALSE))

Result (one column per pair of variables, one row per observation)

matrix(unlist(r), nrow(df))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33] [,34] [,35] [,36]
# [1,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
# [2,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
# [3,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
# [4,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
# [5,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
# [6,] 0 6 12 18 24 30 6 0 6 12 18 24 12 6 0 6 12 18 18 12 6 0 6 12 24 18 12 6 0 6 30 24 18 12 6 0
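
The wide matrix above has no column labels, so it is hard to tell which pair a column belongs to. A minimal sketch of one way to label it, using a small made-up data frame (the df from the question is not shown, so the values and the "a-b" style labels here are assumptions):

```r
# Hypothetical data frame (the df from the question is not shown here)
df <- data.frame(a = c(1, 2, 3), b = c(4, 6, 8), c = c(10, 10, 10))

# One list element per pair of columns, arranged as a 3 x 3 list matrix
r <- outer(seq_along(df), seq_along(df),
           FUN = Vectorize(function(i, j) abs(df[[i]] - df[[j]]), SIMPLIFY = FALSE))

# Flatten to one column per pair and label the columns in the same (column-major) order
m <- matrix(unlist(r), nrow(df))
colnames(m) <- as.vector(outer(names(df), names(df), paste, sep = "-"))

print(r[[1, 2]])    # abs(df$a - df$b)
print(m[, "a-b"])   # the same values, looked up by label
```

The labels line up because unlist walks the list matrix column by column, which is the same order in which as.vector flattens the matrix of pasted names.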

R: mean pairwise differences in string vectors

Your question is not terribly clear, but you appear to want a Levenshtein distance restricted to substitutions, which for equal-length strings is the Hamming distance:

x <- c("0010100101",
       "1001011101",
       "1111111010")

#switch off deletions and insertions:
d <- adist(x, costs = list(ins=Inf, del=Inf, sub=1))
# [,1] [,2] [,3]
#[1,] 0 6 8
#[2,] 6 0 6
#[3,] 8 6 0

mean(d[upper.tri(d)])
#[1] 6.666667
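
Since the strings here all have the same length, the substitution-only distance can be cross-checked by counting mismatching positions directly. This is a minimal sketch in base R, not part of the original answer, and it assumes equal-length strings; the names chars and mismatches are illustrative.

```r
x <- c("0010100101",
       "1001011101",
       "1111111010")

# Split each string into single characters
chars <- strsplit(x, "")

# Number of mismatching positions for every pair (1,2), (1,3), (2,3)
mismatches <- combn(seq_along(x), 2,
                    FUN = function(p) sum(chars[[p[1]]] != chars[[p[2]]]))

print(mismatches)
print(mean(mismatches))
```

The pair order matches d[upper.tri(d)], so the mean agrees with the adist result above.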

Simple Pairwise Difference of vector

Here is one method using combn:

# convert to a vector
vNew <- as.numeric(v[1,])
# calculate pair-wise differences
t(rbind(combn(vNew,2), combn(vNew, 2, FUN=dist)))

     [,1] [,2] [,3]
[1,]    1    2    1
[2,]    1    3    2
[3,]    1    4    3
[4,]    2    3    1
[5,]    2    4    2
[6,]    3    4    1

Here, the first two columns are the elements of the vector and the third column is the distance.
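
The question's v is not shown, so the sketch below assumes a one-row data frame holding 1 to 4, which matches the output above. It also shows an alternative that calls combn once and takes the absolute difference of the two columns, rather than calling dist for every pair.

```r
# Assumed input: a one-row data frame (hypothetical, chosen to reproduce the output above)
v <- data.frame(a = 1, b = 2, c = 3, d = 4)

# Convert the row to a plain numeric vector
vNew <- unlist(v[1, ], use.names = FALSE)

# One row per pair, then append the absolute difference as a third column
pairs  <- t(combn(vNew, 2))
result <- cbind(pairs, abs(pairs[, 1] - pairs[, 2]))

print(result)
```

Dropping abs() in the last step would give signed differences (first element minus second) instead.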

How to calculate all pairwise differences for multiple variables

We may use outer if we need a matrix

outer(seq_along(df1), seq_along(df1),
      FUN = Vectorize(function(i, j) sum(df1[[i]] - df1[[j]], na.rm = TRUE)))

-output

       [,1]  [,2]  [,3]
[1,]   0.00 47.80 56.49
[2,] -47.80  0.00  8.69
[3,] -56.49 -8.69  0.00

Or, if we don't need the redundant comparisons, use combn

combn(df1, 2, FUN = function(x) sum(x[[1]] - x[[2]], na.rm = TRUE))

-output

[1] 47.80 56.49  8.69

data

df1 <- structure(list(V1 = c(67.81, 65.33, 54.67, 53.2, 53.77, 52.66, 
50.77, 47.84, 46.33, 44.15), V2 = c(57.68, 56.58, 52.61, 49.74,
49.28, 48.03, 46.15, 43.96, 42.76, 41.94), V3 = c(54.04, 54.34,
52.36, 49.34, 48.93, 48.06, 46.21, 43.51, 42.15, 41.1)),
class = "data.frame", row.names = c(NA,
-10L))
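
The combn result is an unnamed vector, so it can be handy to label each value with the pair it came from. A minimal sketch on a small made-up data frame (toy stands in for df1 so the numbers are easy to check by hand; the "V1-V2" label format is an assumption):

```r
# Hypothetical stand-in for df1 with easy-to-verify column sums (6, 3, 1)
toy <- data.frame(V1 = c(1, 2, 3), V2 = c(0.5, 1, 1.5), V3 = c(0, 0, 1))

# Summed difference for each non-redundant pair of columns
res <- combn(names(toy), 2,
             FUN = function(p) sum(toy[[p[1]]] - toy[[p[2]]], na.rm = TRUE))

# Label each value with its pair, e.g. "V1-V2"
names(res) <- combn(names(toy), 2, FUN = paste, collapse = "-")

print(res)
```

Without missing values the same numbers also follow from the column sums, since sum(x - y) equals sum(x) - sum(y). With NAs the two can differ, because na.rm = TRUE inside the pairwise sum drops a position whenever either column is missing there.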

