(Speed Challenge) Any Faster Method to Calculate Distance Matrix Between Rows of Two Matrices, in Terms of Euclidean Distance

method_XXX <- function() {
  # ||x_i - y_j||^2 = ||x_i||^2 + ||y_j||^2 - 2 * <x_i, y_j>
  sqrt(outer(rowSums(x^2), rowSums(y^2), '+') - tcrossprod(x, 2 * y))
}

Unit: relative
expr                              min         lq      mean     median         uq       max
method_ThomasIsCoding_v1()  12.151624  10.486417  9.213107  10.162740  10.235274  5.278517
method_ThomasIsCoding_v2()   6.923647   6.055417  5.549395   6.161603   6.140484  3.438976
method_ThomasIsCoding_v3()   7.133525   6.218283  5.709549   6.438797   6.382204  3.383227
method_AllanCameron()        7.093680   6.071482  5.776172   6.447973   6.497385  3.608604
method_XXX()                 1.000000   1.000000  1.000000   1.000000   1.000000  1.000000
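
The one-liner above relies on expanding the squared distance, ||x_i - y_j||^2 = ||x_i||^2 + ||y_j||^2 - 2 * <x_i, y_j>, so that most of the work becomes a single matrix product. A minimal plain-Python sketch of that identity, checked against the direct definition, might look as follows. The function names and the clamp at zero are illustrative additions, not part of the original answer; the clamp only guards against tiny negative values that floating-point cancellation can produce.

```python
import math

def dist_direct(x, y):
    """Reference: Euclidean distance between every row of x and every row of y."""
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, yj))) for yj in y]
            for xi in x]

def dist_expanded(x, y):
    """Same distances via ||xi||^2 + ||yj||^2 - 2 * <xi, yj>."""
    x_sq = [sum(a * a for a in xi) for xi in x]  # squared row norms of x
    y_sq = [sum(b * b for b in yj) for yj in y]  # squared row norms of y
    out = []
    for xi, xs in zip(x, x_sq):
        row = []
        for yj, ys in zip(y, y_sq):
            dot = sum(a * b for a, b in zip(xi, yj))
            # Clamp at 0 (assumption: protects sqrt from rounding error)
            row.append(math.sqrt(max(xs + ys - 2.0 * dot, 0.0)))
        out.append(row)
    return out

x = [[0.0, 0.0], [3.0, 4.0]]
y = [[0.0, 0.0], [6.0, 8.0], [1.0, 1.0]]
d = dist_expanded(x, y)  # 2 x 3 matrix: rows index x, columns index y
```

In a vectorised language the two sums of squares and the dot products are each a single call, which is where the speed-up in the benchmark comes from.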

(Speed Challenge) Any faster way to compute distance matrix in terms of generic Hamming distance?

methodM <- function(x) {
  xt <- t(x)
  sapply(1:nrow(x), function(y) colSums(xt != xt[, y]))
}
microbenchmark::microbenchmark(
  methodB(m), methodM(m),
  unit = "relative", check = "equivalent", times = 50
)
# Unit: relative
# expr min lq mean median uq max neval cld
# methodB(m) 1.00 1.000000 1.000000 1.000000 1.000000 1.000000 50 a
# methodM(m) 1.25 1.224827 1.359573 1.219507 1.292463 4.550159 50 b
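
For reference, the quantity being benchmarked can be written out directly. The plain-Python sketch below counts, for each pair of rows, the positions where they differ, and fills both halves of the symmetric result at once. `hamming_matrix` is an illustrative name; this is not a reproduction of methodB, whose definition is not shown here.

```python
def hamming_matrix(rows):
    """Pairwise generic Hamming distances between the rows of a matrix."""
    n = len(rows)
    d = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Number of positions at which rows i and j differ
            diff = sum(a != b for a, b in zip(rows[i], rows[j]))
            d[i][j] = d[j][i] = diff  # the matrix is symmetric
    return d

m = [[1, 2, 3], [1, 0, 3], [4, 2, 0]]
h = hamming_matrix(m)
```

Because the comparison is `!=` rather than arithmetic, the same sketch works for any values that support equality tests, which is what makes the distance "generic".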

Fast way to compute distance matrix in R for large matrix

Perhaps try the distances package: https://cran.r-project.org/web/packages/distances/distances.pdf

install.packages("distances")
library("distances")
set.seed(123)
M <- matrix(rnorm(39900*1990),nrow = 39900,ncol = 1990)
d <- distances(M)

how to calculate Euclidean distance between two matrices in R

You can use the package pdist:

library(pdist)
dists <- pdist(t(mat1), t(mat2))
as.matrix(dists)
[,1] [,2] [,3]
[1,] 9220.40 9260.735 8866.033
[2,] 12806.35 12820.086 12121.927
[3,] 11630.86 11665.869 11155.823

This gives the Euclidean distance for every pair of columns: (mat1$x, mat2$x), (mat1$x, mat2$y), ..., (mat1$z, mat2$z).

Compute distance between each combination of rows in two matrices

Yes, there is. In MATLAB you can use pdist2 (see its documentation):

d = pdist2(A,B);

The entry d(m,n) is the distance between A(m,:) and B(n,:).

How to use apply function to calculate the distance between two matrices

Use two apply instances with the second nested in the first:

d1 <- apply(xtest, 1, function(x) apply(xtrain, 1, function(y) sqrt(crossprod(x-y))))

Check against pdist:

library(pdist)
d2 <- as.matrix(pdist(xtrain, xtest))

all.equal(d1, d2, tolerance = 1e-7)
## [1] TRUE

Euclidean Distances between rows of two data frames in R

Maybe you can try outer + dist, as below:

outer(
  1:nrow(known_data),
  1:nrow(unknown_data),
  FUN = Vectorize(function(x, y) dist(rbind(known_data[x, ], unknown_data[y, ])))
)

Calculate Euclidean distances between all rows in matrices A and B

A one-liner without loops, without additional packages, and a little bit faster:

euklDist <- sqrt(apply(array(apply(B,1,function(x){(x-t(A))^2}),c(ncol(A),nrow(A),nrow(B))),2:3,sum))

Speed comparison:

> microbenchmark(jogo  = for (i in 1:nrow(A)) for (j in 1:nrow(B)) d[i,j] <- sqrt(sum((A[i,]-B[j,])^2)),
+ mra68 = sqrt(apply(array(app .... [TRUNCATED]
Unit: seconds
expr min lq mean median uq max neval
jogo 3.601533 4.724619 5.403420 5.549199 6.098734 6.470888 10
mra68 1.334661 1.635258 2.473297 2.542550 3.247981 3.348365 10

Frobenius distance matrix of Matrices

You can do this in one go by adding some dummy dimensions and specifying which axis the summation should be done over.

M = np.linalg.norm(sigma[:,None] - sigma_barre[None,:], axis=(2,3))

Since sigma[:,None] - sigma_barre[None,:] is a K x K x D x D array, this can take up a lot of memory depending on how big K and D are. If memory is an issue, your solution seems good, although you can start the loop over j at i+1 instead, since you know that M[i,j] == M[j,i] and that M[i,i] == 0.
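
A memory-light version of the loop with j starting at i+1 could be sketched like this in plain Python. It assumes a single collection of K matrices compared against itself, which is the case in which M[i,j] == M[j,i] holds; `frobenius_distance_matrix` and `mats` are hypothetical names used only for illustration.

```python
import math

def frobenius_distance_matrix(mats):
    """mats: list of K matrices, each a list of D rows of D numbers."""
    k = len(mats)
    m = [[0.0] * k for _ in range(k)]  # diagonal stays 0
    for i in range(k):
        for j in range(i + 1, k):  # upper triangle only
            sq = sum((a - b) ** 2
                     for row_a, row_b in zip(mats[i], mats[j])
                     for a, b in zip(row_a, row_b))
            m[i][j] = m[j][i] = math.sqrt(sq)  # mirror into lower triangle
    return m

mats = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[3.0, 0.0], [0.0, 4.0]],
    [[1.0, 1.0], [1.0, 1.0]],
]
m = frobenius_distance_matrix(mats)
```

This never builds the K x K x D x D intermediate, at the cost of an explicit double loop.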

How to separately compute the Euclidean Distance in different dimension?

Assuming you just want the absolute difference between the individual dimensions of the points, then pdist is overkill. You can use the following simple function:

function d = pdist_1d(S)
    idx = nchoosek(1:size(S,1),2);
    d = abs(S(idx(:,1),:) - S(idx(:,2),:));
end

which returns the absolute pairwise difference between all pairs of rows in S.

In this case

dist = pdist_1d(S)

gives the same result as

dist = cell2mat(arrayfun(@(dim)pdist(S(:,dim))',1:size(S,2),'UniformOutput',false));
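
The same idea can be sketched in plain Python: enumerate every pair of rows once and take the absolute difference in each dimension. This is an illustrative translation, not the original code; it uses zero-based indices, and `itertools.combinations` yields the pairs in the order (0,1), (0,2), ..., (1,2), ....

```python
from itertools import combinations

def pdist_1d(s):
    """Per-dimension absolute differences for every pair of rows of s."""
    return [[abs(a - b) for a, b in zip(s[i], s[j])]
            for i, j in combinations(range(len(s)), 2)]

s = [[0, 0], [1, 5], [4, 2]]
d = pdist_1d(s)  # one output row per pair of input rows
```

With three input rows there are three pairs, so `d` has three rows and as many columns as `s`.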

