(Speed Challenge) Any Faster Method to Calculate Distance Matrix Between Rows of Two Matrices, in Terms of Euclidean Distance

(Speed Challenge) Any faster method to calculate distance matrix between rows of two matrices, in terms of Euclidean distance?

method_XXX <- function() {
  # x and y: global numeric matrices with the same number of columns
  sqrt(outer(rowSums(x^2), rowSums(y^2), '+') - tcrossprod(x, 2 * y))
}

Unit: relative
                       expr       min        lq     mean    median        uq      max
 method_ThomasIsCoding_v1() 12.151624 10.486417 9.213107 10.162740 10.235274 5.278517
 method_ThomasIsCoding_v2()  6.923647  6.055417 5.549395  6.161603  6.140484 3.438976
 method_ThomasIsCoding_v3()  7.133525  6.218283 5.709549  6.438797  6.382204 3.383227
      method_AllanCameron()  7.093680  6.071482 5.776172  6.447973  6.497385 3.608604
               method_XXX()  1.000000  1.000000 1.000000  1.000000  1.000000 1.000000
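method_XXX relies on the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b. The outer(rowSums(x^2), rowSums(y^2), '+') call builds the first two terms for every pair of rows, and tcrossprod(x, 2 * y) supplies the cross term in a single matrix product. The sketch below checks this against a brute-force double loop on small random matrices. The helper names euclid_cross and euclid_loop are hypothetical, and the pmax() guard (which clamps tiny negative values produced by floating-point cancellation before sqrt()) is an addition, not part of the original answer.

```r
# Vectorized pairwise Euclidean distances between rows of x and rows of y
# (hypothetical helper), using ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a.b
euclid_cross <- function(x, y) {
  sq <- outer(rowSums(x^2), rowSums(y^2), "+") - tcrossprod(x, 2 * y)
  sqrt(pmax(sq, 0))  # clamp tiny negatives caused by rounding
}

# Brute-force reference: one sum of squared differences per pair of rows
euclid_loop <- function(x, y) {
  d <- matrix(0, nrow(x), nrow(y))
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(nrow(y))) {
      d[i, j] <- sqrt(sum((x[i, ] - y[j, ])^2))
    }
  }
  d
}

set.seed(1)
x <- matrix(rnorm(20), nrow = 5)  # 5 points in 4 dimensions
y <- matrix(rnorm(12), nrow = 3)  # 3 points in 4 dimensions
d_fast <- euclid_cross(x, y)      # 5 x 3 matrix
d_slow <- euclid_loop(x, y)
all.equal(d_fast, d_slow)  # TRUE
```

The speed comes from replacing nrow(x) * nrow(y) small R-level operations with one BLAS matrix product plus two cheap row sums.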

(Speed Challenge) Any faster way to compute distance matrix in terms of generic Hamming distance?

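# Hamming distance between every pair of rows of x
methodM <- function(x) {
  xt <- t(x)  # columns of xt are the rows of x
  # compare every row with row y, then count mismatches per row
  sapply(1:nrow(x), function(y) colSums(xt != xt[, y]))
}
# methodB (a competing implementation, not reproduced here) and the input
# matrix m are assumed to be defined already
microbenchmark::microbenchmark(
  methodB(m), methodM(m),
  unit = "relative", check = "equivalent", times = 50
)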
# Unit: relative
#       expr  min       lq     mean   median       uq      max neval cld
# methodB(m) 1.00 1.000000 1.000000 1.000000 1.000000 1.000000    50  a 
# methodM(m) 1.25 1.224827 1.359573 1.219507 1.292463 4.550159    50   b
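methodM works because xt != xt[, y] recycles row y of x down every column of t(x), so colSums() counts, for each row, the positions where it differs from row y. The sketch below runs it on a small random character matrix (a stand-in, not the data from the question) and checks it against a hypothetical brute-force helper, hamming_loop.

```r
# Hamming distance between every pair of rows of x
methodM <- function(x) {
  xt <- t(x)  # columns of xt are the rows of x
  sapply(1:nrow(x), function(y) colSums(xt != xt[, y]))
}

# Brute-force reference (hypothetical helper): count mismatches pair by pair
hamming_loop <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(x[i, ] != x[j, ])
    }
  }
  d
}

set.seed(42)
m <- matrix(sample(c("A", "C", "G", "T"), 6 * 8, replace = TRUE), nrow = 6)
h <- methodM(m)  # 6 x 6, symmetric, zero diagonal
print(h)
all.equal(h, hamming_loop(m))
```

Because != works on any atomic type, the same code handles numeric, character, or factor-coded data, which is what makes this a generic Hamming distance rather than a binary-only one.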

Fast way to compute distance matrix in R for large matrix

Perhaps try the distances package: https://cran.r-project.org/web/packages/distances/distances.pdf

install.packages("distances")
library("distances")
set.seed(123)
# 39900 rows x 1990 columns of standard normal draws (about 635 MB of doubles)
M <- matrix(rnorm(39900*1990),nrow = 39900,ncol = 1990)
d <- distances(M)
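To see why an input of this size is hard for any method that materializes the full result, the plain base-R arithmetic below estimates the storage involved: about 11.9 GiB for a dense 39900 x 39900 double matrix and about 5.9 GiB for the lower triangle that base R's dist() keeps, versus about 0.6 GiB for M itself. These are sizes of the numeric payload only (8 bytes per double) and ignore any temporary copies a particular method makes.

```r
# Rough memory needed to hold all pairwise distances between the rows of M
n <- 39900
bytes_per_double <- 8

full_bytes  <- n^2 * bytes_per_double              # dense n x n matrix
dist_bytes  <- n * (n - 1) / 2 * bytes_per_double  # lower triangle, as stored by dist()
input_bytes <- n * 1990 * bytes_per_double         # the 39900 x 1990 input M

# Each figure in GiB (2^30 bytes)
round(c(full = full_bytes, dist = dist_bytes, input = input_bytes) / 2^30, 2)
```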

how to calculate Euclidean distance between two matrices in R

You can use the package pdist:

library(pdist)
dists <- pdist(t(mat1), t(mat2))
as.matrix(dists)
         [,1]      [,2]      [,3]
[1,]  9220.40  9260.735  8866.033
[2,] 12806.35 12820.086 12121.927
[3,] 11630.86 11665.869 11155.823

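This gives the Euclidean distance between every column of mat1 and every column of mat2, i.e. the pairs (mat1$x, mat2$x), (mat1$x, mat2$y), ..., (mat1$z, mat2$z); entry [i, j] pairs column i of mat1 with column j of mat2.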
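The t() calls are what make pdist() work on columns: pdist() measures distances between rows, and transposing turns each column into a row. The sketch below is a base-R equivalent that needs no package, written directly in terms of columns with colSums() and crossprod(). The helper name col_dists and the small random mat1/mat2 are hypothetical stand-ins, not the data from the question.

```r
# Euclidean distances between the COLUMNS of a and the COLUMNS of b
# (hypothetical helper): ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 * a_i . b_j
col_dists <- function(a, b) {
  sq <- outer(colSums(a^2), colSums(b^2), "+") - 2 * crossprod(a, b)
  sqrt(pmax(sq, 0))  # clamp tiny negatives from rounding
}

set.seed(7)
mat1 <- matrix(rnorm(15), ncol = 3, dimnames = list(NULL, c("x", "y", "z")))
mat2 <- matrix(rnorm(15), ncol = 3, dimnames = list(NULL, c("x", "y", "z")))
d <- col_dists(mat1, mat2)  # 3 x 3, rows/columns named x, y, z

# Entry ["x", "y"] is the distance between mat1[, "x"] and mat2[, "y"]
d["x", "y"]
sqrt(sum((mat1[, "x"] - mat2[, "y"])^2))
```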

Compute distance between each combination of rows in two matrices

Yes, there is. You can use pdist2 from the Statistics and Machine Learning Toolbox (see its documentation):

d = pdist2(A,B);

The entry d(m,n) is the distance between A(m,:) and B(n,:).
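R has no built-in pdist2, but its layout (d[m, n] is the distance between A[m, ] and B[n, ]) can be reproduced in base R by stacking both matrices, calling dist() once, and keeping only the block that pairs rows of A with rows of B. The helper name cross_dist is hypothetical and the data are small random stand-ins. This is simple but wasteful, since dist() also computes the A-A and B-B pairs; its advantage is that the method argument is passed straight to dist(), so "manhattan", "maximum", and the other dist() metrics work too.

```r
# pdist2-style cross distances via dist() on the stacked rows (hypothetical helper)
cross_dist <- function(A, B, method = "euclidean") {
  nA <- nrow(A)
  nB <- nrow(B)
  full <- as.matrix(dist(rbind(A, B), method = method))  # (nA + nB) square
  full[seq_len(nA), nA + seq_len(nB), drop = FALSE]        # rows of A vs rows of B
}

set.seed(3)
A <- matrix(rnorm(8), nrow = 4)  # 4 points in 2D
B <- matrix(rnorm(6), nrow = 3)  # 3 points in 2D
d <- cross_dist(A, B)            # 4 x 3

# d[m, n] is the distance between A[m, ] and B[n, ], as with pdist2
d[2, 3]
sqrt(sum((A[2, ] - B[3, ])^2))
cross_dist(A, B, method = "manhattan")[2, 3]
```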

How to use apply function to calculate the distance between two matrices

Use two apply calls, the second nested inside the first:

d1 <- apply(xtest, 1, function(x) apply(xtrain, 1, function(y) sqrt(crossprod(x-y))))

Check against pdist:

library(pdist)
d2 <- as.matrix(pdist(xtrain, xtest))

all.equal(d1, d2, tolerance = 1e-7)
## [1] TRUE
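The nested apply returns a matrix whose rows correspond to rows of xtrain and whose columns correspond to rows of xtest, which is why it matches pdist(xtrain, xtest) rather than pdist(xtest, xtrain). The sketch below writes the same nested pattern with vapply(), which declares the shape of each result up front, and checks it against the original nested apply. xtrain and xtest here are small random stand-ins, not the data from the question.

```r
set.seed(11)
xtrain <- matrix(rnorm(30), nrow = 10)  # 10 training rows, 3 features
xtest  <- matrix(rnorm(12), nrow = 4)   # 4 test rows, 3 features

# Outer loop over test rows, inner loop over training rows
d1 <- vapply(seq_len(nrow(xtest)), function(i) {
  vapply(seq_len(nrow(xtrain)), function(j) {
    sqrt(sum((xtest[i, ] - xtrain[j, ])^2))
  }, numeric(1))
}, numeric(nrow(xtrain)))

dim(d1)  # 10 x 4: rows index xtrain, columns index xtest

# Same orientation as the nested apply() above
d_apply <- apply(xtest, 1, function(x) apply(xtrain, 1, function(y) sqrt(crossprod(x - y))))
all.equal(d1, d_apply)
```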

Euclidean Distances between rows of two data frames in R

You could try combining outer with dist, as below:

outer(
  1:nrow(known_data),
  1:nrow(unknown_data),
  FUN = Vectorize(function(x,y) dist(rbind(known_data[x,],unknown_data[y,])))
)
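This calls dist() once per (known row, unknown row) pair, so it is convenient but slow for large inputs. The sketch below runs it on two small random data frames (stand-ins with matching columns a and b, since rbind() on data frames needs matching column names) and compares it with a vectorized alternative that converts both data frames to matrices and uses the ||a||^2 + ||b||^2 - 2 a.b identity.

```r
set.seed(5)
known_data   <- data.frame(a = rnorm(4), b = rnorm(4))
unknown_data <- data.frame(a = rnorm(3), b = rnorm(3))

# outer + dist approach: one dist() call per pair of rows
d_outer <- outer(
  1:nrow(known_data),
  1:nrow(unknown_data),
  FUN = Vectorize(function(x, y) dist(rbind(known_data[x, ], unknown_data[y, ])))
)

# Vectorized alternative on the numeric matrices
k <- as.matrix(known_data)
u <- as.matrix(unknown_data)
d_vec <- sqrt(pmax(outer(rowSums(k^2), rowSums(u^2), "+") - 2 * tcrossprod(k, u), 0))

all.equal(d_outer, d_vec, check.attributes = FALSE)
```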

Calculate Euclidean distances between all rows in matrices A and B

A one-liner without loops or additional packages, and roughly twice as fast as the double for loop:

euklDist <- sqrt(apply(array(apply(B,1,function(x){(x-t(A))^2}),c(ncol(A),nrow(A),nrow(B))),2:3,sum))

Speed comparison:

> microbenchmark(jogo  = for (i in 1:nrow(A)) for (j in 1:nrow(B)) d[i,j] <- sqrt(sum((A[i,]-B[j,])^2)),
+                mra68 = sqrt(apply(array(app .... [TRUNCATED] 
Unit: seconds
  expr      min       lq     mean   median       uq      max neval
  jogo 3.601533 4.724619 5.403420 5.549199 6.098734 6.470888    10
 mra68 1.334661 1.635258 2.473297 2.542550 3.247981 3.348365    10
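The one-liner packs three steps into a single expression. The sketch below unpacks them on small random matrices (A and B are stand-ins, not the data from the question) and checks the result against the double loop used in the benchmark.

```r
set.seed(9)
A <- matrix(rnorm(12), nrow = 4)  # 4 rows, 3 columns
B <- matrix(rnorm(6), nrow = 2)   # 2 rows, 3 columns

# Step 1: for each row x of B, (x - t(A))^2 is an ncol(A) x nrow(A) matrix of
# squared differences; apply() flattens each one into a column
sq <- apply(B, 1, function(x) (x - t(A))^2)
dim(sq)  # (ncol(A) * nrow(A)) x nrow(B) = 12 x 2

# Step 2: reshape to ncol(A) x nrow(A) x nrow(B) and sum over the first axis
arr <- array(sq, c(ncol(A), nrow(A), nrow(B)))
euklDist <- sqrt(apply(arr, 2:3, sum))  # nrow(A) x nrow(B)

# Step 3: check against the double loop from the benchmark
d <- matrix(0, nrow(A), nrow(B))
for (i in 1:nrow(A)) for (j in 1:nrow(B)) d[i, j] <- sqrt(sum((A[i, ] - B[j, ])^2))
all.equal(euklDist, d)
```

The reshape in step 2 is valid because R stores arrays in column-major order, so each flattened column from step 1 lines up exactly with one ncol(A) x nrow(A) slice of the array.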

Frobenius distance matrix of matrices

You can do this in one go by adding dummy dimensions (broadcasting) and telling np.linalg.norm which axes to reduce over; with a 2-tuple axis and the default ord, it computes the Frobenius norm of each D x D slice.

import numpy as np
M = np.linalg.norm(sigma[:,None] - sigma_barre[None,:], axis=(2,3))

Since sigma[:,None] - sigma_barre[None,:] is a K x K x D x D array, this can take a lot of memory depending on how big K and D are. If memory is an issue, your solution seems good, although when both sets contain the same matrices you can loop j starting from i+1 instead, since then M[i,j] == M[j,i] and M[i,i] == 0.
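A rough R counterpart (not the answer's NumPy code) uses the fact that the Frobenius norm of a matrix equals the Euclidean norm of its entries: flattening each D x D matrix into one row turns the problem into ordinary row-to-row Euclidean distances, without ever building a K x K x D x D intermediate. The sketch assumes the K matrices are stored as a K x D x D array (the same layout as the NumPy code); sigma and sigma_barre are filled with random stand-in values.

```r
set.seed(2)
K <- 4
D <- 3
sigma       <- array(rnorm(K * D * D), c(K, D, D))
sigma_barre <- array(rnorm(K * D * D), c(K, D, D))

# Row k of each matrix holds all entries of the k-th D x D matrix
flat  <- matrix(sigma, nrow = K)
flatb <- matrix(sigma_barre, nrow = K)

# K x K Frobenius distances via ||a||^2 + ||b||^2 - 2 a.b
M <- sqrt(pmax(outer(rowSums(flat^2), rowSums(flatb^2), "+") - 2 * tcrossprod(flat, flatb), 0))

# Check one entry against ||sigma[i, , ] - sigma_barre[j, , ]||_F
i <- 2
j <- 3
M[i, j]
sqrt(sum((sigma[i, , ] - sigma_barre[j, , ])^2))

# Within a single set, dist() computes each pair once (lower triangle only),
# which is the M[i, j] == M[j, i] saving mentioned above
M_self <- as.matrix(dist(flat))
```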

How to separately compute the Euclidean Distance in different dimensions?

Assuming you just want the absolute difference between the points along each individual dimension, pdist is overkill. You can use the following simple function:

function d = pdist_1d(S)
    idx = nchoosek(1:size(S,1),2);
    d = abs(S(idx(:,1),:) - S(idx(:,2),:));
end

which returns the absolute pairwise difference between all pairs of rows in S.

In this case

dist = pdist_1d(S)

gives the same result as

dist = cell2mat(arrayfun(@(dim)pdist(S(:,dim))',1:size(S,2),'UniformOutput',false));
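For R users, a sketch of the same idea (an R port of the MATLAB function above, reusing the name pdist_1d for readability) builds the row pairs with combn(), which lists them in the same (1,2), (1,3), ..., (1,n), (2,3), ... order as nchoosek. The check mirrors the MATLAB comparison: running base R's dist() on each column separately gives the same values, because dist() stores its lower triangle in that same pair order. S is a small random stand-in.

```r
# Absolute difference, per column, between every pair of rows of S
pdist_1d <- function(S) {
  idx <- combn(nrow(S), 2)  # 2 x choose(n, 2) matrix of row-index pairs
  abs(S[idx[1, ], , drop = FALSE] - S[idx[2, ], , drop = FALSE])
}

set.seed(4)
S <- matrix(rnorm(15), nrow = 5)  # 5 points in 3 dimensions
d <- pdist_1d(S)
dim(d)  # choose(5, 2) x ncol(S) = 10 x 3

# Same as running dist() on each column separately
d_check <- sapply(seq_len(ncol(S)), function(k) as.vector(dist(S[, k])))
all.equal(d, d_check)
```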
