(Speed Challenge) Any faster method to calculate distance matrix between rows of two matrices, in terms of Euclidean distance?
method_XXX <- function() {
  sqrt(outer(rowSums(x^2), rowSums(y^2), '+') - tcrossprod(x, 2 * y))
}
Unit: relative
                      expr       min        lq     mean    median        uq      max
method_ThomasIsCoding_v1() 12.151624 10.486417 9.213107 10.162740 10.235274 5.278517
method_ThomasIsCoding_v2()  6.923647  6.055417 5.549395  6.161603  6.140484 3.438976
method_ThomasIsCoding_v3()  7.133525  6.218283 5.709549  6.438797  6.382204 3.383227
     method_AllanCameron()  7.093680  6.071482 5.776172  6.447973  6.497385 3.608604
              method_XXX()  1.000000  1.000000 1.000000  1.000000  1.000000 1.000000
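To try the expansion trick behind method_XXX on your own data, something like the following minimal sketch may help. The matrices x and y and the helper names dist_expand and dist_naive are assumptions made for illustration; they are not part of the original benchmark. The pmax(..., 0) guard is also an addition: rounding error can make a squared distance slightly negative, which sqrt would turn into NaN.

```r
set.seed(1)
x <- matrix(rnorm(5 * 3), nrow = 5)  # 5 points in 3 dimensions (made-up data)
y <- matrix(rnorm(4 * 3), nrow = 4)  # 4 points in 3 dimensions (made-up data)

# ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a.b, evaluated for all row pairs at once
dist_expand <- function(x, y) {
  d2 <- outer(rowSums(x^2), rowSums(y^2), "+") - 2 * tcrossprod(x, y)
  sqrt(pmax(d2, 0))  # clamp tiny negative values caused by rounding
}

# Straightforward double loop, used only as a reference for checking
dist_naive <- function(x, y) {
  d <- matrix(0, nrow(x), nrow(y))
  for (i in seq_len(nrow(x))) {
    for (j in seq_len(nrow(y))) {
      d[i, j] <- sqrt(sum((x[i, ] - y[j, ])^2))
    }
  }
  d
}

all.equal(dist_expand(x, y), dist_naive(x, y))
```

The speed-up presumably comes from tcrossprod, which hands the bulk of the work to a single matrix multiplication instead of looping over row pairs.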
(Speed Challenge) Any faster way to compute distance matrix in terms of generic Hamming distance?
methodM <- function(x) {
  xt <- t(x)
  sapply(1:nrow(x), function(y) colSums(xt != xt[, y]))
}
microbenchmark::microbenchmark(
  methodB(m), methodM(m),
  unit = "relative", check = "equivalent", times = 50
)
# Unit: relative
# expr min lq mean median uq max neval cld
# methodB(m) 1.00 1.000000 1.000000 1.000000 1.000000 1.000000 50 a
# methodM(m) 1.25 1.224827 1.359573 1.219507 1.292463 4.550159 50 b
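To see what methodM computes, here is a small sketch on a made-up 3 x 4 matrix. The data and the name hamming_rows are assumptions for illustration; the body follows methodM, with seq_len() swapped in for 1:nrow(x).

```r
# Small matrix: each row is one item to compare (made-up data)
m <- rbind(c(1, 2, 3, 4),
           c(1, 2, 0, 4),
           c(9, 2, 0, 0))

hamming_rows <- function(x) {
  xt <- t(x)  # columns of xt are the rows of x
  # For row j, count the positions at which each row differs from row j
  sapply(seq_len(nrow(x)), function(j) colSums(xt != xt[, j]))
}

d <- hamming_rows(m)
d
```

Rows 1 and 2 differ only in the third position, so d[1, 2] is 1; rows 1 and 3 differ in three positions, so d[1, 3] is 3.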
Fast way to compute distance matrix in R for large matrix
Perhaps try the distances package: https://cran.r-project.org/web/packages/distances/distances.pdf
install.packages("distances")
library("distances")
set.seed(123)
M <- matrix(rnorm(39900*1990),nrow = 39900,ncol = 1990)
d <- distances(M)
how to calculate Euclidean distance between two matrices in R
You can use the package pdist:
library(pdist)
dists <- pdist(t(mat1), t(mat2))
as.matrix(dists)
[,1] [,2] [,3]
[1,] 9220.40 9260.735 8866.033
[2,] 12806.35 12820.086 12121.927
[3,] 11630.86 11665.869 11155.823
This gives all Euclidean distances between the pairs (mat1$x, mat2$x), (mat1$x, mat2$y), ..., (mat1$z, mat2$z).
Compute distance between each combination of rows in two matrices
Yes, there is. You can use pdist2 (see doc):
d = pdist2(A,B);
The entry d(m,n) is the distance between A(m,:) and B(n,:).
How to use apply function to calculate the distance between two matrices
Use two apply instances with the second nested in the first:
d1 <- apply(xtest, 1, function(x) apply(xtrain, 1, function(y) sqrt(crossprod(x-y))))
Check against pdist:
library(pdist)
d2 <- as.matrix(pdist(xtrain, xtest))
all.equal(d1, d2, tolerance = 1e-7)
## [1] TRUE
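The orientation of the nested-apply result is easy to get wrong, so a small base-R sketch may be useful. The data here are made up, and sqrt(sum((x - y)^2)) is used in place of sqrt(crossprod(x - y)); both give the Euclidean norm of the difference.

```r
set.seed(42)
xtrain <- matrix(rnorm(6 * 2), nrow = 6)  # 6 training points (made-up data)
xtest  <- matrix(rnorm(3 * 2), nrow = 3)  # 3 test points (made-up data)

# Outer apply walks over test rows, inner apply over training rows
d1 <- apply(xtest, 1, function(x)
  apply(xtrain, 1, function(y) sqrt(sum((x - y)^2))))

dim(d1)  # rows index xtrain, columns index xtest

# Spot check of a single entry against the definition
d1[2, 3] - sqrt(sum((xtrain[2, ] - xtest[3, ])^2))
```

Each call of the inner apply returns one column of the result, which is why the training rows end up along the first dimension.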
Euclidean Distances between rows of two data frames in R
Maybe you can try outer + dist, as below:
outer(
1:nrow(known_data),
1:nrow(unknown_data),
FUN = Vectorize(function(x,y) dist(rbind(known_data[x,],unknown_data[y,])))
)
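As a possible alternative to calling dist once per pair, one could call it once on the stacked data and keep only the cross block. This is a different technique from the outer + Vectorize answer above, sketched under the assumption that both data frames are all-numeric with the same columns; the example data and the names all_d and cross_d are made up. It also computes the within-set distances, so it may use more memory than necessary on large inputs.

```r
known_data   <- data.frame(a = c(0, 3), b = c(0, 4))        # made-up data
unknown_data <- data.frame(a = c(0, 6, 3), b = c(0, 8, 0))  # made-up data

n_known   <- nrow(known_data)
n_unknown <- nrow(unknown_data)

# One dist() call over all rows, then keep the known-vs-unknown block
all_d   <- as.matrix(dist(rbind(known_data, unknown_data)))
cross_d <- all_d[seq_len(n_known), n_known + seq_len(n_unknown), drop = FALSE]

cross_d
```

Here cross_d[i, j] is the distance between row i of known_data and row j of unknown_data; for instance, the distance from (3, 4) to (0, 0) is 5.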
Calculate Euclidean distances between all rows in matrices A and B
A one-liner without loops, without additional packages, and a little bit faster:
euklDist <- sqrt(apply(array(apply(B,1,function(x){(x-t(A))^2}),c(ncol(A),nrow(A),nrow(B))),2:3,sum))
Speed comparison:
> microbenchmark(jogo = for (i in 1:nrow(A)) for (j in 1:nrow(B)) d[i,j] <- sqrt(sum((A[i,]-B[j,])^2)),
+ mra68 = sqrt(apply(array(app .... [TRUNCATED]
Unit: seconds
expr min lq mean median uq max neval
jogo 3.601533 4.724619 5.403420 5.549199 6.098734 6.470888 10
mra68 1.334661 1.635258 2.473297 2.542550 3.247981 3.348365 10
Frobenius distance matrix of Matrices
You can do this in one go by adding some dummy dimensions and specifying which axis the summation should be done over.
M = np.linalg.norm(sigma[:,None] - sigma_barre[None,:], axis=(2,3))
Since sigma[:,None] - sigma_barre[None,:] is a K x K x D x D array, this can take up a lot of memory depending on how big K and D are. If memory is an issue, your solution seems good, although you can loop j starting from i+1 instead, since you know that M[i,j] == M[j,i] and that M[i,i] == 0.
How to separately compute the Euclidean Distance in different dimension?
Assuming you just want the absolute difference between the individual dimensions of the points, then pdist is overkill. You can use the following simple function:
function d = pdist_1d(S)
    idx = nchoosek(1:size(S,1),2);
    d = abs(S(idx(:,1),:) - S(idx(:,2),:));
end
which returns the absolute pairwise difference between all pairs of rows in S.
In this case
dist = pdist_1d(S)
gives the same result as
dist = cell2mat(arrayfun(@(dim)pdist(S(:,dim))',1:size(S,2),'UniformOutput',false));