Euclidean Distance of Two Vectors

Calculate euclidean distance between vectors with cluster medoids

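The euclidean_distances function is working as expected. Because both vectors are reshaped with reshape(-1, 1), each one is treated as 8 samples with a single feature, so the function returns the 8 x 8 matrix of distances between every item of the first array and every item of the second. Both inputs appear to be the same vector here (X2[1] is itself one of the medoids), which is why the matrix is symmetric with zeros on the diagonal. To get a single distance between the two vectors, reshape them with reshape(1, -1) instead.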

import numpy as np
from sklearn_extra.cluster import KMedoids
from sklearn.metrics.pairwise import euclidean_distances


X2=np.array([[ 5.43840675, -1.05259078, -0.21793506, 8.56686818, -2.58056957,
-0.07310339, -0.31181501, 0.02696586],
[ 5.72318296, -0.99665473, -0.14540062, 8.32051008, -3.36201189,
-0.04897565, -0.34271698, -0.0339766 ],
[ 5.93081714, -1.52272427, 0.40706477, 8.56256569, -3.216366 ,
-0.0108426 , -0.57434619, -0.18952662]])

model1 = KMedoids(n_clusters=2, random_state=0).fit(X2)

medoids=np.array([[ 5.72318296, -0.99665473, -0.14540062, 8.32051008, -3.36201189,
-0.04897565, -0.34271698, -0.0339766 ],
[ 5.43840675, -1.05259078, -0.21793506, 8.56686818, -2.58056957,
-0.07310339, -0.31181501, 0.02696586]])

X2[1]=([ 5.72318296, -0.99665473, -0.14540062, 8.32051008, -3.36201189,
-0.04897565, -0.34271698, -0.0339766 ])

medoids[0]=[ 5.72318296, -0.99665473, -0.14540062, 8.32051008, -3.36201189,
-0.04897565, -0.34271698, -0.0339766 ]

a = (X2[1].reshape(-1, 1))
b = (X2[model1.medoid_indices_][0].reshape(-1, 1))

# dist(x, y) = sqrt(dot(x, x) - 2 * dot(x, y) + dot(y, y))
dist =euclidean_distances(a, b)
print(dist)

This is what you would see:

[[ 0.          6.71983769  5.86858358  2.59732712  9.08519485  5.77215861
6.06589994 5.75715956]
[ 6.71983769 0. 0.85125411 9.31716481 2.36535716 0.94767908
0.65393775 0.96267813]
[ 5.86858358 0.85125411 0. 8.4659107 3.21661127 0.09642497
0.19731636 0.11142402]
[ 2.59732712 9.31716481 8.4659107 0. 11.68252197 8.36948573
8.66322706 8.35448668]
[ 9.08519485 2.36535716 3.21661127 11.68252197 0. 3.31303624
3.01929491 3.32803529]
[ 5.77215861 0.94767908 0.09642497 8.36948573 3.31303624 0.
0.29374133 0.01499905]
[ 6.06589994 0.65393775 0.19731636 8.66322706 3.01929491 0.29374133
0. 0.30874038]
[ 5.75715956 0.96267813 0.11142402 8.35448668 3.32803529 0.01499905
0.30874038 0. ]]
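
The 8 x 8 shape comes from how the inputs are laid out, not from the distance formula. Here is a minimal NumPy-only sketch of the same effect. The helper pairwise_euclidean is hypothetical (it is not part of scikit-learn) and the short vectors are made up for illustration:

```python
import numpy as np

def pairwise_euclidean(A, B):
    """Hypothetical helper: distance between every row of A and every row of B."""
    diff = A[:, None, :] - B[None, :, :]   # shape (len(A), len(B), n_features)
    return np.sqrt((diff ** 2).sum(axis=-1))

# Made-up 3-component vectors, for illustration only.
a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 4.0, 3.0])

# reshape(-1, 1): three samples with one feature each -> 3 x 3 matrix of |a_i - b_j|
per_element = pairwise_euclidean(a.reshape(-1, 1), b.reshape(-1, 1))

# reshape(1, -1): one sample with three features -> 1 x 1 matrix holding the distance
per_vector = pairwise_euclidean(a.reshape(1, -1), b.reshape(1, -1))

print(per_element.shape)  # (3, 3)
print(per_vector)         # [[2.]]
```

With rows as samples, only the second layout answers the question "how far apart are these two vectors".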

To calculate euclidean distance between vectors in a torch tensor with multiple dimensions

Here ya go

dist = (tensor1 - tensor2).pow(2).sum(3).sqrt()

Basically that's what Euclidean distance is.

Subtract -> square -> sum along the axis you want to eliminate (dimension 3 here) -> square root
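
The same four steps can be sketched with NumPy arrays, which makes the shape change easy to check. This is a NumPy analogue rather than torch code, and the 4-D shapes are an assumption chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 4-D inputs; axis 3 holds the vector components.
t1 = rng.random((2, 3, 4, 5))
t2 = rng.random((2, 3, 4, 5))

diff = t1 - t2                 # subtract
squared = diff ** 2            # square
summed = squared.sum(axis=3)   # sum over the axis being eliminated
dist = np.sqrt(summed)         # square root

print(dist.shape)  # (2, 3, 4)
```

The summed axis disappears from the result, so pick the axis that indexes the components of each vector.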

R: results differ when calculating Euclidean distance between two vectors with different methods

You need s = x2 - x1.

norm(s, "2")
#[1] 8.062258

sqrt(sum(s ^ 2)) ## or: sqrt(c(crossprod(s)))
#[1] 8.062258

lpnorm(s, 2)
#[1] 8.062258

If you define s = cbind(x1, x2), none of the options you listed is going to compute the Euclidean distance between x1 and x2, but we can still get them to output the same value. In this case they compute the L2 norm of the vector c(x1, x2).

norm(s, "F")
#[1] 6.244998

sqrt(sum(s ^ 2))
#[1] 6.244998

lpnorm(s, 2)
#[1] 6.244998

Finally, norm is not a common way to compute a distance; it is really meant for matrix norms. When you do norm(cbind(x1, x2), "2"), it computes the L2 matrix norm, which is the largest singular value of the matrix cbind(x1, x2).
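
The difference between these three quantities is easy to check numerically. Below is a rough Python/NumPy sketch; the vectors are made up (they are not the x1 and x2 from the question), and np.column_stack stands in for R's cbind:

```python
import numpy as np

# Made-up vectors, not the x1 and x2 from the question.
x1 = np.array([1.0, 2.0, 2.0])
x2 = np.array([4.0, 6.0, 2.0])

s = x2 - x1
distance = np.sqrt(np.sum(s ** 2))       # Euclidean distance between x1 and x2

M = np.column_stack([x1, x2])            # two-column matrix, like cbind(x1, x2)
frobenius = np.linalg.norm(M, "fro")     # L2 norm of all entries stacked together
spectral = np.linalg.norm(M, 2)          # matrix 2-norm
largest_sv = np.linalg.svd(M, compute_uv=False)[0]  # largest singular value

print(distance, frobenius, spectral, largest_sv)
```

Only the first value is the distance between the two vectors. The Frobenius norm matches the L2 norm of the concatenated vector, and the matrix 2-norm matches the largest singular value.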


So my problem is with defining s. Ok, what if I have more than three vectors?

In that case you want a pairwise Euclidean distance matrix. See function ?dist.

I have a training set (containing three or more rows) and one test set (one row). So, I would like to calculate the Euclidean distance, or maybe other distances. This is the reason why I want to be sure about the distance calculation.

You want the distance between one vector and each of many others, and the result is a vector?

set.seed(0)
X_train <- matrix(runif(10), 5, 2)
x_test <- runif(2)
S <- t(X_train) - x_test

apply(S, 2, norm, "2") ## don't try other types than "2"
#[1] 0.8349220 0.7217628 0.8012416 0.6841445 0.9462961

apply(S, 2, lpnorm, 2)
#[1] 0.8349220 0.7217628 0.8012416 0.6841445 0.9462961

sqrt(colSums(S ^ 2)) ## only for L2-norm
#[1] 0.8349220 0.7217628 0.8012416 0.6841445 0.9462961

I would stress again that norm would fail on a vector, unless type = "2". ?norm clearly says that this function is intended for matrices. What norm does is very different from your self-defined lpnorm function: lpnorm is a vector norm, norm is a matrix norm. Even "L2" means something different for a matrix and for a vector.
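
For comparison, the one-test-row-against-many-training-rows case can be sketched in Python with NumPy. The helper lp_distances is hypothetical and only mirrors the idea of a vector Lp norm applied row by row; the data are made up:

```python
import numpy as np

def lp_distances(X_train, x_test, p=2):
    """Hypothetical helper: Lp distance from x_test to every row of X_train."""
    diff = np.abs(X_train - x_test)   # broadcasting subtracts x_test from each row
    return (diff ** p).sum(axis=1) ** (1.0 / p)

# Made-up data: three training rows and one test row.
X_train = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
x_test = np.array([0.0, 0.0])

euclidean = lp_distances(X_train, x_test, p=2)   # 0, 5 and sqrt(2)
manhattan = lp_distances(X_train, x_test, p=1)   # 0, 7 and 2
print(euclidean)
print(manhattan)
```

The result has one entry per training row, which is the vector-valued answer asked about above.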

Euclidean distance between two n-dimensional vectors

Here is a simple way

using LinearAlgebra # provides norm on Julia 0.7 and later

n = 10
x = rand(n)
y = rand(n)
d = norm(x-y) # The euclidean (L2) distance

For Manhattan/taxicab/L1 distance, use norm(x-y,1)

How can the Euclidean distance be calculated with NumPy?

Use numpy.linalg.norm:

dist = numpy.linalg.norm(a-b)

This works because the Euclidean distance is the l2 norm, and the default value of the ord parameter in numpy.linalg.norm is 2.
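
A small check of that claim, using made-up vectors a and b. The default call, the explicit ord=2 call and the hand-written formula should all agree:

```python
import numpy as np

# Made-up vectors for illustration.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

dist_default = np.linalg.norm(a - b)          # default norm of a 1-D array is the 2-norm
dist_explicit = np.linalg.norm(a - b, ord=2)  # same norm, stated explicitly
dist_manual = np.sqrt(np.dot(a - b, a - b))   # square root of the sum of squared differences

print(dist_default, dist_explicit, dist_manual)
```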
For more theory, see Introduction to Data Mining.

Euclidean distance between the two points using vectorized approach

euclidean_distances computes the distance for each combination of X,Y points; this will grow large in memory and is totally unnecessary if you just want the distance between each respective row. Sklearn includes a different function called paired_distances that does what you want:

from sklearn.metrics.pairwise import paired_distances
d = paired_distances(X,Y)
# array([5.83095189, 9.94987437, 7.34846923, 5.47722558, 4. ])

If you need the full pairwise distances, you can get the same result from the diagonal (as pointed out in the comments):

d = euclidean_distances(X,Y).diagonal()

Lastly: arrays are a numpy type, so it is useful to know the numpy API itself (which is probably what sklearn calls under the hood). Here are two equivalent examples:

d = np.linalg.norm(X-Y, axis=1)
d = np.sqrt(np.sum((X-Y)**2, axis=1))
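
To make the memory argument concrete, here is a rough NumPy-only sketch comparing the full pairwise matrix with the row-wise distances. X and Y are made-up point sets, not the data from the question:

```python
import numpy as np

# Made-up point sets with matching rows.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
Y = np.array([[3.0, 4.0], [1.0, 2.0], [2.0, 0.0]])

# Full pairwise matrix: n x n entries, most of them unused here.
full = np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))

# Row-wise distances: only n entries.
rowwise = np.linalg.norm(X - Y, axis=1)

print(full.shape, rowwise.shape)              # (3, 3) (3,)
print(np.allclose(full.diagonal(), rowwise))  # True
```

For n rows the full matrix stores n * n values to recover the n on its diagonal, so the row-wise form is the better fit when only matching rows matter.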

Euclidean Distance for three (or more) vectors

Yes. If you run dist(rbind(a, b, c)), the result is a table of the pairwise Euclidean distances between a, b and c.
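
For readers working in Python, a rough standard-library counterpart is to loop over every pair of vectors. The names a, b and c and their values are made up here, and math.dist needs Python 3.8 or later:

```python
import math
from itertools import combinations

# Made-up vectors standing in for a, b and c.
vectors = {"a": [0.0, 0.0], "b": [3.0, 4.0], "c": [6.0, 8.0]}

# One distance per unordered pair, like the lower triangle of a distance table.
pairwise = {
    (p, q): math.dist(vectors[p], vectors[q])
    for p, q in combinations(vectors, 2)
}

for pair, d in pairwise.items():
    print(pair, d)
```

Three vectors give three pairs; in general n vectors give n * (n - 1) / 2 distances.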

Euclidean Distance between 2 Vectors Implementation

Based on the suggestions of @AlanStokes, the following code seems to be one solution (I have tested it):

import java.util.Random;

public class EuclideanDist {
    public static void main(String[] args) {
        EuclideanDist euc = new EuclideanDist();
        Random rnd = new Random();

        // Vector length is taken from the first command-line argument.
        int N = Integer.parseInt(args[0]);

        double[] a = new double[N];
        double[] b = new double[N];

        euc.print(euc.init(a, rnd));
        euc.print(euc.init(b, rnd));
        System.out.println(euc.distance(a, b));
    }

    // Fills src with random values in [0, 1) and returns it.
    private double[] init(double[] src, Random rnd) {
        for (int i = 0; i < src.length; i++) {
            src[i] = rnd.nextDouble();
        }
        return src;
    }

    // Square root of the sum of squared component differences.
    private double distance(double[] a, double[] b) {
        double diff_square_sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            diff_square_sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return Math.sqrt(diff_square_sum);
    }

    private void print(double[] x) {
        for (int j = 0; j < x.length; j++) {
            System.out.print(" " + x[j] + " ");
        }
        System.out.println();
    }
}

