Calculate Euclidean distance between vectors and cluster medoids
The euclidean_distances function is working as expected: because each vector is reshaped into a column, it calculates the distance between each individual item in the two arrays, which is why the result is a symmetric matrix rather than a single value.
import numpy as np
from sklearn_extra.cluster import KMedoids
from sklearn.metrics.pairwise import euclidean_distances
X2=np.array([[ 5.43840675, -1.05259078, -0.21793506, 8.56686818, -2.58056957,
-0.07310339, -0.31181501, 0.02696586],
[ 5.72318296, -0.99665473, -0.14540062, 8.32051008, -3.36201189,
-0.04897565, -0.34271698, -0.0339766 ],
[ 5.93081714, -1.52272427, 0.40706477, 8.56256569, -3.216366 ,
-0.0108426 , -0.57434619, -0.18952662]])
model1 = KMedoids(n_clusters=2, random_state=0).fit(X2)
medoids=np.array([[ 5.72318296, -0.99665473, -0.14540062, 8.32051008, -3.36201189,
-0.04897565, -0.34271698, -0.0339766 ],
[ 5.43840675, -1.05259078, -0.21793506, 8.56686818, -2.58056957,
-0.07310339, -0.31181501, 0.02696586]])
X2[1]=([ 5.72318296, -0.99665473, -0.14540062, 8.32051008, -3.36201189,
-0.04897565, -0.34271698, -0.0339766 ])
medoids[0]=[ 5.72318296, -0.99665473, -0.14540062, 8.32051008, -3.36201189,
-0.04897565, -0.34271698, -0.0339766 ]
a = X2[1].reshape(-1, 1)
b = X2[model1.medoid_indices_][0].reshape(-1, 1)
# dist(x, y) = sqrt(dot(x, x) - 2 * dot(x, y) + dot(y, y))
dist = euclidean_distances(a, b)
print(dist)
This is what you would see:
[[ 0. 6.71983769 5.86858358 2.59732712 9.08519485 5.77215861
6.06589994 5.75715956]
[ 6.71983769 0. 0.85125411 9.31716481 2.36535716 0.94767908
0.65393775 0.96267813]
[ 5.86858358 0.85125411 0. 8.4659107 3.21661127 0.09642497
0.19731636 0.11142402]
[ 2.59732712 9.31716481 8.4659107 0. 11.68252197 8.36948573
8.66322706 8.35448668]
[ 9.08519485 2.36535716 3.21661127 11.68252197 0. 3.31303624
3.01929491 3.32803529]
[ 5.77215861 0.94767908 0.09642497 8.36948573 3.31303624 0.
0.29374133 0.01499905]
[ 6.06589994 0.65393775 0.19731636 8.66322706 3.01929491 0.29374133
0. 0.30874038]
[ 5.75715956 0.96267813 0.11142402 8.35448668 3.32803529 0.01499905
0.30874038 0. ]]
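A note on the reshape: the 8x8 matrix appears because reshape(-1, 1) splits each vector into eight separate one-dimensional samples. A minimal sketch with the same two vectors, using plain NumPy in place of sklearn's euclidean_distances, showing both behaviours:

```python
import numpy as np

x = np.array([5.72318296, -0.99665473, -0.14540062, 8.32051008,
              -3.36201189, -0.04897565, -0.34271698, -0.0339766])
y = np.array([5.43840675, -1.05259078, -0.21793506, 8.56686818,
              -2.58056957, -0.07310339, -0.31181501, 0.02696586])

# reshape(-1, 1): each coordinate becomes its own 1-D sample, so a pairwise
# distance call returns an 8x8 matrix of |x_i - y_j| values.
a = x.reshape(-1, 1)
b = y.reshape(-1, 1)
pairwise = np.sqrt((a[:, None, 0] - b[None, :, 0]) ** 2)
print(pairwise.shape)  # (8, 8)

# Treating each vector as a single 8-dimensional sample instead gives the
# one scalar distance between the two vectors.
d = np.linalg.norm(x - y)
print(d)
```

To get the single distance from euclidean_distances itself, reshape with (1, -1) so each vector stays one sample.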
To calculate euclidean distance between vectors in a torch tensor with multiple dimensions
Here ya go
dist = (tensor1 - tensor2).pow(2).sum(3).sqrt()  # .sum(3) reduces the dimension holding the vector components
Basically that's what Euclidean distance is:
subtract -> square -> sum along the axis you want to eliminate -> square root
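The same subtract -> square -> sum -> sqrt pipeline, sketched in NumPy for a simple batch of vectors (hypothetical data; reducing the last axis leaves one distance per row):

```python
import numpy as np

t1 = np.array([[0.0, 0.0], [1.0, 1.0]])
t2 = np.array([[3.0, 4.0], [1.0, 2.0]])

# Subtract, square, sum over the axis holding the vector components,
# then take the square root: one Euclidean distance per row.
dist = np.sqrt(((t1 - t2) ** 2).sum(axis=-1))
print(dist)  # [5. 1.]
```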
R: results differ when calculating Euclidean distance between two vectors with different methods
You need s = x2 - x1.
norm(s, "2")
#[1] 8.062258
sqrt(sum(s ^ 2)) ## or: sqrt(c(crossprod(s)))
#[1] 8.062258
lpnorm(s, 2)
#[1] 8.062258
If you define s = cbind(x1, x2), none of the options you listed is going to compute the Euclidean distance between x1 and x2, but we can still get them to output the same value. In this case they all compute the L2 norm of the vector c(x1, x2).
norm(s, "F")
#[1] 6.244998
sqrt(sum(s ^ 2))
#[1] 6.244998
lpnorm(s, 2)
#[1] 6.244998
Finally, norm is not a common way to compute distance; it is really for matrix norms. When you do norm(cbind(x1, x2), "2"), it computes the L2 matrix norm, which is the largest singular value of the matrix cbind(x1, x2).
So my problem is with defining s. OK, what if I have more than three vectors?
In that case you want the pairwise Euclidean distance matrix. See the function ?dist.
I have the train sets (containing three or more rows) and one test set (one row). So, I would like to calculate the Euclidean distance or may be other distances. This is the reason why I want to make sure about the distance calculation.
You want the distance between one vector and each of many others, and the result is a vector?
set.seed(0)
X_train <- matrix(runif(10), 5, 2)
x_test <- runif(2)
S <- t(X_train) - x_test
apply(S, 2, norm, "2") ## don't try other types than "2"
#[1] 0.8349220 0.7217628 0.8012416 0.6841445 0.9462961
apply(S, 2, lpnorm, 2)
#[1] 0.8349220 0.7217628 0.8012416 0.6841445 0.9462961
sqrt(colSums(S ^ 2)) ## only for L2-norm
#[1] 0.8349220 0.7217628 0.8012416 0.6841445 0.9462961
I would stress again that norm would fail on a vector unless type = "2". ?norm clearly says that this function is intended for matrices. What norm does is very different from your self-defined lpnorm function: lpnorm computes a vector norm, while norm computes a matrix norm. Even "L2" means something different for a matrix than for a vector.
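For comparison, the one-vs-many computation above has a direct NumPy analogue, where broadcasting replaces the apply loop (a sketch with hypothetical random data):

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.random((5, 2))   # five training rows
x_test = rng.random(2)         # one test vector

# Broadcasting subtracts x_test from every row; summing the squared
# differences along axis 1 gives one distance per training row.
d = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))
print(d.shape)  # (5,)
```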
Euclidean distance between two n-dimensional vectors
Here is a simple way:
n = 10
x = rand(n)
y = rand(n)
d = norm(x-y) # The euclidean (L2) distance
For Manhattan/taxicab/L1 distance, use norm(x-y,1)
How can the Euclidean distance be calculated with NumPy?
Use numpy.linalg.norm:
dist = numpy.linalg.norm(a - b)
This works because the Euclidean distance is the L2 norm, and the default value of the ord parameter in numpy.linalg.norm is 2.
For more theory, see Introduction to Data Mining.
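A quick sketch of that equivalence, using hypothetical vectors: with the default ord=2, numpy.linalg.norm of a difference is the same as the explicit square-root-of-sum-of-squares formula.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

dist = np.linalg.norm(a - b)            # default ord=2, i.e. Euclidean
manual = np.sqrt(np.sum((a - b) ** 2))  # explicit L2 formula
print(dist, manual)  # 5.0 5.0
```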
Euclidean distance between the two points using vectorized approach
euclidean_distances computes the distance for each combination of X, Y points; this will grow large in memory and is totally unnecessary if you just want the distance between each respective row. Scikit-learn includes a different function called paired_distances that does what you want:
from sklearn.metrics.pairwise import paired_distances
d = paired_distances(X,Y)
# array([5.83095189, 9.94987437, 7.34846923, 5.47722558, 4. ])
If you need the full pairwise distances, you can get the same result from the diagonal (as pointed out in the comments):
d = euclidean_distances(X,Y).diagonal()
Lastly: arrays are a NumPy type, so it is useful to know the NumPy API itself (probably what sklearn calls under the hood). Here are two examples:
d = np.linalg.norm(X-Y, axis=1)
d = np.sqrt(np.sum((X-Y)**2, axis=1))
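A small sketch with hypothetical data showing that the two row-wise NumPy versions agree, and that the diagonal of the full pairwise matrix gives the same numbers (at the cost of computing everything else first):

```python
import numpy as np

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
Y = np.array([[3.0, 4.0], [1.0, 2.0], [2.0, 1.0]])

# Row-wise distances, each row of X against the matching row of Y.
d1 = np.linalg.norm(X - Y, axis=1)
d2 = np.sqrt(np.sum((X - Y) ** 2, axis=1))

# Full pairwise matrix via broadcasting; its diagonal repeats d1.
full = np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
print(d1)  # [5. 1. 1.]
```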
Euclidean Distance for three (or more) vectors
Yes. If you run dist(rbind(a, b, c)), the result is a table of pairwise Euclidean distances.
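For reference, a NumPy sketch of the same idea with three hypothetical vectors: stack them as rows, then broadcast rows against rows to get the full pairwise matrix (R's dist() reports the lower triangle of this).

```python
import numpy as np

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])
c = np.array([6.0, 8.0])

# Stack the vectors as rows of one matrix, then broadcast to get
# every row-vs-row Euclidean distance at once.
X = np.vstack([a, b, c])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
print(D)
```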
Euclidean Distance between 2 Vectors Implementation
Based on the suggestions of @AlanStokes, the following code seems to be one solution (I have tested it):
import java.util.Random;

public class EuclideanDist {
    public static void main(String[] args) {
        EuclideanDist euc = new EuclideanDist();
        Random rnd = new Random();
        int N = Integer.parseInt(args[0]);
        double[] a = new double[N];
        double[] b = new double[N];
        euc.print(euc.init(a, rnd));
        euc.print(euc.init(b, rnd));
        System.out.println(euc.distance(a, b));
    }

    private double[] init(double[] src, Random rnd) {
        for (int i = 0; i < src.length; i++) {
            src[i] = rnd.nextDouble();
        }
        return src;
    }

    private double distance(double[] a, double[] b) {
        double diff_square_sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            diff_square_sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return Math.sqrt(diff_square_sum);
    }

    private void print(double[] x) {
        for (int j = 0; j < x.length; j++) {
            System.out.print(" " + x[j] + " ");
        }
        System.out.println();
    }
}