Minimum Euclidean Distance Between Points in Two Different Numpy Arrays, Not Within

Minimum Euclidean distance between points in two different Numpy arrays, not within

(Months later)
scipy.spatial.distance.cdist( X, Y )
gives all pairs of distances,
for X and Y 2 dim, 3 dim ...

It also does 22 different norms, detailed
here .

# cdist example: (nx,dim) (ny,dim) -> (nx,ny)

from __future__ import division
import sys
import numpy as np
from scipy.spatial.distance import cdist

#...............................................................................
dim = 10
nx = 1000
ny = 100
metric = "euclidean"
seed = 1

# change these params in sh or ipython: run this.py dim=3 ...
for arg in sys.argv[1:]:
exec( arg )
np.random.seed(seed)
np.set_printoptions( 2, threshold=100, edgeitems=10, suppress=True )

title = "%s dim %d nx %d ny %d metric %s" % (
__file__, dim, nx, ny, metric )
print "\n", title

#...............................................................................
X = np.random.uniform( 0, 1, size=(nx,dim) )
Y = np.random.uniform( 0, 1, size=(ny,dim) )
dist = cdist( X, Y, metric=metric ) # -> (nx, ny) distances
#...............................................................................

print "scipy.spatial.distance.cdist: X %s Y %s -> %s" % (
X.shape, Y.shape, dist.shape )
print "dist average %.3g +- %.2g" % (dist.mean(), dist.std())
print "check: dist[0,3] %.3g == cdist( [X[0]], [Y[3]] ) %.3g" % (
dist[0,3], cdist( [X[0]], [Y[3]] ))

# (trivia: how do pairwise distances between uniform-random points in the unit cube
# depend on the metric ? With the right scaling, not much at all:
# L1 / dim ~ .33 +- .2/sqrt dim
# L2 / sqrt dim ~ .4 +- .2/sqrt dim
# Lmax / 2 ~ .4 +- .2/sqrt dim

How can the Euclidean distance be calculated with NumPy?

Use numpy.linalg.norm:

dist = numpy.linalg.norm(a-b)

This works because the Euclidean distance is the l2 norm, and the default value of the ord parameter in numpy.linalg.norm is 2.
For more theory, see Introduction to Data Mining:

Sample Image

Calculate Euclidean distance between two python arrays

Here, you can just use np.linalg.norm to compute the Euclidean distance. Your bug is due to np.subtract is expecting the two inputs are of the same length.

import numpy as np

list_a = np.array([[0,1], [2,2], [5,4], [3,6], [4,2]])
list_b = np.array([[0,1],[5,4]])

def run_euc(list_a,list_b):
return np.array([[ np.linalg.norm(i-j) for j in list_b] for i in list_a])

print(run_euc(list_a, list_b))

The code produces:

[[0.         5.83095189]
[2.23606798 3.60555128]
[5.83095189 0. ]
[5.83095189 2.82842712]
[4.12310563 2.23606798]]

Minimum distance between two points of multiple arrays

IIUC, you are trying to find the minimum distance between every particle in Q and every particle in V. On top of that, you want to get those particles as well.

Here is a completely vectorized way of what you are trying to do.

  1. You can use the same method np.linalg.norm(particle1-particle2)
  2. You can create 2 addition axis in the Q and V matrix such that you have a way of broadcasting the operation you want to get a cross product between particles in Q and V.
Q    -> 12  , None , 3
V -> None, 12 , 3
---------------------
Q-V -> 12 , 12 , 3
---------------------
norm -> 12 , 12
---------------------
min -> 1


  1. Which results in 12, 12, 3 matrix which can be reduced with np.linalg.norm with axis=-1 to a 12, 12 matrix. This is the distance between every 12 particles in Q vs every 12 ones in V.

  2. Now, it's just a matter of finding the minimum in this matrix

  3. To get the index in Q and index in V for particles that result in this minimum distance, you can use np.unravel_index to get them and then index Q and V post that.

#Creating 2 dummy lists of particles
Q = np.random.random((12,3))
V = np.random.random((12,3))

#distance between every particle in Q vs V (resuting in 12,12 matrix)
distances = np.linalg.norm(Q[:,None,:]-V[None,:,:], axis=-1)

#Get index position for Q and V particles with min dist
idx = np.unravel_index(distances.argmin(), distances.shape)

#Find particle Q and particle V which result in min distance
particleQ, particleV = Q[idx[0]], V[idx[1]]

print(particleQ)
print(particleV)
[0.60751186 0.93177959 0.23249369]
[0.64406579 0.91601754 0.27724177]

To prove that these particles have the minimum distance between them.

print('Minimum distance between Q and V particles:', distances.min())
print('Distance between the above calculated particleQ & particleV', np.linalg.norm(particleQ-particleV))
Minimum distance between Q and V particles: 0.11540521863305497
Distance between the above calculated particleQ & particleV 0.11540521863305497


As a function

def find_dist(Q, V):
#distance between every particle in Q vs V (resuting in 12,12 matrix)
distances = np.linalg.norm(Q[:,None,:]-V[None,:,:], axis=-1)
idx = np.unravel_index(distances.argmin(), distances.shape)
particleQ, particleV = Q[idx[0]], V[idx[1]]

return distances.min(), (particleQ, particleV), distances

QR_min, QR_particles, _ = find_dist(Q,R)
QV_min, QV_particles, _ = find_dist(Q,V)
RV_min, RV_particles, _ = find_dist(R,V)

print('Arrays Q, R')
print('Min distance:', QR_min)
print('Particle with min distance Q', QR_particles[0])
print('Particle with min distance R', QR_particles[1])
print('')
print('Arrays Q, V')
print('Min distance:', QV_min)
print('Particle with min distance Q', QV_particles[0])
print('Particle with min distance V', QV_particles[1])
print('')
print('Arrays R, V')
print('Min distance:', RV_min)
print('Particle with min distance R', RV_particles[0])
print('Particle with min distance V', RV_particles[1])
print('')
Arrays Q, R
Min distance: 0.0
Particle with min distance Q [ 0.42688496 -1.27452666 0.04265043]
Particle with min distance R [ 0.42688496 -1.27452666 0.04265043]

Arrays Q, V
Min distance: 3.0692806011237583
Particle with min distance Q [ 2.32495428 -0.28709988 0.24674303]
Particle with min distance V [ 2.37346068 -0.16989031 3.31340122]

Arrays R, V
Min distance: 3.3621077448250167
Particle with min distance R [ 0.85044465 -2.26843268 0.09474995]
Particle with min distance V [ 1.5418112 -1.28549041 3.23475079]

array of minimum euclidian distances between all points in array

This solution really focuses on readability over performance - It explicitly calculates and stores the whole n x n distance matrix and therefore cannot be considered efficient.

But: It is very concise and readable.

import numpy as np
from scipy.spatial.distance import pdist, squareform

#create n x d matrix (n=observations, d=dimensions)
A = np.array([[1,9,2,4], [1,2,3,1]]).T

# explicitly calculate the whole n x n distance matrix
dist_mat = squareform(pdist(A, metric="euclidean"))

# mask the diagonal
np.fill_diagonal(dist_mat, np.nan)

# and calculate the minimum of each row (or column)
np.nanmin(dist_mat, axis=1)

In Numpy, find Euclidean distance between each pair from two arrays

I'm not seeing a built-in, but you could do it yourself pretty easily.

distances = (a-b)**2
distances = distances.sum(axis=-1)
distances = np.sqrt(distances)


Related Topics



Leave a reply



Submit