Minimum Euclidean distance between points in two different Numpy arrays, not within
(Months later)scipy.spatial.distance.cdist( X, Y )
gives all pairs of distances,
for X and Y 2 dim, 3 dim ...
It also does 22 different norms, detailed
here .
# cdist example: (nx,dim) (ny,dim) -> (nx,ny)
from __future__ import division
import sys
import numpy as np
from scipy.spatial.distance import cdist
#...............................................................................
dim = 10
nx = 1000
ny = 100
metric = "euclidean"
seed = 1
# change these params in sh or ipython: run this.py dim=3 ...
for arg in sys.argv[1:]:
exec( arg )
np.random.seed(seed)
np.set_printoptions( 2, threshold=100, edgeitems=10, suppress=True )
title = "%s dim %d nx %d ny %d metric %s" % (
__file__, dim, nx, ny, metric )
print "\n", title
#...............................................................................
X = np.random.uniform( 0, 1, size=(nx,dim) )
Y = np.random.uniform( 0, 1, size=(ny,dim) )
dist = cdist( X, Y, metric=metric ) # -> (nx, ny) distances
#...............................................................................
print "scipy.spatial.distance.cdist: X %s Y %s -> %s" % (
X.shape, Y.shape, dist.shape )
print "dist average %.3g +- %.2g" % (dist.mean(), dist.std())
print "check: dist[0,3] %.3g == cdist( [X[0]], [Y[3]] ) %.3g" % (
dist[0,3], cdist( [X[0]], [Y[3]] ))
# (trivia: how do pairwise distances between uniform-random points in the unit cube
# depend on the metric ? With the right scaling, not much at all:
# L1 / dim ~ .33 +- .2/sqrt dim
# L2 / sqrt dim ~ .4 +- .2/sqrt dim
# Lmax / 2 ~ .4 +- .2/sqrt dim
How can the Euclidean distance be calculated with NumPy?
Use numpy.linalg.norm
:
dist = numpy.linalg.norm(a-b)
This works because the Euclidean distance is the l2 norm, and the default value of the ord
parameter in numpy.linalg.norm
is 2.
For more theory, see Introduction to Data Mining:
Calculate Euclidean distance between two python arrays
Here, you can just use np.linalg.norm
to compute the Euclidean distance. Your bug is due to np.subtract is expecting the two inputs are of the same length.
import numpy as np
list_a = np.array([[0,1], [2,2], [5,4], [3,6], [4,2]])
list_b = np.array([[0,1],[5,4]])
def run_euc(list_a,list_b):
return np.array([[ np.linalg.norm(i-j) for j in list_b] for i in list_a])
print(run_euc(list_a, list_b))
The code produces:
[[0. 5.83095189]
[2.23606798 3.60555128]
[5.83095189 0. ]
[5.83095189 2.82842712]
[4.12310563 2.23606798]]
Minimum distance between two points of multiple arrays
IIUC, you are trying to find the minimum distance between every particle in Q and every particle in V. On top of that, you want to get those particles as well.
Here is a completely vectorized way of what you are trying to do.
- You can use the same method
np.linalg.norm(particle1-particle2)
- You can create 2 addition axis in the Q and V matrix such that you have a way of broadcasting the operation you want to get a cross product between particles in Q and V.
Q -> 12 , None , 3
V -> None, 12 , 3
---------------------
Q-V -> 12 , 12 , 3
---------------------
norm -> 12 , 12
---------------------
min -> 1
Which results in 12, 12, 3 matrix which can be reduced with
np.linalg.norm
withaxis=-1
to a 12, 12 matrix. This is the distance between every 12 particles in Q vs every 12 ones in V.Now, it's just a matter of finding the minimum in this matrix
To get the index in Q and index in V for particles that result in this minimum distance, you can use
np.unravel_index
to get them and then index Q and V post that.
#Creating 2 dummy lists of particles
Q = np.random.random((12,3))
V = np.random.random((12,3))
#distance between every particle in Q vs V (resuting in 12,12 matrix)
distances = np.linalg.norm(Q[:,None,:]-V[None,:,:], axis=-1)
#Get index position for Q and V particles with min dist
idx = np.unravel_index(distances.argmin(), distances.shape)
#Find particle Q and particle V which result in min distance
particleQ, particleV = Q[idx[0]], V[idx[1]]
print(particleQ)
print(particleV)
[0.60751186 0.93177959 0.23249369]
[0.64406579 0.91601754 0.27724177]
To prove that these particles have the minimum distance between them.
print('Minimum distance between Q and V particles:', distances.min())
print('Distance between the above calculated particleQ & particleV', np.linalg.norm(particleQ-particleV))
Minimum distance between Q and V particles: 0.11540521863305497
Distance between the above calculated particleQ & particleV 0.11540521863305497
As a function
def find_dist(Q, V):
#distance between every particle in Q vs V (resuting in 12,12 matrix)
distances = np.linalg.norm(Q[:,None,:]-V[None,:,:], axis=-1)
idx = np.unravel_index(distances.argmin(), distances.shape)
particleQ, particleV = Q[idx[0]], V[idx[1]]
return distances.min(), (particleQ, particleV), distances
QR_min, QR_particles, _ = find_dist(Q,R)
QV_min, QV_particles, _ = find_dist(Q,V)
RV_min, RV_particles, _ = find_dist(R,V)
print('Arrays Q, R')
print('Min distance:', QR_min)
print('Particle with min distance Q', QR_particles[0])
print('Particle with min distance R', QR_particles[1])
print('')
print('Arrays Q, V')
print('Min distance:', QV_min)
print('Particle with min distance Q', QV_particles[0])
print('Particle with min distance V', QV_particles[1])
print('')
print('Arrays R, V')
print('Min distance:', RV_min)
print('Particle with min distance R', RV_particles[0])
print('Particle with min distance V', RV_particles[1])
print('')
Arrays Q, R
Min distance: 0.0
Particle with min distance Q [ 0.42688496 -1.27452666 0.04265043]
Particle with min distance R [ 0.42688496 -1.27452666 0.04265043]
Arrays Q, V
Min distance: 3.0692806011237583
Particle with min distance Q [ 2.32495428 -0.28709988 0.24674303]
Particle with min distance V [ 2.37346068 -0.16989031 3.31340122]
Arrays R, V
Min distance: 3.3621077448250167
Particle with min distance R [ 0.85044465 -2.26843268 0.09474995]
Particle with min distance V [ 1.5418112 -1.28549041 3.23475079]
array of minimum euclidian distances between all points in array
This solution really focuses on readability over performance - It explicitly calculates and stores the whole n x n
distance matrix and therefore cannot be considered efficient.
But: It is very concise and readable.
import numpy as np
from scipy.spatial.distance import pdist, squareform
#create n x d matrix (n=observations, d=dimensions)
A = np.array([[1,9,2,4], [1,2,3,1]]).T
# explicitly calculate the whole n x n distance matrix
dist_mat = squareform(pdist(A, metric="euclidean"))
# mask the diagonal
np.fill_diagonal(dist_mat, np.nan)
# and calculate the minimum of each row (or column)
np.nanmin(dist_mat, axis=1)
In Numpy, find Euclidean distance between each pair from two arrays
I'm not seeing a built-in, but you could do it yourself pretty easily.
distances = (a-b)**2
distances = distances.sum(axis=-1)
distances = np.sqrt(distances)
Related Topics
Python: Sort Function Breaks in the Presence of Nan
Finding Index of Nearest Point in Numpy Arrays of X and Y Coordinates
How to Set Headers Using Python's Urllib
Differencebetween Exec_Command and Send with Invoke_Shell() on Paramiko
Python: My Function Returns "None" After It Does What I Want It To
Attributeerror: Can't Set Attribute When Connecting to SQLite Database with Flask-Sqlalchemy
Python Script Execute Commands in Terminal
Why Does the 'Is' Operator Behave Differently in a Script VS the Repl
Best Way to Determine If a Sequence Is in Another Sequence
Iterating Through Two Lists in Django Templates
Why Does Map Return a Map Object Instead of a List in Python 3
Python Pack() and Grid() Methods Together
"Inner Exception" (With Traceback) in Python
Reading File Opened with Python Paramiko Sftpclient.Open Method Is Slow
Python Pyqt Signals Are Not Always Working
How to Set Selenium Webdriver from Headless Mode to Normal Mode Within the Same Session