Find K Nearest Neighbors, Starting from a Distance Matrix

Implementing k nearest neighbours from distance matrix?

The way I see it, you simply find the n + 1 smallest distances in each row and drop the 0 (the distance from a point to itself), which leaves the n nearest neighbours. Keep in mind that the code below will not work if any off-diagonal distance is zero! Only the diagonal entries are allowed to be 0.

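import pandas as pd
import numpy as np

# Distance matrix: row i holds the distances from point i to every point
X = pd.DataFrame([[0, 1, 3, 2], [5, 0, 2, 2], [3, 2, 0, 1], [2, 3, 4, 0]])

X.columns = ['A', 'B', 'C', 'D']
X.index = ['A', 'B', 'C', 'D']

# Transpose so that each point's distances form a column
X = X.T

for i in X.index:
    # The 3 smallest distances for point i: itself (0) plus its 2 nearest neighbours
    Y = X.nsmallest(3, i)
    Y = Y.T
    Y = Y[Y.index.str.startswith(i)]
    # Drop the zero self-distance so that only the neighbours remain
    Y = Y.loc[:, Y.any()]

    for j in Y.index:
        print(i + ": ", list(Y.columns))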

This prints out:

A:  ['B', 'D']
B:  ['C', 'D']
C:  ['D', 'B']
D:  ['A', 'B']
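
The same idea can be sketched in plain NumPy without relying on the self-distance being the only zero. This is a minimal sketch, not the answer's own method: the helper name `k_nearest_from_matrix` is hypothetical, and it masks the diagonal with infinity before sorting, so zero distances between distinct points are handled too.

```python
import numpy as np

def k_nearest_from_matrix(dist, k):
    """Return, for each row, the column indices of the k smallest off-diagonal distances."""
    d = np.asarray(dist, dtype=float).copy()
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbour
    # A stable sort keeps tied distances in column order
    return np.argsort(d, axis=1, kind="stable")[:, :k]

labels = np.array(["A", "B", "C", "D"])
dist = np.array([[0, 1, 3, 2],
                 [5, 0, 2, 2],
                 [3, 2, 0, 1],
                 [2, 3, 4, 0]])

neighbours = k_nearest_from_matrix(dist, k=2)
for label, row in zip(labels, neighbours):
    print(f"{label}: {labels[row].tolist()}")
# A: ['B', 'D']
# B: ['C', 'D']
# C: ['D', 'B']
# D: ['A', 'B']
```

For large matrices, `np.argpartition` would avoid a full sort of every row, at the cost of returning the k neighbours in no particular order.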

Python scikit-learn k-nearest neighbors - 3D distance matrix

You can pass a precomputed distance matrix to sklearn's neighbors.NearestNeighbors by setting the metric parameter to "precomputed".

Let's create a dummy distance matrix between 6 points in some 3D space (or a space of any dimension).

import numpy as np
from sklearn.neighbors import NearestNeighbors

# Dummy distance matrix from numpy (random values, so not a true symmetric distance matrix)
precomputed_distances = np.random.random((6, 6))

# Get the top 5 neighbours from the precomputed distance matrix
nn = NearestNeighbors(n_neighbors=5, metric='precomputed')
nn.fit(precomputed_distances)

# Fetch the neighbours; with no argument, each point is excluded from its own neighbour list
distances, indexes = nn.kneighbors()

print(indexes)
print('')
print(distances)
# Neighbour indexes
[[2 5 3 1 4]
 [0 4 3 2 5]
 [5 3 0 1 4]
 [1 2 4 0 5]
 [3 1 2 5 0]
 [3 2 0 1 4]]

# Distances
[[0.07355072 0.30327092 0.32645641 0.54227088 0.76145093]
 [0.06451358 0.13867276 0.7570105  0.84383876 0.92184049]
 [0.52953184 0.59474913 0.63211483 0.80958676 0.99361867]
 [0.10885239 0.31822021 0.39327313 0.47670755 0.6764581 ]
 [0.18309627 0.69483384 0.74029263 0.82705113 0.92923248]
 [0.28584336 0.42956108 0.43323451 0.64124948 0.90154176]]

You can read more about this in the scikit-learn documentation for NearestNeighbors.
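
With a precomputed metric, querying points that were not in the fitted matrix needs a rectangular distance matrix of shape (n_queries, n_indexed). The sketch below illustrates this under some assumptions: the points and queries are made-up random data, and `pairwise_distances` is used only to produce a genuine Euclidean distance matrix in place of whatever distances you already have.

```python
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
points = rng.random((6, 3))    # 6 hypothetical indexed points in 3D
queries = rng.random((2, 3))   # 2 hypothetical query points

# Square matrix between the indexed points: symmetric with a zero diagonal
train_dist = pairwise_distances(points)
nn = NearestNeighbors(n_neighbors=3, metric="precomputed").fit(train_dist)

# For new points, pass distances from each query to every indexed point
query_dist = pairwise_distances(queries, points)   # shape (2, 6)
distances, indexes = nn.kneighbors(query_dist)

print(indexes.shape)   # (2, 3)
```

Each row of `indexes` holds positions into the original 6 points, ordered from nearest to farthest.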

kNN - How to locate the nearest neighbors in the training matrix based on the calculated distances

I suggest using the Python library sklearn: it has a KNeighborsClassifier from which, once fitted, you can retrieve the nearest neighbors you are looking for.

Try this out:

# Import
from sklearn.neighbors import KNeighborsClassifier

# Instantiate your classifier (k=4, or whatever you want)
neigh = KNeighborsClassifier(n_neighbors=4)
# Fit your classifier, where X is your training set and y holds the training labels
neigh.fit(X, y)
# Get the indices of the k nearest training points for each row of X_test,
# where X_test is the sample or array of samples you want neighbors for
neigh.kneighbors(X_test, return_distance=False)
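
To make the snippet above runnable end to end, here is a small self-contained sketch. The training points, labels, and test samples are invented purely for illustration; only the `KNeighborsClassifier` calls come from the answer.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: two well-separated clusters with labels 0 and 1
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.2, 0.1], [5.1, 5.3]])

clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Row positions (into X_train) of the 3 nearest training points per test sample
dist, idx = clf.kneighbors(X_test)
neighbour_labels = y_train[idx]   # labels of those neighbours
pred = clf.predict(X_test)        # majority vote over the neighbours

print(idx.tolist())    # [[0, 2, 1], [3, 4, 5]]
print(pred.tolist())   # [0, 1]
```

The indices returned by `kneighbors` refer to rows of the training matrix, so indexing `X_train` or `y_train` with them recovers the actual neighbours and their labels.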

Nearest Neighbors in Python given the distance matrix

You'll want to create a DistanceMetric object, supplying your own function as an argument:

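import sklearn.neighbors

# Newer scikit-learn releases expose this class as sklearn.metrics.DistanceMetric instead
metric = sklearn.neighbors.DistanceMetric.get_metric('pyfunc', func=func)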

From the docs:

Here func is a function which takes two one-dimensional numpy arrays, and returns a distance. Note that in order to be used within the BallTree, the distance must be a true metric: i.e. it must satisfy the following properties:

  • Non-negativity: d(x, y) >= 0
  • Identity: d(x, y) = 0 if and only if x == y
  • Symmetry: d(x, y) = d(y, x)
  • Triangle Inequality: d(x, y) + d(y, z) >= d(x, z)

You can then create your classifier with metric=metric as a keyword argument and it will use this when calculating distances.
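
As a rough illustration of a user-defined distance in practice, the sketch below takes a slightly different route from the answer: it passes a plain Python callable as `metric` instead of building a DistanceMetric object first. The function `manhattan` and the sample points are hypothetical; the L1 distance was picked because it satisfies the four metric properties listed above.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def manhattan(u, v):
    """Hypothetical custom metric: L1 distance between two 1-D arrays."""
    return np.abs(u - v).sum()

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [4.0, 4.0]])

# A callable metric is evaluated pair by pair, so it is slow on large data sets
nn = NearestNeighbors(n_neighbors=2, algorithm="ball_tree", metric=manhattan)
nn.fit(points)

# With no argument, each point is excluded from its own neighbour list
distances, indexes = nn.kneighbors()

print(indexes.tolist())     # [[1, 2], [0, 2], [0, 1], [2, 1]]
print(distances.tolist())   # [[1.0, 3.0], [1.0, 4.0], [3.0, 4.0], [5.0, 7.0]]
```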

5 nearest neighbors based on given distance in R

data.table solution:

library(data.table)
data<-fread("id x y age
1 1745353 930284.1 30
2 1745317 930343.4 23
3 1745201 930433.9 10
4 1745351 930309.4 5
5 1745342 930335.2 2
6 1746619 929969.7 66
7 1746465 929827.1 7
8 1746731 928779.5 55
9 1746629 929902.6 26
10 1745938 928923.2 22")

data[, all_x := list(list(x))]
data[, all_y := list(list(y))]
data[, all_age := list(list(age))]
data[, seq_nr := seq_len(.N)]

# Distance formula: Euclidean distance from point z to every other point
formula_distance <- function(x_1, x_2, y_1, y_2, z) {
  x_2 <- x_2[[1]][-z]
  y_2 <- y_2[[1]][-z]
  sqrt((x_1 - x_2)^2 + (y_1 - y_2)^2)
}

# One row per ordered pair of distinct points; the grouping column is named "nrow"
data <- data[, {
  list(dist   = formula_distance(x, all_x, y, all_y, seq_nr),
       id     = seq(1:nrow(data))[-id],
       age_id = all_age[[1]][-id],
       age    = rep(age, nrow(data) - 1))
}, by = 1:nrow(data)]
data <- data[order(nrow, dist)]

# Filter data within threshold:
threshold <- 1000

# How many nearest neighbors to take:
k <- 5
filtered <- data[dist <= threshold]
filtered <- filtered[, {
  list(dist = dist[1:k], n_id = id[1:k], n_age = age_id[1:k])
}, by = c("nrow", "age")]
filtered <- filtered[!is.na(dist)]
setnames(filtered, "nrow", "id")

filtered
    id age      dist n_id n_age
 1:  1  30  25.37893    4     5
 2:  1  30  52.27055    5     2
 3:  1  30  69.37211    2    23
 4:  1  30 213.41050    3    10
 5:  2  23  26.31045    5     2
 6:  2  23  48.08326    4     5
 7:  2  23  69.37211    1    30
 8:  2  23 147.12665    3    10
 9:  3  10 147.12665    2    23
10:  3  10 172.11243    5     2
11:  3  10 194.93653    4     5
12:  3  10 213.41050    1    30
13:  4   5  25.37893    1    30
14:  4   5  27.32471    5     2
15:  4   5  48.08326    2    23
16:  4   5 194.93653    3    10
17:  5   2  26.31045    2    23
18:  5   2  27.32471    4     5
19:  5   2  52.27055    1    30
20:  5   2 172.11243    3    10
21:  6  66  67.84106    9    26
22:  6  66 209.88273    7     7
23:  7   7 180.54432    9    26
24:  7   7 209.88273    6    66
25:  8  55 805.91482   10    22
26:  9  26  67.84106    6    66
27:  9  26 180.54432    7     7
28: 10  22 805.91482    8    55
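
For readers following the Python answers earlier on this page, the same "up to k neighbours within a distance threshold" logic can be sketched with NumPy. This is an illustrative counterpart under stated assumptions, not a translation of the data.table code: it reuses the coordinates, threshold, and k from the R example, while the variable names and the dictionary layout are hypothetical.

```python
import numpy as np

# Same x/y coordinates as the R example, for ids 1..10
xy = np.array([
    [1745353, 930284.1], [1745317, 930343.4], [1745201, 930433.9],
    [1745351, 930309.4], [1745342, 930335.2], [1746619, 929969.7],
    [1746465, 929827.1], [1746731, 928779.5], [1746629, 929902.6],
    [1745938, 928923.2],
])
ids = np.arange(1, len(xy) + 1)
threshold, k = 1000, 5

# Pairwise Euclidean distances via broadcasting, with the diagonal masked out
diff = xy[:, None, :] - xy[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
np.fill_diagonal(dist, np.inf)

neighbours = {}
for i, row in enumerate(dist):
    order = np.argsort(row)[:k]                 # k closest candidates
    keep = order[row[order] <= threshold]       # drop those beyond the threshold
    neighbours[int(ids[i])] = [(int(ids[j]), float(row[j])) for j in keep]

print([n_id for n_id, _ in neighbours[1]])   # [4, 5, 2, 3]
print([n_id for n_id, _ in neighbours[8]])   # [10]
```

Each entry maps an id to its neighbours as (neighbour id, distance) pairs, nearest first, which lines up with the `n_id` and `dist` columns of the table above.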

