Implementing k nearest neighbours from a distance matrix?
The way I see it, you simply find the n + 1
smallest numbers/distances/neighbours for each row and remove the 0 self-distance, which leaves the n
nearest neighbours. Keep in mind that this will not work if two distinct points have a distance of zero: only the diagonal entries are allowed to be 0.
import pandas as pd

# A 4x4 (possibly asymmetric) distance matrix with 0 on the diagonal
X = pd.DataFrame([[0, 1, 3, 2], [5, 0, 2, 2], [3, 2, 0, 1], [2, 3, 4, 0]])
X.columns = ['A', 'B', 'C', 'D']
X.index = ['A', 'B', 'C', 'D']
X = X.T

for i in X.index:
    # n + 1 = 3 smallest values in column i (this includes the 0 self-distance)
    Y = X.nsmallest(3, i)
    Y = Y.T
    Y = Y[Y.index.str.startswith(i)]  # keep only row i
    Y = Y.loc[:, Y.any()]             # drop the zero self-distance column
    print(i + ": ", list(Y.columns))
This prints out:
A: ['B', 'D']
B: ['C', 'D']
C: ['D', 'B']
D: ['A', 'B']
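The same n + 1 then drop-the-self-match idea can be written compactly with NumPy's argsort (a sketch using the same matrix as above):

```python
import numpy as np

labels = ['A', 'B', 'C', 'D']
D = np.array([[0, 1, 3, 2],
              [5, 0, 2, 2],
              [3, 2, 0, 1],
              [2, 3, 4, 0]])

n = 2  # number of neighbours to keep
# argsort each row, take the n + 1 smallest, then drop the self index
order = np.argsort(D, axis=1)[:, :n + 1]
neighbours = {labels[i]: [labels[j] for j in row if j != i][:n]
              for i, row in enumerate(order)}
print(neighbours)
# {'A': ['B', 'D'], 'B': ['C', 'D'], 'C': ['D', 'B'], 'D': ['A', 'B']}
```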
Python scikit-learn k-nearest neighbors - 3D distance matrix
You can use a precomputed distance matrix as the input to sklearn's neighbors.NearestNeighbors
by setting the metric parameter to "precomputed".
Let's create a dummy distance matrix for 6 points in some 3D space (or any dimensional space).
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Dummy "distance matrix" (random values, for illustration only)
precomputed_distances = np.random.random((6, 6))
#Get top 5 neighbours from precomputed distance matrix
nn = NearestNeighbors(n_neighbors=5, metric='precomputed')
nn.fit(precomputed_distances)
#Fetch kneighbors
distances, indexes = nn.kneighbors()
print(indexes)
print('')
print(distances)
#neighbours indexes
[[2 5 3 1 4]
[0 4 3 2 5]
[5 3 0 1 4]
[1 2 4 0 5]
[3 1 2 5 0]
[3 2 0 1 4]]
#distances
[[0.07355072 0.30327092 0.32645641 0.54227088 0.76145093]
[0.06451358 0.13867276 0.7570105 0.84383876 0.92184049]
[0.52953184 0.59474913 0.63211483 0.80958676 0.99361867]
[0.10885239 0.31822021 0.39327313 0.47670755 0.6764581 ]
[0.18309627 0.69483384 0.74029263 0.82705113 0.92923248]
[0.28584336 0.42956108 0.43323451 0.64124948 0.90154176]]
Read more in the scikit-learn NearestNeighbors documentation.
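A random matrix is fine for demonstrating the API, but a genuine distance matrix should be symmetric with zeros on the diagonal. One way to build a valid one (a sketch, computing pairwise Euclidean distances by hand from random points):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
points = rng.random((6, 3))  # 6 points in 3D

# A real distance matrix is symmetric with zeros on the diagonal;
# build one from the points via pairwise Euclidean distances.
diff = points[:, None, :] - points[None, :, :]
dist_matrix = np.sqrt((diff ** 2).sum(axis=-1))

nn = NearestNeighbors(n_neighbors=5, metric='precomputed')
nn.fit(dist_matrix)
# With no argument, kneighbors() excludes each training point itself
distances, indexes = nn.kneighbors()
print(indexes.shape)  # (6, 5)
```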
kNN - How to locate the nearest neighbors in the training matrix based on the calculated distances
I suggest using the Python library sklearn,
which has a KNeighborsClassifier
from which, once fitted, you can retrieve the nearest neighbors you are looking for.
Try this out:
# Import
from sklearn.neighbors import KNeighborsClassifier
# Instantiate your classifier
neigh = KNeighborsClassifier(n_neighbors=4)  # k=4 or whatever you want
# Fit your classifier
neigh.fit(X, y) # Where X is your training set and y is the training_output
# Get the neighbors
neigh.kneighbors(X_test, return_distance=False) # Where X_test is the sample or array of samples from which you want to get the k-nearest neighbors
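Since X, y and X_test above are placeholders, here is a self-contained toy version (the data is made up purely for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training set: 6 points in 2D with two classes
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)

# Indices (into X) of the 3 nearest training points for each test sample,
# sorted by increasing distance
X_test = np.array([[0.2, 0.1], [5.1, 5.2]])
idx = neigh.kneighbors(X_test, return_distance=False)
print(idx)
# [[0 2 1]
#  [3 4 5]]
```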
Nearest Neighbors in Python given the distance matrix
You'll want to create a DistanceMetric
object, supplying your own function as an argument:
metric = sklearn.neighbors.DistanceMetric.get_metric('pyfunc', func=func)
From the docs:
Here func
is a function which takes two one-dimensional numpy arrays
and returns a distance. Note that in order to be used within the BallTree, the distance must be a true metric: i.e. it must satisfy the following properties
- Non-negativity: d(x, y) >= 0
- Identity: d(x, y) = 0 if and only if x == y
- Symmetry: d(x, y) = d(y, x)
- Triangle Inequality: d(x, y) + d(y, z) >= d(x, z)
You can then create your classifier with metric=metric
as a keyword argument and it will use this when calculating distances.
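Note that in recent scikit-learn versions DistanceMetric lives in sklearn.metrics, and you can also pass the callable itself directly as the metric argument. A sketch with a hand-rolled Manhattan distance (which satisfies all four properties above):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def manhattan(a, b):
    # A true metric: non-negative, symmetric, zero iff a == b,
    # and satisfying the triangle inequality
    return np.abs(a - b).sum()

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [4.0, 4.0]])

# The callable is evaluated pairwise on 1-D rows of X
nn = NearestNeighbors(n_neighbors=2, metric=manhattan)
nn.fit(X)
distances, indexes = nn.kneighbors([[0.2, 0.1]])
print(indexes)    # [[0 1]]
```

This is slower than a built-in metric (the Python function is called for every pair), so prefer a named metric when one exists.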
5 nearest neighbors based on given distance in R
data.table solution:
library(data.table)
data<-fread("id x y age
1 1745353 930284.1 30
2 1745317 930343.4 23
3 1745201 930433.9 10
4 1745351 930309.4 5
5 1745342 930335.2 2
6 1746619 929969.7 66
7 1746465 929827.1 7
8 1746731 928779.5 55
9 1746629 929902.6 26
10 1745938 928923.2 22")
data[,all_x:=list(list(x))]
data[,all_y:=list(list(y))]
data[,all_age:=list(list(age))]
data[,seq_nr:=seq_len(.N)]
# Distance formula:
formula_distance <- function(x_1, x_2, y_1, y_2, z) {
  # Drop the point itself (position z) before computing distances
  x_2 <- x_2[[1]][-z]
  y_2 <- y_2[[1]][-z]
  sqrt((x_1 - x_2)^2 + (y_1 - y_2)^2)
}
data <- data[, {list(dist = formula_distance(x, all_x, y, all_y, seq_nr),
                     id = seq(1:nrow(data))[-id],
                     age_id = all_age[[1]][-id],
                     age = rep(age, nrow(data) - 1))}, by = 1:nrow(data)]
data<-data[order(nrow,dist)]
#Filter data within threshold:
threshold<-1000
#How many nearest neighbors to take:
k<-5
filtered<-data[dist<=threshold]
filtered<-filtered[,{list(dist=dist[1:k],n_id=id[1:k],n_age=age_id[1:k])},by=c("nrow","age")]
filtered<-filtered[!is.na(dist)]
setnames(filtered,"nrow","id")
filtered
id age dist n_id n_age
1: 1 30 25.37893 4 5
2: 1 30 52.27055 5 2
3: 1 30 69.37211 2 23
4: 1 30 213.41050 3 10
5: 2 23 26.31045 5 2
6: 2 23 48.08326 4 5
7: 2 23 69.37211 1 30
8: 2 23 147.12665 3 10
9: 3 10 147.12665 2 23
10: 3 10 172.11243 5 2
11: 3 10 194.93653 4 5
12: 3 10 213.41050 1 30
13: 4 5 25.37893 1 30
14: 4 5 27.32471 5 2
15: 4 5 48.08326 2 23
16: 4 5 194.93653 3 10
17: 5 2 26.31045 2 23
18: 5 2 27.32471 4 5
19: 5 2 52.27055 1 30
20: 5 2 172.11243 3 10
21: 6 66 67.84106 9 26
22: 6 66 209.88273 7 7
23: 7 7 180.54432 9 26
24: 7 7 209.88273 6 66
25: 8 55 805.91482 10 22
26: 9 26 67.84106 6 66
27: 9 26 180.54432 7 7
28: 10 22 805.91482 8 55
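For comparison, the same threshold-then-k-nearest logic in Python, using the coordinates from the R snippet (a sketch with plain NumPy; ids are 1-based to match the table above):

```python
import numpy as np

# Coordinates from the R example (ids 1..10)
xy = np.array([
    [1745353, 930284.1], [1745317, 930343.4], [1745201, 930433.9],
    [1745351, 930309.4], [1745342, 930335.2], [1746619, 929969.7],
    [1746465, 929827.1], [1746731, 928779.5], [1746629, 929902.6],
    [1745938, 928923.2],
])

threshold = 1000.0
k = 5

# Pairwise Euclidean distances
diff = xy[:, None, :] - xy[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
np.fill_diagonal(dist, np.inf)   # exclude self-matches
dist[dist > threshold] = np.inf  # apply the distance cutoff

for i, row in enumerate(dist):
    order = np.argsort(row)
    nbrs = [(j + 1, round(row[j], 5)) for j in order[:k] if np.isfinite(row[j])]
    print(i + 1, nbrs)
```

As in the R output, point 1's nearest neighbour is point 4 at about 25.38, and point 8 has only point 10 within the 1000-unit threshold.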