Implementing k nearest neighbours from a distance matrix?
The way I see it, you simply find the n + 1
smallest numbers/distances/neighbours for each row and remove the 0 self-distance, which leaves the n
nearest neighbours. Keep in mind that this will not work if two distinct points have a distance of zero: only the diagonal entries are allowed to be 0.
import pandas as pd

# A 4x4 (possibly asymmetric) distance matrix with 0 on the diagonal
X = pd.DataFrame([[0, 1, 3, 2], [5, 0, 2, 2], [3, 2, 0, 1], [2, 3, 4, 0]])
X.columns = ['A', 'B', 'C', 'D']
X.index = ['A', 'B', 'C', 'D']
X = X.T

for i in X.index:
    # n + 1 = 3 smallest values in column i (this includes the 0 self-distance)
    Y = X.nsmallest(3, i)
    Y = Y.T
    Y = Y[Y.index.str.startswith(i)]  # keep only row i
    Y = Y.loc[:, Y.any()]             # drop the zero self-distance column
    print(i + ": ", list(Y.columns))
This prints out:
A: ['B', 'D']
B: ['C', 'D']
C: ['D', 'B']
D: ['A', 'B']
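The same n + 1 then drop-the-self-match idea can be written compactly with NumPy's argsort (a sketch using the same matrix as above):

```python
import numpy as np

labels = ['A', 'B', 'C', 'D']
D = np.array([[0, 1, 3, 2],
              [5, 0, 2, 2],
              [3, 2, 0, 1],
              [2, 3, 4, 0]])

n = 2  # number of neighbours to keep
# argsort each row, take the n + 1 smallest, then drop the self index
order = np.argsort(D, axis=1)[:, :n + 1]
neighbours = {labels[i]: [labels[j] for j in row if j != i][:n]
              for i, row in enumerate(order)}
print(neighbours)
# {'A': ['B', 'D'], 'B': ['C', 'D'], 'C': ['D', 'B'], 'D': ['A', 'B']}
```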
Python scikit-learn k-nearest neighbors - 3D distance matrix
You can use a precomputed distance matrix as the input to sklearn's neighbors.NearestNeighbors
by setting the metric parameter to "precomputed".
Let's create a dummy distance matrix for 6 points in some 3D space (or any dimensional space).
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Dummy "distance matrix" (random values, for illustration only)
precomputed_distances = np.random.random((6, 6))
#Get top 5 neighbours from precomputed distance matrix
nn = NearestNeighbors(n_neighbors=5, metric='precomputed')
nn.fit(precomputed_distances)
#Fetch kneighbors
distances, indexes = nn.kneighbors()
print(indexes)
print('')
print(distances)
#neighbours indexes
[[2 5 3 1 4]
[0 4 3 2 5]
[5 3 0 1 4]
[1 2 4 0 5]
[3 1 2 5 0]
[3 2 0 1 4]]
#distances
[[0.07355072 0.30327092 0.32645641 0.54227088 0.76145093]
[0.06451358 0.13867276 0.7570105 0.84383876 0.92184049]
[0.52953184 0.59474913 0.63211483 0.80958676 0.99361867]
[0.10885239 0.31822021 0.39327313 0.47670755 0.6764581 ]
[0.18309627 0.69483384 0.74029263 0.82705113 0.92923248]
[0.28584336 0.42956108 0.43323451 0.64124948 0.90154176]]
Read more in the scikit-learn NearestNeighbors documentation.
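A random matrix is fine for demonstrating the API, but a genuine distance matrix should be symmetric with zeros on the diagonal. One way to build a valid one (a sketch, computing pairwise Euclidean distances by hand from random points):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
points = rng.random((6, 3))  # 6 points in 3D

# A real distance matrix is symmetric with zeros on the diagonal;
# build one from the points via pairwise Euclidean distances.
diff = points[:, None, :] - points[None, :, :]
dist_matrix = np.sqrt((diff ** 2).sum(axis=-1))

nn = NearestNeighbors(n_neighbors=5, metric='precomputed')
nn.fit(dist_matrix)
# With no argument, kneighbors() excludes each training point itself
distances, indexes = nn.kneighbors()
print(indexes.shape)  # (6, 5)
```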
kNN - How to locate the nearest neighbors in the training matrix based on the calculated distances
I suggest using the Python library sklearn,
which has a KNeighborsClassifier
from which, once fitted, you can retrieve the nearest neighbors you are looking for.
Try this out:
# Import
from sklearn.neighbors import KNeighborsClassifier
# Instantiate your classifier
neigh = KNeighborsClassifier(n_neighbors=4)  # k=4 or whatever you want
# Fit your classifier
neigh.fit(X, y) # Where X is your training set and y is the training_output
# Get the neighbors
neigh.kneighbors(X_test, return_distance=False) # Where X_test is the sample or array of samples from which you want to get the k-nearest neighbors
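Since X, y and X_test above are placeholders, here is a self-contained toy version (the data is made up purely for illustration):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training set: 6 points in 2D with two classes
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X, y)

# Indices (into X) of the 3 nearest training points for each test sample,
# sorted by increasing distance
X_test = np.array([[0.2, 0.1], [5.1, 5.2]])
idx = neigh.kneighbors(X_test, return_distance=False)
print(idx)
# [[0 2 1]
#  [3 4 5]]
```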
Nearest Neighbors in Python given the distance matrix
You'll want to create a DistanceMetric
object, supplying your own function as an argument:
metric = sklearn.neighbors.DistanceMetric.get_metric('pyfunc', func=func)
From the docs:
Here func
is a function which takes two one-dimensional numpy arrays
and returns a distance. Note that in order to be used within the BallTree, the distance must be a true metric: i.e. it must satisfy the following properties
- Non-negativity: d(x, y) >= 0
- Identity: d(x, y) = 0 if and only if x == y
- Symmetry: d(x, y) = d(y, x)
- Triangle Inequality: d(x, y) + d(y, z) >= d(x, z)
You can then create your classifier with metric=metric
as a keyword argument and it will use this when calculating distances.
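Note that in recent scikit-learn versions DistanceMetric lives in sklearn.metrics, and you can also pass the callable itself directly as the metric argument. A sketch with a hand-rolled Manhattan distance (which satisfies all four properties above):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def manhattan(a, b):
    # A true metric: non-negative, symmetric, zero iff a == b,
    # and satisfying the triangle inequality
    return np.abs(a - b).sum()

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [4.0, 4.0]])

# The callable is evaluated pairwise on 1-D rows of X
nn = NearestNeighbors(n_neighbors=2, metric=manhattan)
nn.fit(X)
distances, indexes = nn.kneighbors([[0.2, 0.1]])
print(indexes)    # [[0 1]]
```

This is slower than a built-in metric (the Python function is called for every pair), so prefer a named metric when one exists.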
5 nearest neighbors based on given distance in R
data.table solution:
library(data.table)
data<-fread("id x y age
1 1745353 930284.1 30
2 1745317 930343.4 23
3 1745201 930433.9 10
4 1745351 930309.4 5
5 1745342 930335.2 2
6 1746619 929969.7 66
7 1746465 929827.1 7
8 1746731 928779.5 55
9 1746629 929902.6 26
10 1745938 928923.2 22")
data[,all_x:=list(list(x))]
data[,all_y:=list(list(y))]
data[,all_age:=list(list(age))]
data[,seq_nr:=seq_len(.N)]
# Distance formula:
formula_distance <- function(x_1, x_2, y_1, y_2, z) {
  # Drop the point itself (position z) before computing distances
  x_2 <- x_2[[1]][-z]
  y_2 <- y_2[[1]][-z]
  sqrt((x_1 - x_2)^2 + (y_1 - y_2)^2)
}
data <- data[, {list(dist = formula_distance(x, all_x, y, all_y, seq_nr),
                     id = seq(1:nrow(data))[-id],
                     age_id = all_age[[1]][-id],
                     age = rep(age, nrow(data) - 1))}, by = 1:nrow(data)]
data<-data[order(nrow,dist)]
#Filter data within threshold:
threshold<-1000
#How many nearest neighbors to take:
k<-5
filtered<-data[dist<=threshold]
filtered<-filtered[,{list(dist=dist[1:k],n_id=id[1:k],n_age=age_id[1:k])},by=c("nrow","age")]
filtered<-filtered[!is.na(dist)]
setnames(filtered,"nrow","id")
filtered
id age dist n_id n_age
1: 1 30 25.37893 4 5
2: 1 30 52.27055 5 2
3: 1 30 69.37211 2 23
4: 1 30 213.41050 3 10
5: 2 23 26.31045 5 2
6: 2 23 48.08326 4 5
7: 2 23 69.37211 1 30
8: 2 23 147.12665 3 10
9: 3 10 147.12665 2 23
10: 3 10 172.11243 5 2
11: 3 10 194.93653 4 5
12: 3 10 213.41050 1 30
13: 4 5 25.37893 1 30
14: 4 5 27.32471 5 2
15: 4 5 48.08326 2 23
16: 4 5 194.93653 3 10
17: 5 2 26.31045 2 23
18: 5 2 27.32471 4 5
19: 5 2 52.27055 1 30
20: 5 2 172.11243 3 10
21: 6 66 67.84106 9 26
22: 6 66 209.88273 7 7
23: 7 7 180.54432 9 26
24: 7 7 209.88273 6 66
25: 8 55 805.91482 10 22
26: 9 26 67.84106 6 66
27: 9 26 180.54432 7 7
28: 10 22 805.91482 8 55
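For comparison, the same threshold-then-k-nearest logic in Python, using the coordinates from the R snippet (a sketch with plain NumPy; ids are 1-based to match the table above):

```python
import numpy as np

# Coordinates from the R example (ids 1..10)
xy = np.array([
    [1745353, 930284.1], [1745317, 930343.4], [1745201, 930433.9],
    [1745351, 930309.4], [1745342, 930335.2], [1746619, 929969.7],
    [1746465, 929827.1], [1746731, 928779.5], [1746629, 929902.6],
    [1745938, 928923.2],
])

threshold = 1000.0
k = 5

# Pairwise Euclidean distances
diff = xy[:, None, :] - xy[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
np.fill_diagonal(dist, np.inf)   # exclude self-matches
dist[dist > threshold] = np.inf  # apply the distance cutoff

for i, row in enumerate(dist):
    order = np.argsort(row)
    nbrs = [(j + 1, round(row[j], 5)) for j in order[:k] if np.isfinite(row[j])]
    print(i + 1, nbrs)
```

As in the R output, point 1's nearest neighbour is point 4 at about 25.38, and point 8 has only point 10 within the 1000-unit threshold.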