All K Nearest Neighbors in 2D, C++

C++ Data Structure for k Nearest Neighbour Search in D dimension using only point cloud as query points

I doubt that there is a complete and definite answer to your very complex problem, so I just share my thoughts.
Your problem specification combines a number of things that don't work well together (high dimension, non-Euclidean metric, completely different types of queries). If an algorithm has to assume the generic case, it is necessarily slow.

Let's first sort out the special cases where good data structures are known.

  • If your dimension is 1, use a sorted map.
  • If your dimension is 2-3 (perhaps even 4), sorted lookups and geographic databases should be optimal.
    https://en.wikipedia.org/wiki/R-tree
  • If your points have a higher dimension but very strong correlation, dimensionality reduction can possibly map your point cloud to one that has such low dimension and reduce the problem to an easy one.
    https://en.wikipedia.org/wiki/Dimensionality_reduction
  • If your number of points is lower than 10^6, brute force is cheapest. Just calculate the distance with your metric for all points, then do a partial sort for k results. These simple cache-coherent calculations are faster than using tree structures.
    http://en.cppreference.com/w/cpp/algorithm/partial_sort
  • If your k is bounded, say k <= 20, and you optimize for query times, precompute a table with all results.
  • If only few of your dimensions are periodic, I think you should adapt the kd-tree algorithm to handle them (adding more complex comparison nodes for those dimensions similar to those in Vantage Point Trees).

If all this does not apply (if you have a practical application in mind, please share with us), you case is very generic.

In addition to the algorithms you mentioned, you should also try out Geometric Near-Neighbor Access trees (GNAT).
http://infolab.stanford.edu/~sergey/near.html
They apply to generic metrics (including yours) and also handle non-uniform distributions.

Also, I think your expectations are very high. You could compare to a good kd-tree implementation (for instance, https://github.com/mariusmuja/flann) that solves the problem with a Euclidean metric only. If that takes long, you should not expect more general metrics to solve faster.

Admittedly, more generic method cannot use your constraint that queries are points in the cloud. I would be very interested in if there is any such solution.

Spatial querying for k-th closest neighbour of a point

I'm thinking of using k-d trees

Excellent choice for 2D or 3D.

k-d trees are a good choice for low dimensional data (which I assume you have since nanoflann is "mostly optimized for 2D or 3D point clouds.").

Is there a more efficient way to get what I need, other than slogging through all the k nearest neighbors every time?

You need the k-th Nearest Neighbor (NN), but when searching a k-d tree for k NNs, the costly operation (in terms of time) is finding the first NN (which requires you to descend down the tree, all way from the root to a leaf).

Finding the 2nd, 3rd or another indexed NN is relatively cheap, and I highly doubt that it will harm performance (i.e. that getting the k-th NN out of the k NNs returned by the tree will be the bottleneck).

So, I strongly suggest you not to worry about that step.

A better data structure or a better implementation?

I don't think so. I haven't used nanoflann, but CGAL for this kind of queries, and it worth giving a try (however CGAL requires installing (not a piece of cake), while nanoflann is just a header include).

2D nearest neighbour search for moving points

Maybe you want to try a quadtree or a spatial index? What's the problem with a k-d tree? Basically when have the edge the flock/points you can skip checking collision with edges far away. A spatial index can be a quadtree, r-tree, kd-tree or hilbert r-tree. A better answer can be read here: Approximate, incremental nearest-neighbour algorithm for moving bodies

"That is, recursively partition the "world" into a graph with four subnodes each. The tree can then quickly check which objects are inside a particular square of the world and discard the rest. A very effective culling technique often used for improving performance of collision detection in games."

Efficiently find the n nearest neighbours of 50k 2D coordinates?

There are a couple of solutions depending on your data (which you have told nothing about) and how sure you want to be.

  • if your data is uniformly distributed, then you can create a grid on top of your data and assign points to the grid. After that for every element you find out which grid it belongs to and compare distances in that grid (and in the closest grids). With good grid selection and assuming there are k elements in the grid on average this can give you a potential O(n * k^2) running time. Look at this answer for more explanation.
  • knowing nothing about the data, you can construct a 2-d tree in O(n log n) time and then for every point in your database to ask what is the closest point to it (you ask this in O(logn) for a total n points). So the total complexity is O(n log n )
  • another approach is to use probabilistic approach called local sensitivity hashing. The wiki page is too convoluted and even knowing what is this, it was hard for me to read that page. Take a look at this description to better understand it.
  • @Gene suggested another approach by using a quadtree (have not heard about this kind of tree, so will just leave it empty)

So as you see it is possible to get better than O(n^2) complexity for this task.

All the methods are describing how to find the closest point to a point you are searching for. It is obvious that after you found the closest point, you can remove it and find another closest point and so on till you found 5 closest point for your point.



Related Topics



Leave a reply



Submit