How to find the closest pairs (Hamming Distance) of a string of binary bins in Ruby without O(n^2) issues?
I ended up retrieving all the documents into memory (only a subset of the fields: the id and the string).
Then I used a BK-tree to compare the strings.
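For illustration, here is a minimal sketch of how a BK-tree over Hamming distance could look in Ruby. It is not the code used in the answer above: the names `hamming`, `BKTree`, `add` and `search` are hypothetical, and it assumes every document is an id plus an equal-length bit string such as "1011".

```ruby
# Hamming distance between two equal-length bit strings such as "1011".
def hamming(a, b)
  raise ArgumentError, "length mismatch" unless a.length == b.length
  a.chars.zip(b.chars).count { |x, y| x != y }
end

# Minimal BK-tree sketch: each node keeps its children keyed by their
# distance to that node.
class BKTree
  Node = Struct.new(:id, :bits, :children)

  def initialize
    @root = nil
  end

  def add(id, bits)
    node = Node.new(id, bits, {})
    if @root.nil?
      @root = node
      return
    end
    current = @root
    loop do
      d = hamming(bits, current.bits)
      child = current.children[d]
      if child.nil?
        current.children[d] = node
        return
      end
      current = child
    end
  end

  # Returns sorted [distance, id] pairs for all entries within max_dist.
  def search(bits, max_dist)
    results = []
    stack = @root ? [@root] : []
    until stack.empty?
      node = stack.pop
      d = hamming(bits, node.bits)
      results << [d, node.id] if d <= max_dist
      # Triangle inequality: only subtrees whose key lies within
      # max_dist of d can contain a match, so the rest are skipped.
      node.children.each do |key, child|
        stack << child if (key - d).abs <= max_dist
      end
    end
    results.sort
  end
end

tree = BKTree.new
tree.add(1, "0000")
tree.add(2, "0001")
tree.add(3, "1111")
tree.add(4, "0011")
p tree.search("0000", 1)
```

Building the tree takes one pass over the documents. Each query then skips the subtrees that the triangle inequality rules out, which is what avoids comparing every pair. How much is skipped depends on the data and on the chosen `max_dist`.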
Retrieve closest element from a set of elements
You can store a hash table (dictionary/map) that maps each element to the tuples it appears in: hash: element -> list<tuple>.
Now, when you have a new "query", iterate over hash(element) for each element of the query and find the tuple with the maximal number of hits.
pseudo code:
findMax(tuple):
    histogram <- empty map
    for each element in tuple:
        # assuming hash_table is the data structure described above
        for each x in hash_table[element]:
            histogram[x]++  # assuming lazy initialization to 0
    return key with highest value in histogram
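The pseudocode above could be sketched in Ruby roughly as follows. The names `build_index` and `find_max` are hypothetical, and tuples are assumed to be plain arrays of hashable elements. Ruby arrays hash by content, so identical tuples share a single histogram entry.

```ruby
# Build the index: element -> list of tuples that contain it.
def build_index(tuples)
  index = Hash.new { |hash, key| hash[key] = [] }
  tuples.each do |tuple|
    # uniq so a repeated element registers the tuple only once
    tuple.uniq.each { |element| index[element] << tuple }
  end
  index
end

# Return the stored tuple sharing the most elements with the query,
# or nil when no stored tuple shares any element.
def find_max(index, query)
  histogram = Hash.new(0) # missing keys start at 0
  query.uniq.each do |element|
    # fetch avoids creating empty index entries for unseen elements
    index.fetch(element, []).each { |tuple| histogram[tuple] += 1 }
  end
  best = histogram.max_by { |_tuple, hits| hits }
  best && best.first
end

index = build_index([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
p find_max(index, [5, 1, 6])
```

The cost of a query is proportional to the total length of the lists it touches, not to the number of stored tuples, so it pays off when each element appears in relatively few tuples.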
An alternative that does not exactly follow the metric you want is a k-d tree. The difference is that a k-d tree also takes into account the "distance" between the elements (and not only equality/inequality).
k-d trees also require the elements to be comparable.
How to calculate Hamming Distance in CosmosDB?
To solve this, I took code from long.js and ImageHash to use in a CosmosDB UDF. All kudos to their authors.
See the gist here: https://gist.github.com/okolobaxa/55cc08a0d67bc60505bfe3ab4f8bc33c
Usage:
SELECT udf.HAMMING_DISTANCE(files.ContentId, '1279796919517872320') FROM files
But please note a few things:
- CosmosDB doesn't support 64-bit numbers as numbers, so you have to store them as strings.
- Using this UDF costs a lot of RUs.
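As a point of comparison, the same calculation is short on the client side. The Ruby sketch below is not the UDF from the gist. It assumes the values are 64-bit hashes stored as decimal strings, as in the query above, and `hamming_distance_u64` is a hypothetical name.

```ruby
MASK_64 = 0xFFFF_FFFF_FFFF_FFFF

# Hamming distance between two 64-bit values given as decimal strings.
def hamming_distance_u64(a, b)
  # XOR leaves a 1 in every bit position where the values differ.
  # Masking to 64 bits also handles signed (negative) inputs.
  diff = (Integer(a, 10) ^ Integer(b, 10)) & MASK_64
  diff.to_s(2).count("1")
end

p hamming_distance_u64("1279796919517872320", "1279796919517872321")
```

Ruby integers are arbitrary precision, so the 64-bit limitation described above does not apply here. Computing the distance client-side means fetching the candidate values first, which has its own cost.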
I created a feature request on the CosmosDB Feedback forum to add built-in support of such functions. Please vote for these ideas if you're interested in it too:
Built-in functions for bitwise operations
Built-in functions for calculating distance metrics