JavaScript Fuzzy Search That Makes Sense

Javascript fuzzy search that makes sense

Good question! But my thought is that, rather than trying to modify Damerau-Levenshtein, you might do better to try a different algorithm, or to combine and weight the results from two algorithms.

It strikes me that exact or close matches to the "starting prefix" are something Damerau-Levenshtein gives no particular weight to, but your apparent user expectations would.

I searched for "better than Levenshtein" and, among other things, found this:

http://www.joyofdata.de/blog/comparison-of-string-distance-algorithms/

This mentions a number of "string distance" measures. Three that look particularly relevant to your requirement are:

  1. Longest Common Substring distance: Minimum number of symbols that have to be removed in both strings until resulting substrings are identical.

  2. q-gram distance: Sum of absolute differences between N-gram vectors of both strings.

  3. Jaccard distance: 1 minus the quotient of shared N-grams and all observed N-grams.
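As a rough illustration of the q-gram and Jaccard definitions above, here is a minimal JavaScript sketch. The function names (qgramCounts, qgramDistance, jaccardDistance), the default gram size of 2 and the lowercasing step are my own assumptions for the example, not something taken from the linked article.

```javascript
// Count the q-grams (substrings of length q) in a string.
// Lowercasing is an assumption made for case-insensitive matching.
function qgramCounts(text, q = 2) {
  const counts = new Map();
  const s = text.toLowerCase();
  for (let i = 0; i + q <= s.length; i++) {
    const gram = s.slice(i, i + q);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// q-gram distance: sum of absolute differences between the two count vectors.
function qgramDistance(a, b, q = 2) {
  const ca = qgramCounts(a, q);
  const cb = qgramCounts(b, q);
  let distance = 0;
  for (const gram of new Set([...ca.keys(), ...cb.keys()])) {
    distance += Math.abs((ca.get(gram) || 0) - (cb.get(gram) || 0));
  }
  return distance;
}

// Jaccard distance: 1 minus (shared q-grams / all observed q-grams).
function jaccardDistance(a, b, q = 2) {
  const sa = new Set(qgramCounts(a, q).keys());
  const sb = new Set(qgramCounts(b, q).keys());
  const union = new Set([...sa, ...sb]);
  if (union.size === 0) return 0; // both strings are shorter than q
  let shared = 0;
  for (const gram of sa) {
    if (sb.has(gram)) shared++;
  }
  return 1 - shared / union.size;
}
```

Both functions return 0 for identical strings; the Jaccard distance reaches 1 when the two strings share no q-grams at all.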

Maybe you could use a weighted combination (or the minimum) of these metrics together with Levenshtein, since common substring, common N-gram and Jaccard will all strongly prefer similar strings. Or perhaps try using Jaccard alone?
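One way such a weighted combination might look is sketched below. It mixes a length-normalized edit distance with a bigram Jaccard distance. Note that it uses plain Levenshtein (no transpositions) rather than Damerau-Levenshtein, and that the names combinedDistance and rankCandidates and the default weight of 0.5 are assumptions you would need to tune against your own data.

```javascript
// Plain Levenshtein edit distance (insert, delete, substitute), two-row DP.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Set of distinct two-character substrings.
function bigrams(text) {
  const grams = new Set();
  for (let i = 0; i + 2 <= text.length; i++) grams.add(text.slice(i, i + 2));
  return grams;
}

// Jaccard distance over bigram sets.
function bigramJaccardDistance(a, b) {
  const ga = bigrams(a);
  const gb = bigrams(b);
  const union = new Set([...ga, ...gb]);
  if (union.size === 0) return 0;
  let shared = 0;
  for (const g of ga) {
    if (gb.has(g)) shared++;
  }
  return 1 - shared / union.size;
}

// Weighted mix of both measures, each scaled to the range 0..1.
// `weight` is a hypothetical tuning knob: 1 means edit distance only.
function combinedDistance(query, candidate, weight = 0.5) {
  const q = query.toLowerCase();
  const c = candidate.toLowerCase();
  const maxLen = Math.max(q.length, c.length) || 1;
  const normalizedEdit = levenshtein(q, c) / maxLen;
  return weight * normalizedEdit + (1 - weight) * bigramJaccardDistance(q, c);
}

// Return a copy of the candidates, best match first.
function rankCandidates(query, candidates, weight = 0.5) {
  return [...candidates].sort(
    (x, y) => combinedDistance(query, x, weight) - combinedDistance(query, y, weight)
  );
}
```

Taking the minimum of the two normalized distances instead of the weighted sum is a one-line change in combinedDistance, if that suits your data better.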

Depending on the size of your list or database, these algorithms can be moderately expensive. For a fuzzy search I implemented, I used a configurable number of N-grams as "retrieval keys" from the DB, then ran the expensive string-distance measure to sort the retrieved candidates in preference order.
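The two-stage idea described above could be sketched roughly as follows, with an in-memory Map standing in for the database. This is not the implementation referred to in the answer: the trigram size, the names buildIndex and search, and the maxKeys default are all assumptions made for illustration.

```javascript
// Distinct three-character substrings of a lowercased string.
function trigrams(text) {
  const s = text.toLowerCase();
  const grams = new Set();
  for (let i = 0; i + 3 <= s.length; i++) grams.add(s.slice(i, i + 3));
  return grams;
}

// Inverted index: trigram -> set of entry positions (stand-in for a DB table).
function buildIndex(entries) {
  const index = new Map();
  entries.forEach((entry, id) => {
    for (const gram of trigrams(entry)) {
      if (!index.has(gram)) index.set(gram, new Set());
      index.get(gram).add(id);
    }
  });
  return index;
}

// Plain Levenshtein edit distance, used here as the "expensive" measure.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Stage 1: cheap candidate retrieval using up to maxKeys trigrams of the query.
// Stage 2: sort only those candidates by edit distance.
function search(query, entries, index, maxKeys = 5) {
  const keys = [...trigrams(query)].slice(0, maxKeys);
  const candidateIds = new Set();
  for (const key of keys) {
    for (const id of index.get(key) || []) candidateIds.add(id);
  }
  const q = query.toLowerCase();
  return [...candidateIds]
    .map((id) => ({ entry: entries[id], distance: levenshtein(q, entries[id].toLowerCase()) }))
    .sort((a, b) => a.distance - b.distance)
    .map((result) => result.entry);
}
```

Entries that share none of the chosen trigrams with the query are never scored, which is where the saving comes from. The trade-off is that a very short or heavily misspelled query may retrieve nothing.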

I wrote some notes on Fuzzy String Search in SQL. See:

  • http://literatejava.com/sql/fuzzy-string-search-sql/

Fuzzy search in JavaScript with results sorted by relevancy

It appears you are looking for fuzzy search functionality that returns results based on relevancy. Take a look at Fuse.js; it provides Solr-like searching against arrays and objects in JavaScript.

http://fusejs.io

Regex for best match

It sounds like what you're looking for is not regex, but a fuzzy string comparison. See "Javascript fuzzy search that makes sense" for more information.

Fuzzy search in ElasticSearch doesn't work with spaces

Fuzzy queries are term-level queries, which means the search text is not analyzed before it is matched against documents. In your case the standard analyzer is used on the field, which splits "Pineapple Pizza" into two tokens, pineapple and pizza. The fuzzy query tries to match the search text "Pineapple pizza" against any similar term in the index, and there is no entry in the index for the whole string "pineapple pizza" (it has been broken into two words).

You need to use a match query with fuzziness set, which analyzes the query string:
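To make the difference concrete, here is a simplified JavaScript simulation. It is not how Elasticsearch is implemented: roughAnalyze is a crude stand-in for the standard analyzer, the edit distance is plain Levenshtein, and all function names are hypothetical. The length thresholds follow the documented behaviour of "fuzziness": "auto" (0, 1 or 2 edits depending on term length).

```javascript
// Crude stand-in for the standard analyzer: lowercase and split on non-word characters.
function roughAnalyze(text) {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

// Plain Levenshtein edit distance.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Allowed edits for "auto" fuzziness: 0 for length 0-2, 1 for 3-5, 2 for longer terms.
function autoFuzziness(length) {
  if (length <= 2) return 0;
  if (length <= 5) return 1;
  return 2;
}

// Term-level fuzzy: the whole search text is compared to each indexed token as-is.
function termFuzzyMatches(searchText, indexedTokens) {
  const allowed = autoFuzziness(searchText.length);
  return indexedTokens.some((token) => levenshtein(searchText, token) <= allowed);
}

// Match with fuzziness: the query is analyzed first, then each query token is
// compared to the indexed tokens (default "or": any one matching token is enough).
function matchFuzzy(queryText, indexedTokens) {
  return roughAnalyze(queryText).some((queryToken) =>
    indexedTokens.some(
      (token) => levenshtein(queryToken, token) <= autoFuzziness(queryToken.length)
    )
  );
}

const indexedTokens = roughAnalyze("Pineapple Pizza"); // two tokens: pineapple, pizza
```

In this simulation, termFuzzyMatches("Pineapple pizza", indexedTokens) fails because the unsplit search text is far more than two edits away from either token, while matchFuzzy("Pineappl piz", indexedTokens) succeeds because "pineappl" is one edit away from "pineapple".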

{
  "query": {
    "match": {
      "item": {
        "query": "Pineappl piz",
        "fuzziness": "auto"
      }
    }
  }
}

Response:

[
  {
    "_index": "index27",
    "_type": "_doc",
    "_id": "p9qQDG4BLLIhDvFGnTMX",
    "_score": 0.53372335,
    "_source": {
      "item": "Pineapple Pizza"
    }
  }
]

You can also use a fuzzy query on the keyword field, which stores the entire text as a single term in the index:

{
  "query": {
    "fuzzy": {
      "item.keyword": {
        "value": "Pineapple pizz"
      }
    }
  }
}

EDIT1:

{
  "query": {
    "match": {
      "item": {
        "query": "Pineapple pizza",
        "operator": "and",
        "fuzziness": "auto"
      }
    }
  }
}

"operator": "and" means all the tokens in the query must be present in the document.
The default is "or": a document matches if any one token is present. Other combinations are possible, where you define how many tokens must match as a percentage (the minimum_should_match parameter).
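If you build these request bodies from JavaScript, a small helper along the following lines may be convenient. The helper name buildFuzzyMatchQuery and its options object are hypothetical; operator, fuzziness and minimum_should_match are real parameters of the match query, and a percentage such as "75%" is one of the accepted forms of minimum_should_match.

```javascript
// Build the body of a fuzzy match query for one field.
// options.operator: "and" or "or" (left out to use the Elasticsearch default, "or")
// options.minimumShouldMatch: for example "75%", share of query tokens that must match
// options.fuzziness: defaults to "auto", as in the queries above
function buildFuzzyMatchQuery(field, text, options = {}) {
  const { operator, minimumShouldMatch, fuzziness = "auto" } = options;
  const clause = { query: text, fuzziness };
  if (operator) clause.operator = operator;
  if (minimumShouldMatch) clause.minimum_should_match = minimumShouldMatch;
  return { query: { match: { [field]: clause } } };
}

// All tokens must be present:
const allTokens = buildFuzzyMatchQuery("item", "Pineapple pizza", { operator: "and" });

// At least 75% of the tokens must be present:
const mostTokens = buildFuzzyMatchQuery("item", "Pineapple pizza", {
  minimumShouldMatch: "75%",
});
```

The returned object can be serialized with JSON.stringify and sent as the search request body.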


