Autocomplete with Java, Redis, Elasticsearch, Mongo

Autocomplete is a critical search use case. MongoDB and Redis are well suited to key-based lookups but are not designed for search, while Elasticsearch is a distributed search engine built specifically for this use case.

Before choosing a system, you should know how your feature works internally. Below are the considerations for selecting one.

Non-functional requirements for your feature

  1. What is the expected number of search queries per second (QPS)?
  2. How frequently will you update the documents (i.e., the names in your example)?
  3. What is the SLA for updated names to appear in the search results?
  4. What is the SLA for returning search results?

Some functional requirements.

  1. What should autocomplete look like: prefix or infix search on names?
  2. What is the minimum number of characters a user should type before autocomplete results are shown?
  3. How frequently can the above requirements change?

Elasticsearch indexes documents in an inverted index and works on
token matching (which can easily be customized to suit business
requirements), which makes searching very fast. Redis and MongoDB do
not have this structure internally and shouldn't be used for this
use case. For implementing autocomplete, Elasticsearch is the clear
choice over them.
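
To make the inverted-index idea concrete, here is a minimal sketch in plain Java. It is not how Elasticsearch (Lucene) is implemented; the class name InvertedIndexSketch, the sample names and the simple tokenization rule are all assumptions made for illustration. It only shows why a token lookup is cheap: each token maps directly to the IDs of the documents that contain it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index (hypothetical, for illustration): token -> IDs of documents containing it. */
final class InvertedIndexSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Lowercases the text, splits it on anything that is not a-z or 0-9, and records each token. */
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Looking up a token is a single map access; no document is scanned. */
    Set<Integer> lookup(String token) {
        return postings.getOrDefault(token.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "John Smith");
        index.add(2, "Jane Smith");
        index.add(3, "John Doe");
        System.out.println("smith -> " + index.lookup("smith"));
        System.out.println("john  -> " + index.lookup("john"));
    }
}
```

A key-value store would need either a full scan or a hand-built structure like this one to answer the same question, which is the gap the paragraph above describes.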

Elasticsearch autocomplete using the completion suggester to return the complete document

The completion suggester does return the whole document as part of each suggestion, even though it is only a suggester and doesn't work like a full-text search. You can control which fields are returned by using _source filtering while querying.

Refer to this link for information on how to extract the source fields using the Java Client API.
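
As a rough illustration of the request itself, the sketch below builds a suggest request body that limits the returned _source to chosen fields. It uses only the Java standard library and deliberately avoids any Elasticsearch client classes; the class and method names are hypothetical, and the suggestion name "song-suggest" and the fields "suggest" and "title" are taken from the example further down this page. The resulting JSON would be sent to the index's _search endpoint.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that assembles a completion-suggest request body with _source filtering. */
final class SuggestRequestSketch {

    /** Wraps a value in double quotes, escaping backslashes and quotes (enough for this sketch). */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds: { "_source": [...], "suggest": { name: { "prefix": ..., "completion": { "field": ... } } } } */
    static String body(String suggestName, String field, String prefix, List<String> sourceFields) {
        String source = sourceFields.stream()
                .map(SuggestRequestSketch::quote)
                .collect(Collectors.joining(", ", "[", "]"));
        return "{ \"_source\": " + source
                + ", \"suggest\": { " + quote(suggestName)
                + ": { \"prefix\": " + quote(prefix)
                + ", \"completion\": { \"field\": " + quote(field) + " } } } }";
    }

    public static void main(String[] args) {
        // Only the "title" field of each suggested document is requested.
        System.out.println(body("song-suggest", "suggest", "nir", Arrays.asList("title")));
    }
}
```

In a real application you would normally let your client library or a JSON library produce this body; the string concatenation here just keeps the sketch dependency-free.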

Alternatively, if you want the whole document, you can implement autosuggest with full-text search, and there are various ways to do so.

You can also refer to https://stackoverflow.com/a/60584211/4039431 for more information on the functional and non-functional requirements for building autocomplete.

How to match when search term has more words than index?

Index Mapping:

{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 10
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "autocomplete_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "code": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
}

Index data:

{
  "code": "A1"
}
{
  "code": "A1B"
}
{
  "code": "A1B2"
}

Search Query:

{
  "query": {
    "match": {
      "code": {
        "query": "A1B2 2C8"
      }
    }
  }
}

Search Result:

"hits": [
  {
    "_index": "65067196",
    "_type": "_doc",
    "_id": "3",
    "_score": 1.3486402,
    "_source": {
      "code": "A1B2"
    }
  }
]
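
Why only "A1B2" is returned can be traced with a small simulation of the two analyzers in plain Java. This is a sketch under assumptions, not Elasticsearch code: the class name EdgeNgramSketch is hypothetical, and splitting on whitespace stands in for the standard tokenizer, which is close enough for these alphanumeric codes. At index time each code is expanded into its prefixes (edge n-grams of length 1 to 10); at search time the query is only lowercased and split, and a match query needs just one of its tokens to be present, because its default operator is OR.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical simulation of the "autocomplete" (index) and "standard" (search) analyzers above. */
final class EdgeNgramSketch {
    static final int MIN_GRAM = 1;
    static final int MAX_GRAM = 10;

    /** Search-time analysis: lowercase, then split on whitespace (stand-in for the standard analyzer). */
    static List<String> searchTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.add(word);
            }
        }
        return tokens;
    }

    /** Index-time analysis: each word is expanded into its prefixes of length MIN_GRAM..MAX_GRAM. */
    static Set<String> indexTokens(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String word : searchTokens(text)) {
            int longest = Math.min(MAX_GRAM, word.length());
            for (int len = MIN_GRAM; len <= longest; len++) {
                tokens.add(word.substring(0, len));
            }
        }
        return tokens;
    }

    /** Mimics a match query with the default OR operator: one shared token is enough. */
    static boolean matches(String indexedValue, String query) {
        Set<String> indexed = indexTokens(indexedValue);
        for (String token : searchTokens(query)) {
            if (indexed.contains(token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (String code : new String[] {"A1", "A1B", "A1B2"}) {
            System.out.println(code + " -> " + indexTokens(code)
                    + " matches \"A1B2 2C8\": " + matches(code, "A1B2 2C8"));
        }
    }
}
```

The query produces the tokens a1b2 and 2c8. Only the document "A1B2" has a1b2 among its indexed prefixes, so it matches on that one token even though 2c8 matches nothing; "A1" and "A1B" contain neither token.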

Elastic Search autocomplete with a secondary search order

Sorting is not supported by suggester queries. The whole point of suggesters is speed, and adding sorting to the mix would slow them down.

You can add a weight to improve the ranking, but you cannot add a secondary sort field. If you need a secondary sort, I suggest using a regular search query instead.

EDIT: Looking at the indexing code, you can use the population field as the weight for your combined field.

    $data_array = array(
        "city" => $row['city'],
        "county" => $row['county'],
        "region" => $row['region'],
        "country" => $row['country'],
        "url" => $row['url'],
        "combined" => array("input" => $row['combined'], "weight" => $row['Population']),
        "city_lc" => $row['city_lc'],
        "population" => $row['Population'],
        "location" => array("lat" => $row['lat'], "lon" => $row['long'])
    );

Elastic Search - document text search

Elasticsearch has its own storage (so there is no need for MongoDB or any other storage), and you can install it on any bare-metal machine (you just need a server in Azure to install ES on).

For partial search using Elasticsearch, please refer to my SO post on the functional and non-functional requirements.

You can also refer to my detailed blog on various approaches to implement partial-search in ES.

Which NoSQL DB is advisable for multi-field search

Although not exactly the answer to your question, this should give you some idea on how to choose a system, based on functional and non-functional requirements.

Coming to your main requirements: yes, you need not index data that you never search on. Using the index option, you can disable indexing of a particular field (note that the default is true). This reduces the size of your inverted index and improves performance.

The second part is to use the filter context, since you don't have full-text search requirements. Elasticsearch caches frequently used filters, which makes them very fast without introducing an external cache system such as Redis.

More information on the filter cache can be found in the official doc at https://www.elastic.co/guide/en/elasticsearch/reference/current/query-filter-context.html#filter-context
Quoting from the same doc:

Frequently used filters will be cached automatically by Elasticsearch,
to speed up performance.

Also,

In a filter context, a query clause answers the question “Does this
document match this query clause?” The answer is a simple Yes or
No — no scores are calculated. Filter context is mostly used for
filtering structured data, e.g.

Does this timestamp fall into the range 2015 to 2016? Is the status
field set to "published"

This seems to match your exact requirements.
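
For illustration, the sketch below assembles a query that runs entirely in filter context, using the two questions from the quoted doc: a term filter on status and a range filter on timestamp, both placed in the filter clause of a bool query. It is plain Java with no Elasticsearch client; the class and method names are hypothetical, and the concrete date bounds are assumptions, since the doc only says "2015 to 2016".

```java
/** Hypothetical helper that builds a bool query whose clauses all run in filter context. */
final class FilterQuerySketch {

    /** Wraps a value in double quotes, escaping backslashes and quotes (enough for this sketch). */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Exact-value check, e.g. is the status field set to "published"? */
    static String termFilter(String field, String value) {
        return "{ \"term\": { " + quote(field) + ": " + quote(value) + " } }";
    }

    /** Inclusive range check, e.g. does the timestamp fall between two dates? */
    static String rangeFilter(String field, String gte, String lte) {
        return "{ \"range\": { " + quote(field) + ": { \"gte\": " + quote(gte)
                + ", \"lte\": " + quote(lte) + " } } }";
    }

    /** Clauses under bool.filter only answer yes or no; no scores are calculated for them. */
    static String boolFilterQuery(String... filters) {
        return "{ \"query\": { \"bool\": { \"filter\": [ " + String.join(", ", filters) + " ] } } }";
    }

    public static void main(String[] args) {
        System.out.println(boolFilterQuery(
                termFilter("status", "published"),
                // Assumed bounds standing in for "the range 2015 to 2016".
                rangeFilter("timestamp", "2015-01-01", "2016-12-31")));
    }
}
```

Because nothing here sits in query context, every clause is a candidate for the automatic filter caching described in the quote.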

Elasticsearch: index boost with completion suggester

I tried a few options, but they didn't work directly. Instead, you can define the weight of a document at index time and use it as a workaround to get the boosted document. Below is the complete example.

Index mapping (the same for index-1 and index-2)

{
  "mappings": {
    "properties": {
      "suggest": {
        "type": "completion"
      },
      "title": {
        "type": "keyword"
      }
    }
  }
}

Index doc 1 with weight in index-1

{
  "suggest": {
    "input": [
      "Nevermind",
      "Nirvana"
    ],
    "weight": 30
  }
}

A similar doc is inserted in index-2 with a different weight (note the lower weight)

{
  "suggest": {
    "input": [
      "Nevermind",
      "Nirvana"
    ],
    "weight": 10
  }
}

A simple search will now sort the results according to weight

{
  "suggest": {
    "song-suggest": {
      "prefix": "nir",
      "completion": {
        "field": "suggest"
      }
    }
  }
}

And the search result

  {
    "text": "Nirvana",
    "_index": "index-1",
    "_type": "_doc",
    "_id": "1",
    "_score": 34.0,
    "_source": {
      "suggest": {
        "input": [
          "Nevermind",
          "Nirvana"
        ],
        "weight": 30
      }
    }
  },
  {
    "text": "Nirvana",
    "_index": "index-2",
    "_type": "_doc",
    "_id": "1",
    "_score": 30.0,
    "_source": {
      "suggest": {
        "input": [
          "Nevermind",
          "Nirvana"
        ],
        "weight": 10
      }
    }
  }
]

