Elasticsearch & Tire: Using Mapping and to_indexed_json

While the mapping and to_indexed_json methods are related, they serve two different purposes.

The purpose of the mapping method is to define the mapping for the document properties within an index. You may want to define a certain property as "not_analyzed", so it is not broken into tokens, or set a specific analyzer for the property, or (as you mention) set an index-time boost factor. You may also define a multifield property, custom formats for date types, etc.

This mapping is then used, e.g., when Tire automatically creates an index for your model.

The purpose of the to_indexed_json method is to define a JSON serialization for your documents/models.

The default to_indexed_json method does use your mapping definition: it serializes only the properties defined in the mapping, on the basis that if you care enough to define a mapping, you want only those properties indexed.

Now, when you want a tight grip on how your model is in fact serialized into JSON for elasticsearch, you just define your own to_indexed_json method (as the README instructs).

This custom MyModel#to_indexed_json method usually does not care about the mapping definition, and builds the JSON serialization from scratch (by leveraging ActiveRecord's to_json, using a JSON builder such as jbuilder, or just building a plain old Hash and calling Hash#to_json).

So, to answer the last part of your question, using both mapping and to_indexed_json will absolutely not create any conflicts, and is in fact required to use advanced features in elasticsearch.

To sum up:

  1. You use the mapping method to define the mapping for your models for the search engine.
  2. You use a custom to_indexed_json method to define how the search engine sees your documents/models.
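
To make the distinction concrete, here is a minimal plain-Ruby sketch. It does not use Tire and is not Tire's implementation: the Article class, the MAPPED_PROPERTIES constant, and the default_indexed_json method are hypothetical names that only imitate the behaviour described above (the default serialization keeps mapped properties only, while a custom to_indexed_json builds the document from scratch).

```ruby
require 'json'

# Hypothetical stand-in for a model; NOT Tire's actual implementation.
class Article
  # Property names that would be declared in a `mapping` block.
  MAPPED_PROPERTIES = [:title, :published_on].freeze

  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end

  # Imitates the default behaviour: serialize only the properties
  # that have a mapping defined.
  def default_indexed_json
    attributes.select { |name, _value| MAPPED_PROPERTIES.include?(name) }.to_json
  end

  # A custom serialization built from a plain Hash, ignoring the mapping.
  def to_indexed_json
    {
      title: attributes[:title],
      published_on: attributes[:published_on],
      title_length: attributes[:title].to_s.length # computed, unmapped property
    }.to_json
  end
end

article = Article.new(title: 'One', published_on: '2011-01-01', secret: 'x')
puts article.default_indexed_json # the unmapped :secret attribute is dropped
puts article.to_indexed_json      # includes the computed :title_length
```

In a real Tire model, the mapped properties would come from the mapping block rather than from a constant.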

Elasticsearch, Tire, and Nested queries / associations with ActiveRecord

The support for ActiveRecord associations in Tire is working, but requires a couple of tweaks inside your application. There's no question the library should do a better job here, and in the future it certainly will.

That said, here is a full-fledged example of Tire configuration to work with Rails' associations in elasticsearch: active_record_associations.rb

Let me highlight a couple of things here.

Touching the parent

First, you have to ensure you notify the parent model of the association about changes in the association.

Given we have a Chapter model, which “belongs to” a Book, we need to do:

class Chapter < ActiveRecord::Base
  belongs_to :book, touch: true
end

In this way, when we do something like:

book.chapters.create text: "Lorem ipsum...."

The book instance is notified about the added chapter.

Responding to touches

With this part sorted, we need to notify Tire about the change, and update the elasticsearch index accordingly:

class Book < ActiveRecord::Base
  has_many :chapters
  after_touch() { tire.update_index }
end

(There's no question Tire should intercept after_touch notifications by itself, and not force you to do this. It is, on the other hand, a testament to how easy it is to work your way around the library's limitations in a manner which does not hurt your eyes.)

Proper JSON serialization in Rails < 3.1

Although the README mentions that you have to disable automatically adding the root key in JSON in Rails < 3.1, many people forget it, so you have to include it in the class definition as well:

self.include_root_in_json = false

Proper mapping for elasticsearch

Now comes the meat of our work -- defining proper mapping for our documents (models):

mapping do
  indexes :title, type: 'string', boost: 10, analyzer: 'snowball'
  indexes :created_at, type: 'date'

  indexes :chapters do
    indexes :text, analyzer: 'snowball'
  end
end

Notice we index title with boosting, created_at as "date", and chapter text from the associated model. All the data is effectively “de-normalized” into a single document in elasticsearch (if such a term makes any sense).

Proper document JSON serialization

As the last step, we have to properly serialize the document in the elasticsearch index. Notice how we can leverage the convenient to_json method from ActiveRecord:

def to_indexed_json
  to_json( include: { chapters: { only: [:text] } } )
end

With all this setup in place, we can search in properties in both the Book and the Chapter parts of our document.
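
To illustrate the shape of the resulting de-normalized document, here is a rough plain-Ruby approximation. The Struct-based Book and Chapter below are hypothetical stand-ins, not ActiveRecord models, and the hash built by hand only approximates what to_json with the include option would produce (a real record would also serialize its other columns, such as id).

```ruby
require 'json'

# Hypothetical plain-Ruby stand-ins for the Book and Chapter records.
Chapter = Struct.new(:id, :text)

Book = Struct.new(:title, :created_at, :chapters) do
  # Approximates to_json(include: { chapters: { only: [:text] } }):
  # the book's own properties plus only the text of each chapter.
  def to_indexed_json
    {
      title: title,
      created_at: created_at,
      chapters: chapters.map { |chapter| { text: chapter.text } }
    }.to_json
  end
end

book = Book.new('Lorem', '2012-01-01T00:00:00Z', [Chapter.new(1, 'Lorem ipsum....')])
puts book.to_indexed_json
```

The nested chapters array lines up with the nested indexes :chapters block in the mapping, which is what lets a query match chapter text as part of the book document.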

Please run the active_record_associations.rb Ruby file linked at the beginning to see the full picture.

For further information, please refer to these resources:

  • https://github.com/karmi/railscasts-episodes/commit/ee1f6f3
  • https://github.com/karmi/railscasts-episodes/commit/03c45c3
  • https://github.com/karmi/tire/blob/master/test/models/active_record_models.rb#L10-20

See this StackOverflow answer: ElasticSearch & Tire: Using Mapping and to_indexed_json for more information about mapping / to_indexed_json interplay.

See this StackOverflow answer: Index the results of a method in ElasticSearch (Tire + ActiveRecord) to see how to fight n+1 queries when indexing models with associations.

Elasticsearch + Tire + attachment-mapper + Paperclip = No Hits

I had a few dumb typos that were messing things up. I must have read an article that wrote up the to_indexed_json function in a different format, and I got confused. I fixed it right before I wrote up the question, so this is what I had before:

def to_indexed_json
  {
    :title => title,
    :description => description,
    :categories => categories.map { |c| { :name => c.name } },
    :subcategories => subcategories.map { |s| { :name => s.name } },
    :entry_type => entry_type_name,
    :methods => [:attachment]
  }.to_json
end

How to set index: not_analyzed globally for Elasticsearch

You need to use a dynamic_template when creating your index. With the dynamic strings mapping below, all new string fields that are created dynamically will be not_analyzed:

PUT my_index
{
  "mappings": {
    "user": {
      "_index": {
        "enabled": true
      },
      "_id": {
        "store": "yes"
      },
      "dynamic_templates": [
        {
          "strings": {
            "match_mapping_type": "string",
            "match": "*",
            "mapping": {
              "type": "string",
              "index": "not_analyzed"
            }
          }
        }
      ],
      "properties": {
        "id": {
          "type": "long"
        },
        "name": {
          "type": "string",
          "index": "not_analyzed"
        },
        "presentValue": {
          "type": "string",
          "index": "not_analyzed"
        },
        "dateOfBirth": {
          "type": "date"
        }
      }
    }
  }
}
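
If you create the index from Ruby, the same dynamic template can be built as a plain Hash and serialized to JSON. This is only a sketch: the not_analyzed_strings_body helper and its type_name parameter are hypothetical, and sending the body to elasticsearch (with Tire or any HTTP client) is left out.

```ruby
require 'json'

# Builds an index-creation body in which dynamically added string fields
# are mapped as not_analyzed. `type_name` is a hypothetical parameter
# naming the document type (e.g. 'user' in the example above).
def not_analyzed_strings_body(type_name)
  {
    mappings: {
      type_name => {
        dynamic_templates: [
          {
            strings: {
              match_mapping_type: 'string',
              match: '*',
              mapping: { type: 'string', index: 'not_analyzed' }
            }
          }
        ]
      }
    }
  }
end

# Print the JSON that would be sent as the body of the PUT request.
puts JSON.pretty_generate(not_analyzed_strings_body('user'))
```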

