Sunspot 'Like' Query

The standard DisMax handler does not support wildcards. You have two options:

a. Activate EdgeNGramFilter:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer type="index">
    ..
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
    ..
  </analyzer>
</fieldType>

b. Use a nightly build of Solr with the EDismax handler.

See the wiki article in the Sunspot docs or the similar question on SO.
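
With the EdgeNGram filter in place, an ordinary Sunspot keyword search matches prefixes without any wildcard syntax. A minimal sketch, assuming a hypothetical Post model whose :title text field uses the "text" fieldType above:

class Post < ActiveRecord::Base
  searchable do
    text :title   # indexed through the EdgeNGram analysis chain above
  end
end

# "hor" should now match titles containing words that start with "hor"
# (e.g. "Hormel"), because the prefix grams were indexed.
Sunspot.search(Post) do
  fulltext 'hor'
end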

Sunspot Solr Search like Rails active record 'LIKE' search

Update 2

Fixed! It seems Solr returns only 30 entries by default, so it showed only the most relevant matches and skipped the others I also wanted.


I added this file:

myapp/config/initializers/sunspot_solr.rb

Sunspot.config.pagination.default_per_page = 3000
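
If you would rather not raise the global default, pagination can also be set per search with Sunspot's paginate option; a minimal sketch, assuming a hypothetical Post model:

Post.search do
  fulltext 'some query'
  paginate :page => 1, :per_page => 3000
end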

How do I dynamically build a search block in sunspot?

I have solved this myself. The solution I used was to compile the required scopes as strings, concatenate them, and then eval them inside the search block.

This required a separate query builder library that interrogates the Solr indexes to ensure that a scope is not created for a non-existent index field.

The code is very specific to my project, and too long to post in full, but this is what I do:

1. Split the search terms

This gives me an array of the terms, or terms plus fields:

['field:term', 'non field terms']
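
The split itself can be done in many ways; here is a rough sketch (the helper name and regex are mine, not from the original code) that pulls out explicit field:term pairs and keeps the remaining free text together:

def split_search_terms(raw_query)
  # extract explicit field:term pairs, keep everything else as one free-text chunk
  fielded   = raw_query.scan(/\w+:\S+/)
  remainder = raw_query.gsub(/\w+:\S+/, '').squeeze(' ').strip
  fielded + (remainder.empty? ? [] : [remainder])
end

split_search_terms('title:foo some other words')
# => ["title:foo", "some other words"]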

2. This is passed to the query builder.

The builder converts the array to scopes, based on what indexes are available. This method is an example that takes the model class, field, and value, and returns the scope if the field is indexed.

def convert_text_query_to_search_scope(model_clazz, field, value)
  if field_is_indexed?(model_clazz, field)
    # escape single quotes so the generated scope string remains valid Ruby
    escaped_value = value.gsub(/'/, "\\\\'")
    "keywords('#{escaped_value}', :fields => [:#{field}])"
  else
    ""
  end
end

3. Join all the scopes

The generated scopes are joined with join("\n") and the result is evaled inside the search block.

This approach allows the user to select the models they want to search and, optionally, to do field-specific searching. The system will then only search the models with any specified fields (or common fields), ignoring the rest.
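
Putting it together, the joined string is evaluated inside the search block, where DSL methods such as keywords are in scope. A hedged sketch with hypothetical models and fields:

scopes = [
  convert_text_query_to_search_scope(Post, 'title', 'foo'),
  convert_text_query_to_search_scope(Post, 'body', 'bar')
].reject(&:empty?).join("\n")

Sunspot.search(Post) do
  # each line of the generated string becomes a keywords(...) call
  # against this block's search DSL
  eval(scopes)
end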

The method to check if the field is indexed is:

# based on http://blog.locomotivellc.com/post/6321969631/sunspot-introspection
def field_is_indexed?(model_clazz, field)
  # all_field_factories returns an array of all indexed fields - text and other types - plus :class
  Sunspot::Setup.for(model_clazz).all_field_factories.map(&:name).include?(field.to_sym)
end
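
For example (assuming a hypothetical Post model that indexes a :title text field):

field_is_indexed?(Post, 'title')        # => true
field_is_indexed?(Post, 'nonexistent')  # => false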

And if anyone needs it, a check for sortability:

def field_is_sortable?(classes_to_check, field)
  if field.present?
    classes_to_check.each do |table_clazz|
      return false unless Sunspot::Setup.for(table_clazz).field_factories.map(&:name).include?(field.to_sym)
    end
    return true
  end
  false
end

Sunspot: how to do a fulltext query on multiple fields with different values?

In Sunspot 2.0.0, there is undocumented and unsupported behaviour that does work. The author himself suggests it shouldn't and it probably won't in future versions.

You can pass multiple fulltext calls into the search definition:

Post.search do
  fulltext "foo", {:fields => :exact_term}
  fulltext "bar", {:fields => :alternate}
end

This results in the following Solr query (from the logs):

INFO: [] webapp=/solr path=/select 
params={fl=*+score&start=0&q=_query_:"{!dismax+qf%3D'exact_term'}foo"+_query_:"{!dismax+qf%3D'alternate'}bar"&wt=ruby&fq=type:Post&rows=30}
hits=1 status=0 QTime=7

Matching substrings is covered in
https://github.com/sunspot/sunspot/wiki/Matching-substrings-in-fulltext-search

Changing the default operator (AND/OR) can be done by adding the option minimum_match 1, as mentioned in http://blog.tonycode.com/archives/192.
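
In Sunspot that option goes inside the fulltext block; a minimal sketch:

Post.search do
  fulltext "foo bar" do
    minimum_match 1   # require at least one term to match, i.e. OR-like behaviour
  end
end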

Solr (sunspot) Query with Hyphen and Stop Words

The hyphen character (-) is a Solr operator used to exclude results matching the word that follows it. I don't think adding a hyphen to the stop words list would affect that. I would suggest stripping the hyphens out before running the query through Solr. My guess is that the query with the hyphen is excluding documents that match "bar". Perhaps you could try faceting the results to see if that is in fact the case.
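
Stripping the hyphens can be as simple as this sketch (assuming the raw query arrives as a plain string and a hypothetical Post model):

raw_query = 'foo-bar'
cleaned   = raw_query.tr('-', ' ')   # => "foo bar"

Post.search do
  fulltext cleaned
end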

Sunspot -- Boost records where matches occur early in the text

I had a similar problem to solve, so I stored my data in several fields:

title
keywords (up to 10 words)
abstract (a paragraph)
text (as long as you like)

For querying, I used the dismax query parser over the fields with different weights:

title^20
keywords^20
abstract^12
text^1

So if you

  1. define your data schema well
  2. use dismax
  3. determine per-field weights for your queries

then when you search "Hormel Corned Beef 16 Ounces", a result whose title is "Hormel Corp" will score better than a document whose body contains "...For the dish, we recommend a can of Hormel Corned Beef 16 Ounces..."
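
In Sunspot, those per-field weights translate into boosts on the fulltext fields; a sketch assuming a hypothetical model that indexes title, keywords, abstract, and text:

Product.search do
  fulltext 'Hormel Corned Beef 16 Ounces' do
    # higher boosts make matches in these fields count more (dismax qf boosts)
    fields(:title => 20.0, :keywords => 20.0, :abstract => 12.0, :text => 1.0)
  end
end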


Edit, based on the OP's comments.

The OP's requirement is: given a title, the first n words matter more than the rest.

I suggest a data model with two fields: title_first_words and title. The client application (sorry, you cannot directly use DIH) will have to extract the first n words from title to store in title_first_words, while the full title is stored in title.

For searching, you can give the entire query to the dismax parser. The query parser is then biased towards title_first_words, e.g. title_first_words^4 title^1. Thus the first n words will have a bigger impact on a given search.
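
A sketch of how that could look on the Sunspot side, with the first-words extraction done in the searchable block (the model and field names are assumptions):

class Product < ActiveRecord::Base
  N_FIRST_WORDS = 3

  searchable do
    text :title
    # virtual field holding only the first few words of the title
    text :title_first_words do
      title.to_s.split.first(N_FIRST_WORDS).join(' ')
    end
  end
end

Product.search do
  fulltext 'corned beef' do
    fields(:title_first_words => 4.0, :title => 1.0)
  end
end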

sunspot solr attribute substring/wildcard searches

You can get partial matches working with wildcards for string as well as text fields.

If the query parser you are using supports leading wildcard queries, you can easily search for *456*, and this should match 1234567890.

However, EdgeNGram only works for solr.TextField, as solr.StrField does not allow analyzers to be added to it.

So you can only define fields with class solr.TextField and have EdgeNGram in the analysis chain, which breaks the indexed terms down into n-grams for partial matching.
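
On the Sunspot side that means declaring the attribute as a text field (which maps to a solr.TextField type) rather than a string field; a minimal sketch with an assumed serial_number attribute:

class Device < ActiveRecord::Base
  searchable do
    # text fields go through the Solr analysis chain (EdgeNGram included);
    # string fields map to solr.StrField and are indexed verbatim
    text :serial_number
  end
end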


