Ruby Alternative for Lucene

Ruby alternative for Lucene

Well, there's Ferret, which is a port of Lucene to Ruby. Also, Lucene is very easy to use from JRuby, if that's an option for you.

Depending on your needs, you might also want to take a look at Solr, which is a higher-level front-end built on Lucene. There is a Ruby interface, solr-ruby, that interacts with Solr via HTTP.

Full Text Searching with Rails

  • thinking_sphinx and sphinx work beautifully, no indexing, query, install problems ever (5 or 6 install, including production slicehost )

  • why doesn't everybody use sphinx, like, say craigslist? read here about its limitations (year and a half old articles. The sphinx developer, Aksyonoff, is working on these and he's putting in features and reliability and stamping out bugs at an amazing pace)

http://codemonkey.ravelry.com/2008/01/09/sphinx-for-search/

http://www.ibm.com/developerworks/opensource/library/os-php-apachesolr/

Comparison of full text search engine - Lucene, Sphinx, Postgresql, MySQL?

  • ferret: easy install, doesn't stem properly, very slow indexing (one mysql db: sphinx: 3 seconds, ferret: 50 minutes). Well documented problems (index corruption) in drb servers in production under load. Having said that, i have use it in develometn since acts-as_ferret came out 3 years ago, and it has served me well. Not adhering to porter stemming is an advantage in some contexts.

  • Lucene and Solr is the gorilla/mack truck / heavyweight champ of open source search. The teams have been doing an impressive number of new features in solr 14 release:

  • acts-as-solr: works well, once the tomcat or jetty is in place, but those sometimes are a pain. The A-A-S fork by mattmatt is the main fork, but the project is relatively unmaintained.

  • re the tomcat install: SOLR/lucene has unquestionably the best knowledge base/ support search engine of any software package i've seen ( i guess i'm not that surprised), the search box here:

http://www.lucidimagination.com/

  • Sunspot the new ruby wrapper, build on solr-ruby. Looks promising, but I couldn't get it to install on OSX. Indexes all ruby objects, not just databases through AR

  • one thing that's really instructive is to install 2 search plugins, e.g. sphinx and SOLR, sphinx and ferret, and see what different results they return. It's as easy as @sphinx_results - @ferret_results


just saw this post and responses

http://zooie.wordpress.com/2009/07/06/a-comparison-of-open-source-search-engines-and-indexing-twitter/

http://www.jroller.com/otis/entry/open_source_search_engine_benchmark

http://www.flax.co.uk/blog/2009/07/07/xapian-compared/

What's the best option for searching in Ruby on Rails?

Thinking Sphinx has more concise syntax to define which fields and which models are indexed.

Both UltraSphinx and Thinking Sphinx (recently) have ultra-cool feature which takes into account geographical proximity of objects.

UltraSphinx has annoying problems with how it loads models (it does not load entire Rails stack, so you could get strange and hard to diagnose errors, which are handled by adding explicit require statements).

We use Thinking Sphinx on new projects, and UltraSphinx on projects which use geo content.

Comparison of full text search engine - Lucene, Sphinx, Postgresql, MySQL?

Good to see someone's chimed in about Lucene - because I've no idea about that.

Sphinx, on the other hand, I know quite well, so let's see if I can be of some help.

  • Result relevance ranking is the default. You can set up your own sorting should you wish, and give specific fields higher weightings.
  • Indexing speed is super-fast, because it talks directly to the database. Any slowness will come from complex SQL queries and un-indexed foreign keys and other such problems. I've never noticed any slowness in searching either.
  • I'm a Rails guy, so I've no idea how easy it is to implement with Django. There is a Python API that comes with the Sphinx source though.
  • The search service daemon (searchd) is pretty low on memory usage - and you can set limits on how much memory the indexer process uses too.
  • Scalability is where my knowledge is more sketchy - but it's easy enough to copy index files to multiple machines and run several searchd daemons. The general impression I get from others though is that it's pretty damn good under high load, so scaling it out across multiple machines isn't something that needs to be dealt with.
  • There's no support for 'did-you-mean', etc - although these can be done with other tools easily enough. Sphinx does stem words though using dictionaries, so 'driving' and 'drive' (for example) would be considered the same in searches.
  • Sphinx doesn't allow partial index updates for field data though. The common approach to this is to maintain a delta index with all the recent changes, and re-index this after every change (and those new results appear within a second or two). Because of the small amount of data, this can take a matter of seconds. You will still need to re-index the main dataset regularly though (although how regularly depends on the volatility of your data - every day? every hour?). The fast indexing speeds keep this all pretty painless though.

I've no idea how applicable to your situation this is, but Evan Weaver compared a few of the common Rails search options (Sphinx, Ferret (a port of Lucene for Ruby) and Solr), running some benchmarks. Could be useful, I guess.

I've not plumbed the depths of MySQL's full-text search, but I know it doesn't compete speed-wise nor feature-wise with Sphinx, Lucene or Solr.

How do I do full-text searching in Ruby on Rails?

There are several options available and each have different strengths and weaknesses. If you would like to add full-text searching, it would be prudent to investigate each a little bit and try them out to see how well it works for you in your environment.

MySQL has built-in support for full-text searching. It has online support meaning that when new records are added to the database, they are automatically indexed and will be available in the search results. The documentation has more details.

acts_as_tsearch offers a wrapper for similar built-in functionality for recent versions of PostgreSQL

For other databases you will have to use other software.

Lucene is a popular search provider written in Java. You can use Lucene through its search server Solr with Rails using acts_as_solr.

If you don't want to use Java, there is a port of Lucene to Ruby called Ferret. Support for Rails is added using the acts_as_ferret plugin.

Xapian is another good option and is supported in Rails using the acts_as_xapian plugin.

Finally, my preferred choice is Sphinx using the Ultrasphinx plugin. It is extremely fast and has many options on how to index and search your databases, but is no longer being actively maintained.

Another plugin for Sphinx is Thinking Sphinx which has a lot of positive feedback. It is a little easier to get started using Thinking Sphinx than Ultrasphinx. I would suggest investigating both plugins to determine which fits better with your project.

Sphinx alternative without Rails

While certain plugins (like thinking_sphinx) are Rails-specific, Sphinx itself is just a search server, and you can use the client gem directly to index and search whatever you want; it need not be in ActiveRecord (which, incidentally, you could also use outside of Rails if you wished).

Another alternative is Apache Solr, which provides similar functionality to Sphinx, including a sophisticated search system (supporting stemming and lots of other nice things).



Related Topics



Leave a reply



Submit