What is Full Text Search vs LIKE

In general, there is a tradeoff between "precision" and "recall". High precision means that fewer irrelevant results are presented (fewer false positives), while high recall means that fewer relevant results are missing (fewer false negatives). Using the LIKE operator gives you 100% precision with no concessions for recall. A full text search facility gives you a lot of flexibility to tune down the precision for better recall.
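To make the tradeoff concrete, here is a small sketch that computes both measures for two result sets. The document ids and the two result sets are made up for illustration; they are not measurements from any real engine.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for returned ids against the truly relevant ids."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 1.0
    recall = len(hits) / len(relevant) if relevant else 1.0
    return precision, recall

# Hypothetical data: documents 1-4 are what the user actually wants.
relevant = {1, 2, 3, 4}
like_result = {1, 2}        # exact substring matches only
fts_result = {1, 2, 3, 5}   # stemming finds doc 3 but also pulls in doc 5

print(precision_recall(like_result, relevant))  # (1.0, 0.5)
print(precision_recall(fts_result, relevant))   # (0.75, 0.75)
```

In this toy case the LIKE-style result never returns junk but misses half of the relevant documents, while the looser match trades some precision for better recall.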

Most full text search implementations use an "inverted index". This is an index where the keys are individual terms, and the associated values are sets of records that contain the term. Full text search is optimized to compute the intersection, union, etc. of these record sets, and usually provides a ranking algorithm to quantify how strongly a given record matches search keywords.
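A minimal sketch of the idea in Python, assuming whitespace tokenization and an in-memory dict (real engines store the postings on disk and compress them). The function names and sample documents are illustrative only.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search_all(index, terms):
    """AND query: ids of documents containing every term (set intersection)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def search_any(index, terms):
    """OR query: ids of documents containing at least one term (set union)."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))

docs = {1: "the matrix", 2: "matrix reloaded", 3: "the animatrix"}
index = build_inverted_index(docs)

print(search_all(index, ["the", "matrix"]))         # {1}
print(search_any(index, ["reloaded", "animatrix"]))  # {2, 3}
```

Each query only touches the record sets for the terms it mentions, which is why the lookup cost does not depend on scanning every row.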

The SQL LIKE operator can be extremely inefficient. If you apply it to an un-indexed column, a full scan will be used to find matches (just like any query on an un-indexed field). If the column is indexed, matching can be performed against index keys, but with far less efficiency than most index lookups. In the worst case, the LIKE pattern will have leading wildcards that require every index key to be examined. In contrast, many information retrieval systems can enable support for leading wildcards by pre-compiling suffix trees in selected fields.

Other features typical of full-text search are:

  • lexical analysis or tokenization: breaking a block of unstructured text into individual words, phrases, and special tokens
  • morphological analysis, or stemming: collapsing variations of a given word into one index term; for example, treating "mice" and "mouse", or "electrification" and "electric" as the same word
  • ranking: measuring the similarity of a matching record to the query string
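The first two features can be sketched in a few lines of Python. This is a deliberately naive toy: the suffix rules and the IRREGULAR lookup table are assumptions made for illustration, whereas real engines use proper stemmers (Snowball, for example) or language dictionaries.

```python
import re

# Hypothetical lookup table for forms that suffix stripping cannot handle.
IRREGULAR = {"mice": "mouse", "ran": "run"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(word):
    """Collapse a few simple English variations into one index term."""
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        if len(word) > 2 and word[-1] == word[-2]:
            word = word[:-1]  # "runn" -> "run"
        return word
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

tokens = tokenize("Mice were running; cars, too!")
print(tokens)                     # ['mice', 'were', 'running', 'cars', 'too']
print([stem(t) for t in tokens])  # ['mouse', 'were', 'run', 'car', 'too']
```

The same normalization is applied to documents at index time and to the query at search time, which is what lets "cars" find "car".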

SQL full text search vs LIKE

Full text search is likely to be quicker, since it benefits from an index of words that it uses to look up the records, whereas LIKE with a leading wildcard needs a full table scan.

In some cases LIKE will be more accurate, since LIKE "%The%" AND LIKE "%Matrix" will pick out "The Matrix" but not "Matrix Reloaded", whereas full text search will ignore "The" and return both. That said, both would likely have been a better result.
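The "The Matrix" case can be emulated in Python to see where the two approaches diverge. The stop-word list here is a made-up assumption (each engine ships its own), and the LIKE emulation assumes a case-sensitive collation.

```python
STOP_WORDS = {"the", "a", "an", "of"}  # hypothetical stop-word list

titles = ["The Matrix", "Matrix Reloaded"]

def like_match(title):
    """Emulates: title LIKE '%The%' AND title LIKE '%Matrix'."""
    return "The" in title and title.endswith("Matrix")

def terms(text):
    """Lowercased words with stop words removed, as a full text index might keep them."""
    return {w for w in text.lower().split() if w not in STOP_WORDS}

def fts_match(title, query):
    """True if every non-stop-word query term appears in the title."""
    q = terms(query)
    return bool(q) and q <= terms(title)

print([t for t in titles if like_match(t)])               # ['The Matrix']
print([t for t in titles if fts_match(t, "The Matrix")])  # ['The Matrix', 'Matrix Reloaded']
```

Once "the" is dropped, the query is effectively just "matrix", so both titles match.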

Fulltext search vs standard database search

There are a few advantages to full text searching.

Indexing:

Something like:

WHERE Foo LIKE '%Bar';

Cannot take advantage of an index. It has to look at every single row, and see if it matches. A fulltext index, however, can. In fact, fulltext indexes can offer a lot more flexibility in terms of the order of matching words, how close those words are together, etc.

Stemming:

A fulltext search can stem words. If you search for run, you can get results for "ran" or "running". Most fulltext engines have stem dictionaries in a variety of languages.

Weighted Results:

A fulltext index can encompass multiple columns. For example, you can search for "peach pie", and the index can include a title, keywords, and a body. Results that match the title can be weighted higher, as more relevant, and can be sorted to show near the top.
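A rough sketch of weighted ranking for the "peach pie" example. The field weights and the sample records are invented for illustration; real engines expose their own weighting knobs and use more refined scoring than raw term counts.

```python
# Assumed weights: a hit in the title counts three times as much as one in the body.
FIELD_WEIGHTS = {"title": 3.0, "keywords": 2.0, "body": 1.0}

def score(record, query):
    """Sum of weighted term occurrences across all indexed fields."""
    q_terms = query.lower().split()
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = record.get(field, "").lower().split()
        total += weight * sum(words.count(t) for t in q_terms)
    return total

records = [
    {"id": 1, "title": "Apple tart", "keywords": "dessert",
     "body": "Serve with peach pie or ice cream"},
    {"id": 2, "title": "Peach pie", "keywords": "peach dessert",
     "body": "A classic summer pie"},
]

ranked = sorted(records, key=lambda r: score(r, "peach pie"), reverse=True)
print([(r["id"], score(r, "peach pie")) for r in ranked])  # [(2, 9.0), (1, 2.0)]
```

Both records mention "peach pie" somewhere, but the one with the match in its title sorts first.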

Disadvantages:

A fulltext index can potentially be huge, many times larger than a standard B-tree index. For this reason, many hosted providers who offer database instances disable this feature, or at least charge extra for it. For example, last I checked, Windows Azure did not support fulltext queries.

Fulltext indexes can also be slower to update. If the data changes a lot, there might be some lag updating indexes compared to standard indexes.

Performance of like '%Query%' vs full text search CONTAINS query

Full Text Searching (using CONTAINS) will be faster and more efficient than using LIKE with wildcards. Full Text Searching (FTS) includes the ability to define Full Text Indexes, which FTS can use. I don't know why you wouldn't define an FTS index if you intended to use the functionality.

LIKE with a wildcard on the left side (i.e. LIKE '%Search') cannot use an index (assuming one exists for the column), guaranteeing a table scan. I haven't tested and compared, but regex has the same pitfall. To clarify: LIKE '%Search' and LIKE '%Search%' cannot use an index; LIKE 'Search%' can use an index.
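The reason comes down to sort order. A sorted list plus binary search is a reasonable stand-in for a B-tree index, so here is a sketch in Python; the helper names and sample keys are made up for illustration.

```python
import bisect

def prefix_lookup(sorted_keys, prefix):
    """Emulates LIKE 'prefix%': binary search to the first candidate, then a short range scan."""
    i = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        matches.append(sorted_keys[i])
        i += 1
    return matches

def suffix_lookup(sorted_keys, suffix):
    """Emulates LIKE '%suffix': sort order is no help, so every key must be examined."""
    return [k for k in sorted_keys if k.endswith(suffix)]

keys = sorted(["research", "search", "searchengine", "seat", "zebra"])

print(prefix_lookup(keys, "search"))  # ['search', 'searchengine']
print(suffix_lookup(keys, "search"))  # ['research', 'search']
```

Keys sharing a prefix sit next to each other in the index, while keys sharing a suffix are scattered all over it.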

MySQL Fulltext vs Like

If you're simply going to use a series of LIKEs, then I'd have thought it would make sense to make use of a FULLTEXT index, the main reason being that it would let you use more complex boolean queries in the future. (As @Quassnoi states, you can simply create an index if you don't have a use for a specific field.)

However, it should be noted that fulltext has its limitations - words that are common across all rows have a low "score" and hence won't match as prominently as if you'd carried out a series of LIKEs. (On the flipside, you can of course get a "score" back from a FULLTEXT query, which may be of use depending on how you want to rank the results.)
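The low score for common words follows from inverse document frequency. The sketch below uses the textbook IDF formula, log(N / df), as an assumption; it is not MySQL's exact scoring formula, and the sample rows are invented.

```python
import math

def idf(term, docs):
    """Textbook inverse document frequency: log(total docs / docs containing the term)."""
    n_containing = sum(1 for d in docs if term in d.lower().split())
    if n_containing == 0:
        return 0.0
    return math.log(len(docs) / n_containing)

docs = [
    "mysql fulltext index",
    "mysql like query",
    "mysql innodb engine",
    "mysql query cache",
]

for term in ("mysql", "query", "fulltext"):
    print(term, round(idf(term, docs), 3))
```

A word that appears in every row ("mysql" here) gets a weight of zero, so it contributes nothing to the ranking, whereas a LIKE '%mysql%' would happily match every row.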

Postgresql ILIKE versus TSEARCH

A full text search setup is not identical to a "contains"-style LIKE query. It stems words, among other things, so you can match "cars" against "car".

If you really want a fast ILIKE then no standard database index or FTS will help. Fortunately, the pg_trgm module can do that.

  • http://www.postgresql.org/docs/9.1/static/pgtrgm.html
  • http://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/
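For intuition about what a trigram index stores, here is a rough Python approximation. It assumes pg_trgm's documented convention of lowercasing each word and padding it with two spaces in front and one behind; treat it as a sketch of the idea, not a reimplementation of the module.

```python
import re

def trigrams(text):
    """Set of 3-character substrings of each padded, lowercased word."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams

def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams in either string."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

print(sorted(trigrams("cat")))
print(similarity("search", "searching"))
print(similarity("abc", "xyz"))  # 0.0
```

Because any substring of three or more characters implies a set of trigrams, an index over trigrams can narrow down candidate rows for an ILIKE '%foo%' pattern before the actual pattern is rechecked.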

MySQL Full-Text search vs Like %%

Full-text search is a kind of search based on a special sort of index (full-text, obviously), so you get the power of O(log N) lookups when searching with it.

LIKE %%, by contrast, always causes a full table scan, which can be terribly slow (when you have 100k or more rows).

Personally I use LIKE %% when it is a small table (0-1000 rows) and I'm sure that it will never grow (and, importantly, when LIKE %% fits the task requirements).

Note: before MySQL 5.6, fulltext indexes were available only for the MyISAM storage engine; InnoDB supports them from 5.6 onward. If you use InnoDB on an older version, you need to look at third-party indexing software such as Sphinx.


