PL/SQL Performance Tuning for LIKE '%...%' Wildcard Queries

As already mentioned, you could add an Oracle Text CONTEXT index to the name columns.
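
For instance, a minimal sketch (assuming a table mytable with a name column; the names are illustrative):

CREATE INDEX name_ctx_idx ON mytable (name)
  INDEXTYPE IS CTXSYS.CONTEXT;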

Assuming only a small number of records get updated, one option is to refresh that index daily (and record when that happened).

Then add a lastupdate DATE column, and an index on it, to the table being searched.
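
A sketch of that setup, reusing the illustrative names above (CTX_DDL.SYNC_INDEX is the standard Oracle Text refresh call; refresh_log is a hypothetical bookkeeping table holding lastrefresh):

ALTER TABLE mytable ADD (lastupdate DATE DEFAULT SYSDATE NOT NULL);
CREATE INDEX mytable_lastupdate_idx ON mytable (lastupdate);

-- run daily, e.g. from a DBMS_SCHEDULER job
BEGIN
  CTX_DDL.SYNC_INDEX('name_ctx_idx');            -- refresh the CONTEXT index
  UPDATE refresh_log SET lastrefresh = SYSDATE;  -- record when it happened
END;
/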

It should then be possible to scan your CONTEXT index for the majority of the data (the old, unchanged rows) and select from the small percentage of updated rows using a traditional LIKE, e.g.:

WHERE (lastupdate < lastrefresh AND CONTAINS(name, '%ABC%') > 0)
OR (lastupdate > lastrefresh AND name LIKE '%ABC%')

NOTE: you may find the query plan goes a little mental (lots of bitmap conversions to rowids); in that case, split the two parts of the OR into a UNION ALL query, e.g.:

SELECT id FROM mytable
WHERE lastupdate > lastrefresh AND name LIKE '%ABC%'
UNION ALL
SELECT id FROM mytable
WHERE lastupdate < lastrefresh AND CONTAINS(name, '%ABC%', 1) > 0

Any performance impact in Oracle for using LIKE 'string' vs = 'string'?

There is a clear difference when you use bind variables, which you should be using in Oracle for anything other than data warehousing or other bulk data operations.

Take the case of:

SELECT * FROM SOME_TABLE WHERE SOME_FIELD LIKE :b1

Oracle cannot know that the value of :b1 is '%some_value%', or 'some_value', etc. until execution time, so it will estimate the cardinality of the result based on heuristics and come up with a plan that may or may not be suitable for various values of :b1, such as '%A', '%', 'A', etc.

Similar issues can apply with an equality predicate, but the range of cardinalities that might result is much more easily estimated based on column statistics or the presence of a unique constraint, for example.
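
A quick SQL*Plus sketch (table and column are placeholders) of how one shared statement can be handed wildly different selectivities:

VARIABLE b1 VARCHAR2(100)

EXEC :b1 := 'SMITH'   -- selective value: an index range scan would be ideal
SELECT * FROM some_table WHERE some_field LIKE :b1;

EXEC :b1 := '%'       -- matches every row: a full scan would be ideal
SELECT * FROM some_table WHERE some_field LIKE :b1;

Both executions can end up sharing whatever plan was built for the first value the optimizer peeked at.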

So, personally I wouldn't start using LIKE as a replacement for =. The optimizer is pretty easy to fool sometimes.


SQL 'like' vs '=' performance

See https://web.archive.org/web/20150209022016/http://myitforum.com/cs2/blogs/jnelson/archive/2007/11/16/108354.aspx

Quote from there:

the rules for index usage with LIKE are loosely like this:

  • If your filter criteria uses equals = and the field is indexed, then most likely it will use an INDEX/CLUSTERED INDEX SEEK.

  • If your filter criteria uses LIKE, with no wildcards (like if you had a parameter in a web report that COULD have a % but you instead use the full string), it is about as likely as #1 to use the index. The increased cost is almost nothing.

  • If your filter criteria uses LIKE, but with a wildcard at the beginning (as in Name0 LIKE '%UTER'), it's much less likely to use the index, but it still may at least perform an INDEX SCAN on a full or partial range of the index.

  • HOWEVER, if your filter criteria uses LIKE, but starts with a STRING FIRST and has wildcards somewhere AFTER that (as in Name0 LIKE 'COMP%ER'), then SQL may just use an INDEX SEEK to quickly find rows that have the same first starting characters, and then look through those rows for an exact match.


(Also keep in mind, the SQL engine still might not use an index the way you're expecting, depending on what else is going on in your query and what tables you're joining to. The SQL engine reserves the right to rewrite your query a little to get the data in a way that it thinks is most efficient, and that may include an INDEX SCAN instead of an INDEX SEEK.)
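
As a rough T-SQL illustration of those four cases (tbl is a placeholder table with an index on Name0):

SELECT * FROM tbl WHERE Name0 = 'COMPUTER';    -- case 1: index seek
SELECT * FROM tbl WHERE Name0 LIKE 'COMPUTER'; -- case 2: no wildcard, behaves much like case 1
SELECT * FROM tbl WHERE Name0 LIKE '%UTER';    -- case 3: leading wildcard, index scan at best
SELECT * FROM tbl WHERE Name0 LIKE 'COMP%ER';  -- case 4: seek on 'COMP', then filter the range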

Is substr or LIKE faster in Oracle?

Assuming maximum performance is the goal, I would ideally choose SUBSTR(my_field,1,6) and create a function-based index to support the query.

CREATE INDEX my_substr_idx
ON my_table( substr( my_field,1,6 ) );

As others point out, SUBSTR(my_field,1,6) would not be able to use a regular index on MY_FIELD. The LIKE version might use the index, but the optimizer's cardinality estimates in that case are generally rather poor, so it is quite likely either to skip an index that would be helpful or to use an index when a table scan would be preferable. Indexing the actual expression gives the optimizer far more information to work with, so it is much more likely to pick the index correctly. Someone smarter than I am may be able to suggest a way to use statistics on virtual columns in 11g to give the optimizer better information for the LIKE query.
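
With that index in place, a query of this shape (the literal is illustrative) can use it directly:

SELECT *
  FROM my_table
 WHERE substr( my_field, 1, 6 ) = 'ABCDEF';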

If 6 is a variable (i.e. you sometimes want to search the first 6 characters and sometimes want to search a different number), you probably won't be able to come up with a function-based index to support that query. In that case, you're probably better off with the vagaries of the optimizer's decisions with the LIKE formulation.
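
In that variable-length case, the LIKE formulation would look something like this, with :search_prefix as a hypothetical bind variable:

SELECT *
  FROM my_table
 WHERE my_field LIKE :search_prefix || '%';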

PostgreSQL LIKE query performance variations

FTS does not support LIKE

The previously accepted answer was incorrect. Full Text Search with its full-text indexes is not for the LIKE operator at all; it has its own operators and doesn't work for arbitrary strings. It operates on words based on dictionaries and stemming. It does support prefix matching for words, but not with the LIKE operator:

  • Get partial match from GIN indexed TSVECTOR column

Trigram index for LIKE

Install the additional module pg_trgm which provides operator classes for GIN and GiST trigram indexes to support all LIKE and ILIKE patterns, not just left-anchored ones:

Example index:

CREATE INDEX tbl_col_gin_trgm_idx  ON tbl USING gin  (col gin_trgm_ops);

Or:

CREATE INDEX tbl_col_gist_trgm_idx ON tbl USING gist (col gist_trgm_ops);

  • Difference between GiST and GIN index

Example query:

SELECT * FROM tbl WHERE col LIKE 'foo%';
SELECT * FROM tbl WHERE col LIKE '%foo%'; -- works with leading wildcard, too
SELECT * FROM tbl WHERE col ILIKE '%foo%'; -- works case insensitively as well

Trigrams? What about shorter strings?

Words with fewer than 3 letters in indexed values still work. The manual:

Each word is considered to have two spaces prefixed and one space
suffixed when determining the set of trigrams contained in the string.

And search patterns with fewer than 3 letters? The manual:

For both LIKE and regular-expression searches, keep in mind that a
pattern with no extractable trigrams will degenerate to a full-index scan.

Meaning that index / bitmap index scans still work (query plans for prepared statements won't break); it just won't buy you better performance. Typically no big loss, since 1- or 2-letter strings are hardly selective (more than a few percent of the underlying table matches) and index support would not improve performance (much) to begin with, because a full table scan is faster.

Prefix matching

Search patterns with no leading wildcard: col LIKE 'foo%'.

^@ operator / starts_with() function

Quoting the release notes of Postgres 11:

Add prefix-match operator text ^@ text, which is supported by SP-GiST
(Ildus Kurbangaliev)

This is similar to using var LIKE 'word%' with a btree index, but it
is more efficient.

Example query:

SELECT * FROM tbl WHERE col ^@ 'foo';  -- no added wildcard

But the potential of the operator and function stayed limited until planner support was improved in Postgres 15 and the ^@ operator was documented properly. The release notes:

Allow the ^@ starts-with operator and the starts_with() function to
use btree indexes if using the C collation (Tom Lane)

Previously these could only use SP-GiST indexes.

COLLATE "C"

Since Postgres 9.1, an index with COLLATE "C" provides the same functionality as the operator class text_pattern_ops described below. See:

  • Is there a difference between text_pattern_ops and COLLATE "C"?
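
A sketch of such an index, using tbl / col as in the other examples:

CREATE INDEX tbl_col_collate_c_idx ON tbl (col COLLATE "C");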

text_pattern_ops (original answer)

For just left-anchored patterns (no leading wildcard) you get the optimum with a suitable operator class for a btree index: text_pattern_ops or varchar_pattern_ops. Both built-in features of standard Postgres, no additional module needed. Similar performance, but much smaller index.

Example index:

CREATE INDEX tbl_col_text_pattern_ops_idx ON tbl(col text_pattern_ops);

Example query:

SELECT * FROM tbl WHERE col LIKE 'foo%';  -- no leading wildcard

Or, if you should be running your database with the 'C' locale (effectively no locale), then everything is sorted according to byte order anyway and a plain btree index with default operator class does the job.
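
In that case a plain index is all it takes, e.g.:

CREATE INDEX tbl_col_idx ON tbl (col);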



Further reading

  • Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
  • How is LIKE implemented?
  • Finding similar strings with PostgreSQL quickly

SQL Server: how to optimize like queries?

To do much for a LIKE where the pattern has the form '%XXX%', you want to look at SQL Server's full-text indexing capability and use CONTAINS instead of LIKE. As-is, you're doing a full table scan, because a normal index won't help with a search for an item that starts with a wildcard -- but a full-text index will.
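
Setting that up would look roughly like this (all names here are hypothetical; the KEY INDEX clause must reference the table's unique key index):

CREATE FULLTEXT CATALOG ft_catalog;

CREATE FULLTEXT INDEX ON dbo.DOCTABLE
  (FIRSTNAME, LASTNAME, SSN, CLIENTNUMBER, MDOCNUMBER)
  KEY INDEX PK_DOCTABLE
  ON ft_catalog;

The rewritten query then looks like this: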

/* ... */
WHERE LTRIM(RTRIM([t0].[DOCREVNO])) = '0'
  AND CONTAINS([t0].[FIRSTNAME], 'John')
  AND CONTAINS([t0].[LASTNAME], 'Smith')
  AND CONTAINS([t0].[SSN], '123')
  AND CONTAINS([t0].[CLIENTNUMBER], '123')
  AND CONTAINS([t0].[MDOCNUMBER], '123')
  AND [t0].[CLIENTINDICATOR] = 'ON'

