PL/SQL Performance Tuning for LIKE '%...%' Wildcard Queries
As already mentioned you could add a ctx context index to the name columns.
assuming a small number of records get updated, 1 option is to refresh your index daily. (and record when it happened)
then add a lastupdate date column & index to your table being searched.
It should be possible to scan your ctx index for the majority of the old unchanged data
and select from the small percentage of updated data using the traditonal LIKE
e.g:
WHERE (lastupdated<lastrefresh AND contains(name,'%ABC%'))
OR (lastupdated>lastrefresh AND name like '%ABC%')
NOTE: you may find your query plan goes a little mental (lots of bitmap conversions to row ids) in that case split the 2 parts of the OR into a UNION ALL query.
e.g
SELECT id FROM mytable
WHERE
(lastupdate>lastrefresh and name LIKE '%ABC%')
UNION ALL
SELECT id FROM mytable
WHERE lastupdate<lastrefresh and CONTAINS(name, '%ABC%', 1) > 0
Any performance impact in Oracle for using LIKE 'string' vs = 'string'?
There is a clear difference when you use bind variables, which you should be using in Oracle for anything other than data warehousing or other bulk data operations.
Take the case of:
SELECT * FROM SOME_TABLE WHERE SOME_FIELD LIKE :b1
Oracle cannot know that the value of :b1 is '%some_value%', or 'some_value' etc. until execution time, so it will make an estimation of the cardinality of the result based on heuristics and come up with an appropriate plan that either may or may not be suitable for various values of :b, such as '%A','%', 'A' etc.
Similar issues can apply with an equality predicate but the range of cardinalities that might result is much more easily estimated based on column statistics or the presence of a unique constraint, for example.
So, personally I wouldn't start using LIKE as a replacement for =. The optimizer is pretty easy to fool sometimes.
PL/SQL Performance Tuning for LIKE '%...%' Wildcard Queries
As already mentioned you could add a ctx context index to the name columns.
assuming a small number of records get updated, 1 option is to refresh your index daily. (and record when it happened)
then add a lastupdate date column & index to your table being searched.
It should be possible to scan your ctx index for the majority of the old unchanged data
and select from the small percentage of updated data using the traditonal LIKE
e.g:
WHERE (lastupdated<lastrefresh AND contains(name,'%ABC%'))
OR (lastupdated>lastrefresh AND name like '%ABC%')
NOTE: you may find your query plan goes a little mental (lots of bitmap conversions to row ids) in that case split the 2 parts of the OR into a UNION ALL query.
e.g
SELECT id FROM mytable
WHERE
(lastupdate>lastrefresh and name LIKE '%ABC%')
UNION ALL
SELECT id FROM mytable
WHERE lastupdate<lastrefresh and CONTAINS(name, '%ABC%', 1) > 0
SQL 'like' vs '=' performance
See https://web.archive.org/web/20150209022016/http://myitforum.com/cs2/blogs/jnelson/archive/2007/11/16/108354.aspx
Quote from there:
the rules for index usage with LIKE
are loosely like this:
If your filter criteria uses equals =
and the field is indexed, then most
likely it will use an INDEX/CLUSTERED
INDEX SEEKIf your filter criteria uses LIKE,
with no wildcards (like if you had a
parameter in a web report that COULD
have a % but you instead use the full
string), it is about as likely as #1
to use the index. The increased cost
is almost nothing.If your filter criteria uses LIKE, but
with a wildcard at the beginning (as
in Name0 LIKE '%UTER') it's much less
likely to use the index, but it still
may at least perform an INDEX SCAN on
a full or partial range of the index.HOWEVER, if your filter criteria uses
LIKE, but starts with a STRING FIRST
and has wildcards somewhere AFTER that
(as in Name0 LIKE 'COMP%ER'), then SQL
may just use an INDEX SEEK to quickly
find rows that have the same first
starting characters, and then look
through those rows for an exact match.
(Also keep in mind, the SQL engine
still might not use an index the way
you're expecting, depending on what
else is going on in your query and
what tables you're joining to. The
SQL engine reserves the right to
rewrite your query a little to get the
data in a way that it thinks is most
efficient and that may include an
INDEX SCAN instead of an INDEX SEEK)
Is substr or LIKE faster in Oracle?
Assuming maximum performance is the goal, I would ideally choose SUBSTR(my_field,1,6)
and create a function-based index to support the query.
CREATE INDEX my_substr_idx
ON my_table( substr( my_field,1,6 ) );
As others point out, SUBSTR(my_field,1,6)
would not be able to use a regular index on MY_FIELD
. The LIKE version might use the index, but the optimizer's cardinality estimates in that case are generally rather poor so it is quite likely to either not use an index when it would be helpful or to use an index when a table scan would be preferable. Indexing the actual expression will give the optimizer far more information to work with so it is much more likely to pick the index correctly. Someone smarter than I am may be able to suggest a way to use statistics on virtual columns in 11g to give the optimizer better information for the LIKE query.
If 6 is a variable (i.e. you sometimes want to search the first 6 characters and sometimes want to search a different number), you probably won't be able to come up with a function-based index to support that query. In that case, you're probably better off with the vagaries of the optimizer's decisions with the LIKE formulation.
PostgreSQL LIKE query performance variations
FTS does not support LIKE
The previously accepted answer was incorrect. Full Text Search with its full text indexes is not for the LIKE
operator at all, it has its own operators and doesn't work for arbitrary strings. It operates on words based on dictionaries and stemming. It does support prefix matching for words, but not with the LIKE
operator:
- Get partial match from GIN indexed TSVECTOR column
Trigram index for LIKE
Install the additional module pg_trgm
which provides operator classes for GIN and GiST trigram indexes to support all LIKE
and ILIKE
patterns, not just left-anchored ones:
Example index:
CREATE INDEX tbl_col_gin_trgm_idx ON tbl USING gin (col gin_trgm_ops);
Or:
CREATE INDEX tbl_col_gist_trgm_idx ON tbl USING gist (col gist_trgm_ops);
- Difference between GiST and GIN index
Example query:
SELECT * FROM tbl WHERE col LIKE 'foo%';
SELECT * FROM tbl WHERE col LIKE '%foo%'; -- works with leading wildcard, too
SELECT * FROM tbl WHERE col ILIKE '%foo%'; -- works case insensitively as well
Trigrams? What about shorter strings?
Words with less than 3 letters in indexed values still work. The manual:
Each word is considered to have two spaces prefixed and one space
suffixed when determining the set of trigrams contained in the string.
And search patterns with less than 3 letters? The manual:
For both
LIKE
and regular-expression searches, keep in mind that a
pattern with no extractable trigrams will degenerate to a full-index scan.
Meaning, that index / bitmap index scans still work (query plans for prepared statement won't break), it just won't buy you better performance. Typically no big loss, since 1- or 2-letter strings are hardly selective (more than a few percent of the underlying table matches) and index support would not improve performance (much) to begin with, because a full table scan is faster.
Prefix matching
Search patterns with no leading wildcard: col LIKE 'foo%'
.
^@
operator / starts_with()
function
Quoting the release notes of Postgres 11:
Add prefix-match operator text ^@ text, which is supported by SP-GiST
(Ildus Kurbangaliev)This is similar to using var LIKE 'word%' with a btree index, but it
is more efficient.
Example query:
SELECT * FROM tbl WHERE col ^@ 'foo'; -- no added wildcard
But the potential of operator and function stays limited until planner support is improved in Postgres 15 and the ^@
operator is documented properly. The release notes:
Allow the
^@
starts-with operator and thestarts_with()
function to
use btree indexes if using the C collation (Tom Lane)Previously these could only use SP-GiST indexes.
COLLATE "C"
Since Postgres 9.1, an index with COLLATE "C"
provides the same functionality as the operator class text_pattern_ops
described below. See:
- Is there a difference between text_pattern_ops and COLLATE "C"?
text_pattern_ops
(original answer)
For just left-anchored patterns (no leading wildcard) you get the optimum with a suitable operator class for a btree index: text_pattern_ops
or varchar_pattern_ops
. Both built-in features of standard Postgres, no additional module needed. Similar performance, but much smaller index.
Example index:
CREATE INDEX tbl_col_text_pattern_ops_idx ON tbl(col text_pattern_ops);
Example query:
SELECT * FROM tbl WHERE col LIKE 'foo%'; -- no leading wildcard
Or, if you should be running your database with the 'C' locale (effectively no locale), then everything is sorted according to byte order anyway and a plain btree index with default operator class does the job.
Further reading
- Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
- How is LIKE implemented?
- Finding similar strings with PostgreSQL quickly
SQL Server: how to optimize like queries?
To do much for a LIKE
where the pattern has the form '%XXX%'
, you want to look up SQL Server's full-text indexing capability, and use CONTAINS
instead of LIKE
. As-is, you're doing a full table scan, because a normal index won't help with a search for an item that starts with a wild card -- but a full-text index will.
/* ... */
WHERE (LTRIM(RTRIM([t0].[DOCREVNO])) = '0')
AND (contains([t0].[FIRSTNAME], 'John'))
AND (contains([t0].[LASTNAME], 'Smith'))
AND (contains([t0].[SSN], '123'))
AND (contains([t0].[CLIENTNUMBER],'123'))
AND (contains([t0].[MDOCNUMBER], '123'))
AND ([t0].[CLIENTINDICATOR] = 'ON')
Related Topics
#1146 - Table 'Phpmyadmin.Pma_Tracking' Doesn't Exist
Counting Rows for All Tables at Once
Recursive Query Used for Transitive Closure
In SQL Is There a Difference Between Count(*) and Count(<Fieldname>)
Pros and Cons of Autoincrement Keys on "Every Table"
Are Determinants and Candidate Keys Same or Different Things
What Column Should the Clustered Index Be Put On
Remove Trailing Empty Space in a Field Content
Checking for Time Range Overlap, the Watchman Problem [Sql]
SQL 2005 Split Comma Separated Column on Delimiter
Execute Query on SQL Server Analysis Services with Ironpython
Can There Be Constraints with the Same Name in a Db
How to Edit Values of an Insert in a Trigger on SQL Server
How to Create an Alias of Database in SQL Server
What Is the Purpose of Putting an 'N' in Front of Function Parameters in Tsql
SQL Server Sub Query with a Comma Separated Resultset
Conditional Where Clause with Case Statement in Oracle
How to Get a List of All Current Temporary Tables in SQL Server