Search Criteria Difference Between Like VS Contains() in Oracle

How can I use `contains` in Oracle for fast full-text searching, but have it behave like `like` '%texthere%'?

You might want to reference this question here:

search criteria difference between Like vs Contains() in oracle

Your syntax for using CONTAINS() works properly in the first example.

Your second question, what is the difference between:

CONTAINS(fistname,'hh',1) > 0 and CONTAINS(fistname,'%hh%',1) > 0

The difference is that the first, CONTAINS(fistname,'hh',1) > 0, searches for an independent "hh" word.

The second, CONTAINS(fistname,'%hh%',1) > 0, searches for any word that contains the string "hh", regardless of what comes before or after it within that word.
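A rough way to picture the two queries, as a Python sketch rather than anything Oracle Text actually runs: assume the column text has been split into word tokens, then compare an exact token match with a wildcard match inside tokens. The function names and sample values are made up for illustration.

```python
import re

def tokens(text):
    # Split into lowercase word tokens, roughly what a text index stores.
    return re.findall(r"\w+", text.lower())

def matches_word(text, term):
    # Models CONTAINS(col, 'hh'): some token equals the term exactly.
    return term.lower() in tokens(text)

def matches_wildcard(text, term):
    # Models CONTAINS(col, '%hh%'): some token contains the term.
    return any(term.lower() in tok for tok in tokens(text))

names = ["hh", "ahhb", "john hh smith", "shh"]
print([n for n in names if matches_word(n, "hh")])      # ['hh', 'john hh smith']
print([n for n in names if matches_wildcard(n, "hh")])  # all four names
```

In this model, "ahhb" and "shh" are only found by the wildcard form, which is the behavior closest to LIKE '%hh%'.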

What is the significance of the CONTAINS function in Oracle?

CONTAINS is the Oracle Text query operator. The documentation (https://docs.oracle.com/database/121/CCAPP/GUID-13F9B749-125B-40FD-9AFD-A636597447D0.htm#CCAPP9136) says:

When you create an index of type CONTEXT, you must use the CONTAINS operator to enter your query. An index of type CONTEXT is suited for indexing collections of large coherent documents.

With the CONTAINS operator, you can use a number of operators to define your search criteria. These operators enable you to enter logical, proximity, fuzzy, stemming, thesaurus and wildcard searches. With a correctly configured index, you can also enter section searches on documents that have internal structure such as HTML and XML.

With CONTAINS, you can also use the ABOUT operator to search on document themes.

Furthermore:

The CONTAINS operator must always be followed by the > 0 syntax, which specifies that the score value returned by the CONTAINS operator must be greater than zero for the row to be returned.

When the SCORE operator is called in the SELECT statement, the CONTAINS operator must reference the score label value in the third parameter.

I think you're confused by the numbers written in both SCORE and CONTAINS: they should match. Think of them as labels.

A more "complex" example might be this:

SQL> SELECT SCORE (1) sc1,
            SCORE (2) sc2
       FROM accumtbl
      WHERE CONTAINS (text, 'dog accum Cat', 1) > 0   --> this "1" comes from "score (1)"
         OR CONTAINS (text, 'little dog'   , 2) > 0   --> this "2" comes from "score (2)"
     ;

       SC1        SC2
---------- ----------
         6          4
        52          0

SQL>

You said:

I have tried with different values ... (but always got the same result)

Of course you did; all of these would return the same result:

select score(1)    ... contains (text, 'something', 1)
select score(100)  ... contains (text, 'something', 100)
select score(57)   ... contains (text, 'something', 57)
select score(-261) ... contains (text, 'something', -261)

There's a lot to read about Oracle Text. Here's the Table of contents (https://docs.oracle.com/database/121/CCAPP/toc.htm); happy reading!

What's the difference between LIKE and = in SQL?

As per the SQL standard, the difference is the treatment of trailing whitespace in CHAR columns. Example:

create table t1 ( c10 char(10) );
insert into t1 values ('davyjones');

select * from t1 where c10 = 'davyjones';
-- yields 1 row

select * from t1 where c10 like 'davyjones';
-- yields 0 rows

Of course, this assumes you run it on a standard-compliant DBMS. BTW, this is one of the main differences between CHARs and VARCHARs.
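The padding rule can be modeled outside a database. The following Python sketch is only an illustration of the standard's behavior, not how any DBMS implements it; the helper names are invented, and it assumes a collation that pads with spaces.

```python
def char_column(value, width):
    # A CHAR(width) column stores the value right-padded with spaces.
    return value.ljust(width)

def equals_pad_space(x, y):
    # '=' with space padding: extend the shorter operand, then compare.
    n = max(len(x), len(y))
    return x.ljust(n) == y.ljust(n)

def like_no_wildcards(m, p):
    # LIKE without wildcards: no padding is applied, so lengths must match.
    return m == p

stored = char_column("davyjones", 10)          # 'davyjones ' (one trailing space)
print(equals_pad_space(stored, "davyjones"))   # True
print(like_no_wildcards(stored, "davyjones"))  # False
```

The stored value is ten characters long, so only the comparison that pads the nine-character literal finds it.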

What is Full Text Search vs LIKE

In general, there is a tradeoff between "precision" and "recall". High precision means that fewer irrelevant results are presented (no false positives), while high recall means that fewer relevant results are missing (no false negatives). Using the LIKE operator gives you 100% precision with no concessions for recall. A full text search facility gives you a lot of flexibility to tune down the precision for better recall.

Most full text search implementations use an "inverted index". This is an index where the keys are individual terms, and the associated values are sets of records that contain the term. Full text search is optimized to compute the intersection, union, etc. of these record sets, and usually provides a ranking algorithm to quantify how strongly a given record matches search keywords.
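As a minimal sketch of the idea (in Python, with made-up sample documents and function names, and far simpler than a production index), an inverted index maps each term to the set of record ids that contain it, and queries become set operations:

```python
import re
from collections import defaultdict

def build_inverted_index(docs):
    # Map each term to the set of document ids containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return index

def search_all(index, *terms):
    # AND query: intersect the posting sets of all terms.
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def search_any(index, *terms):
    # OR query: union of the posting sets.
    return set().union(*(index.get(t.lower(), set()) for t in terms))

docs = {1: "the little dog", 2: "a cat and a dog", 3: "little cat"}
index = build_inverted_index(docs)
print(search_all(index, "little", "dog"))  # {1}
print(search_any(index, "cat", "dog"))     # {1, 2, 3}
```

Neither query scans the document text; each one only touches the posting sets of the terms involved.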

The SQL LIKE operator can be extremely inefficient. If you apply it to an un-indexed column, a full scan will be used to find matches (just like any query on an un-indexed field). If the column is indexed, matching can be performed against index keys, but with far less efficiency than most index lookups. In the worst case, the LIKE pattern will have leading wildcards that require every index key to be examined. In contrast, many information retrieval systems can enable support for leading wildcards by pre-compiling suffix trees in selected fields.
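To see why the leading wildcard matters, here is a small Python sketch that treats a sorted list as a stand-in for index keys. It is an analogy under that assumption, not a description of any particular database's B-tree, and the function names are invented.

```python
from bisect import bisect_left

def prefix_search(sorted_keys, prefix):
    # Like LIKE 'prefix%': binary-search to the first candidate,
    # then read neighbouring keys until the prefix stops matching.
    i = bisect_left(sorted_keys, prefix)
    matches = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        matches.append(sorted_keys[i])
        i += 1
    return matches

def substring_search(sorted_keys, needle):
    # Like LIKE '%needle%': the sort order is no help, every key is examined.
    return [key for key in sorted_keys if needle in key]

keys = sorted(["alice", "alina", "bob", "malin", "zal"])
print(prefix_search(keys, "ali"))     # ['alice', 'alina']
print(substring_search(keys, "ali"))  # ['alice', 'alina', 'malin']
```

The prefix search stops as soon as it reaches "bob", while the substring search has to visit all five keys to find "malin".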

Other features typical of full-text search are:

  • lexical analysis or tokenization: breaking a block of unstructured text into individual words, phrases, and special tokens
  • morphological analysis, or stemming: collapsing variations of a given word into one index term; for example, treating "mice" and "mouse", or "electrification" and "electric" as the same word
  • ranking: measuring the similarity of a matching record to the query string
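The three features above can be sketched together in a few lines of Python. This is a toy under loud assumptions: the STEMS table is a hand-written stand-in for a real stemmer or lemmatizer, and the score is a plain term count rather than any real ranking algorithm.

```python
import re
from collections import Counter

# Toy lookup table standing in for a real stemmer (assumption for illustration).
STEMS = {"mice": "mouse", "electrification": "electric"}

def analyze(text):
    # Tokenize into lowercase words, then collapse known variants to one term.
    words = re.findall(r"\w+", text.lower())
    return [STEMS.get(w, w) for w in words]

def score(document, query):
    # Naive relevance: how often the query terms occur in the document.
    counts = Counter(analyze(document))
    return sum(counts[term] for term in analyze(query))

def rank(documents, query):
    # Return matching documents, best score first.
    scored = [(score(d, query), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for s, d in scored if s > 0]

docs = ["Mice like cheese", "A mouse and more mice", "Electric cars"]
print(rank(docs, "mouse"))  # ['A mouse and more mice', 'Mice like cheese']
```

A LIKE '%mouse%' query would miss "Mice like cheese" entirely, which is the recall gain that stemming buys.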

Equals(=) vs. LIKE

Different Operators

LIKE and = are different operators. Most answers here focus on the wildcard support, which is not the only difference between these operators!

= is a comparison operator that operates on numbers and strings. When comparing strings, the comparison operator compares whole strings.

LIKE is a string operator that compares character by character.

To complicate matters, both operators use a collation which can have important effects on the result of the comparison.

Motivating Example

Let us first identify an example where these operators produce obviously different results. Allow me to quote from the MySQL manual:

Per the SQL standard, LIKE performs matching on a per-character basis, thus it can produce results different from the = comparison operator:

mysql> SELECT 'ä' LIKE 'ae' COLLATE latin1_german2_ci;
+-----------------------------------------+
| 'ä' LIKE 'ae' COLLATE latin1_german2_ci |
+-----------------------------------------+
|                                       0 |
+-----------------------------------------+
mysql> SELECT 'ä' = 'ae' COLLATE latin1_german2_ci;
+--------------------------------------+
| 'ä' = 'ae' COLLATE latin1_german2_ci |
+--------------------------------------+
|                                    1 |
+--------------------------------------+

Please note that this page of the MySQL manual is called String Comparison Functions, and = is not discussed, which implies that = is not strictly a string comparison function.

How Does = Work?

The SQL Standard § 8.2 describes how = compares strings:

The comparison of two character strings is determined as follows:

a) If the length in characters of X is not equal to the length
in characters of Y, then the shorter string is effectively
replaced, for the purposes of comparison, with a copy of
itself that has been extended to the length of the longer
string by concatenation on the right of one or more pad
characters, where the pad character is chosen based on CS. If
CS has the NO PAD attribute, then the pad character is an
implementation-dependent character different from any
character in the character set of X and Y that collates less
than any string under CS. Otherwise, the pad character is a
<space>.

b) The result of the comparison of X and Y is given by the
collating sequence CS.

c) Depending on the collating sequence, two strings may
compare as equal even if they are of different lengths or
contain different sequences of characters. When the operations
MAX, MIN, DISTINCT, references to a grouping column, and the
UNION, EXCEPT, and INTERSECT operators refer to character
strings, the specific value selected by these operations from
a set of such equal values is implementation-dependent.

(Emphasis added.)

What does this mean? It means that when comparing strings, the = operator is just a thin wrapper around the current collation. A collation is a library that has various rules for comparing strings. Here is an example of a binary collation from MySQL:

static int my_strnncoll_binary(const CHARSET_INFO *cs __attribute__((unused)),
                               const uchar *s, size_t slen,
                               const uchar *t, size_t tlen,
                               my_bool t_is_prefix)
{
  size_t len= MY_MIN(slen,tlen);
  int cmp= memcmp(s,t,len);
  return cmp ? cmp : (int)((t_is_prefix ? len : slen) - tlen);
}

This particular collation happens to compare byte-by-byte (which is why it's called "binary" — it doesn't give any special meaning to strings). Other collations may provide more advanced comparisons.
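For readers who do not read C, the same logic can be restated in Python. This is a hedged paraphrase of the snippet above, not MySQL code; the function name and the use of `bytes` are choices made for this sketch.

```python
def strnncoll_binary(s: bytes, t: bytes, t_is_prefix: bool = False) -> int:
    # Compare the common-length prefix byte by byte, as memcmp does.
    n = min(len(s), len(t))
    a, b = s[:n], t[:n]
    if a != b:
        return -1 if a < b else 1
    # Prefixes are equal: fall back to the length difference.
    # When t is only a prefix to match, extra bytes in s are ignored.
    return (n if t_is_prefix else len(s)) - len(t)

print(strnncoll_binary(b"abc", b"abc"))         # 0
print(strnncoll_binary(b"abcd", b"abc"))        # 1
print(strnncoll_binary(b"abcd", b"abc", True))  # 0
```

A negative, zero, or positive result means s sorts before, equal to, or after t, and no byte is given any meaning beyond its numeric value.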

An example of a more advanced collation is MySQL's UTF-8 collation that supports case-insensitive comparisons. The code is too long to paste here, but the body of my_strnncollsp_utf8mb4() in the MySQL source is worth reading. This collation can process multiple bytes at a time and it can apply various transforms (such as case-insensitive comparison). The = operator is completely abstracted from the vagaries of the collation.

How Does LIKE Work?

The SQL Standard § 8.5 describes how LIKE compares strings:

The <predicate>

M LIKE P

is true if there exists a partitioning of M into substrings
such that:

i) A substring of M is a sequence of 0 or more contiguous
<character representation>s of M and each <character
representation> of M is part of exactly one substring.

ii) If the i-th substring specifier of P is an arbitrary
character specifier, the i-th substring of M is any single
<character representation>.

iii) If the i-th substring specifier of P is an arbitrary string
specifier, then the i-th substring of M is any sequence of
0 or more <character representation>s.

iv) If the i-th substring specifier of P is neither an
arbitrary character specifier nor an arbitrary string specifier,
then the i-th substring of M is equal to that substring
specifier according to the collating sequence of
the <like predicate>, without the appending of <space>
characters to M, and has the same length as that substring
specifier.

v) The number of substrings of M is equal to the number of
substring specifiers of P.

(Emphasis added.)

This is pretty wordy, so let's break it down. Items ii and iii refer to the wildcards _ and %, respectively. If P does not contain any wildcards, then only item iv applies. This is the case of interest posed by the OP.

In this case, it compares each "substring" (individual characters) in M against each substring in P using the current collation.
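The contrast with = can be modeled in Python. The sketch below assumes a toy collation in the spirit of latin1_german2_ci, where 'ä' sorts like 'ae' and case is ignored; the EXPANSIONS table and function names are invented for illustration, and space padding is left out to keep the focus on the per-character rule.

```python
# Toy collation rules (assumption): some characters expand to two letters.
EXPANSIONS = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

def collation_key(s):
    # Fold case, then apply the expansions to get a comparable key.
    return "".join(EXPANSIONS.get(ch, ch) for ch in s.lower())

def equals(x, y):
    # '=': hand both whole strings to the collation.
    return collation_key(x) == collation_key(y)

def like_no_wildcards(m, p):
    # LIKE without wildcards: same length, and each character of M must
    # match the character at the same position in P under the collation.
    return len(m) == len(p) and all(
        collation_key(a) == collation_key(b) for a, b in zip(m, p)
    )

print(equals("ä", "ae"))             # True
print(like_no_wildcards("ä", "ae"))  # False
print(like_no_wildcards("Ä", "ä"))   # True
```

The one-character string 'ä' can never line up position by position with the two-character 'ae', so LIKE fails where = succeeds, matching the MySQL output quoted earlier.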

Conclusions

The bottom line is that when comparing strings, = compares the entire string while LIKE compares one character at a time. Both comparisons use the current collation. This difference leads to different results in some cases, as evidenced in the first example in this post.

Which one should you use? Nobody can tell you that — you need to use the one that's correct for your use case. Don't prematurely optimize by switching comparison operators.


