What's the Difference Between "Like" and "=" in SQL

What's the difference between LIKE and = in SQL?

As per SQL standard, the difference is treatment of trailing whitespace in CHAR columns. Example:

create table t1 ( c10 char(10) );
insert into t1 values ('davyjones');

select * from t1 where c10 = 'davyjones';
-- yields 1 row

select * from t1 where c10 like 'davyjones';
-- yields 0 rows

Of course, assuming you run this on a standard-compliant DBMS. BTW, this is one the main differences between CHARs and VARCHARs.

Equals(=) vs. LIKE

Different Operators

LIKE and = are different operators. Most answers here focus on the wildcard support, which is not the only difference between these operators!

= is a comparison operator that operates on numbers and strings. When comparing strings, the comparison operator compares whole strings.

LIKE is a string operator that compares character by character.

To complicate matters, both operators use a collation which can have important effects on the result of the comparison.

Motivating Example

Let us first identify an example where these operators produce obviously different results. Allow me to quote from the MySQL manual:

Per the SQL standard, LIKE performs matching on a per-character basis, thus it can produce results different from the = comparison operator:

mysql> SELECT 'ä' LIKE 'ae' COLLATE latin1_german2_ci;
+-----------------------------------------+
| 'ä' LIKE 'ae' COLLATE latin1_german2_ci |
+-----------------------------------------+
| 0 |
+-----------------------------------------+
mysql> SELECT 'ä' = 'ae' COLLATE latin1_german2_ci;
+--------------------------------------+
| 'ä' = 'ae' COLLATE latin1_german2_ci |
+--------------------------------------+
| 1 |
+--------------------------------------+

Please note that this page of the MySQL manual is called String Comparison Functions, and = is not discussed, which implies that = is not strictly a string comparison function.

How Does = Work?

The SQL Standard § 8.2 describes how = compares strings:

The comparison of two character strings is determined as follows:

a) If the length in characters of X is not equal to the length
in characters of Y, then the shorter string is effectively
replaced, for the purposes of comparison, with a copy of
itself that has been extended to the length of the longer
string by concatenation on the right of one or more pad
characters, where the pad character is chosen based on CS. If
CS has the NO PAD attribute, then the pad character is an
implementation-dependent character different from any
character in the character set of X and Y that collates less
than any string under CS. Otherwise, the pad character is a
<space>.

b) The result of the comparison of X and Y is given by the
collating sequence CS.

c) Depending on the collating sequence, two strings may
compare as equal even if they are of different lengths or
contain different sequences of characters. When the operations
MAX, MIN, DISTINCT, references to a grouping column, and the
UNION, EXCEPT, and INTERSECT operators refer to character
strings, the specific value selected by these operations from
a set of such equal values is implementation-dependent.

(Emphasis added.)

What does this mean? It means that when comparing strings, the = operator is just a thin wrapper around the current collation. A collation is a library that has various rules for comparing strings. Here is an example of a binary collation from MySQL:

static int my_strnncoll_binary(const CHARSET_INFO *cs __attribute__((unused)),
const uchar *s, size_t slen,
const uchar *t, size_t tlen,
my_bool t_is_prefix)
{
size_t len= MY_MIN(slen,tlen);
int cmp= memcmp(s,t,len);
return cmp ? cmp : (int)((t_is_prefix ? len : slen) - tlen);
}

This particular collation happens to compare byte-by-byte (which is why it's called "binary" — it doesn't give any special meaning to strings). Other collations may provide more advanced comparisons.

For example, here is a UTF-8 collation that supports case-insensitive comparisons. The code is too long to paste here, but go to that link and read the body of my_strnncollsp_utf8mb4(). This collation can process multiple bytes at a time and it can apply various transforms (such as case insensitive comparison). The = operator is completely abstracted from the vagaries of the collation.

How Does LIKE Work?

The SQL Standard § 8.5 describes how LIKE compares strings:

The <predicate>

M LIKE P

is true if there exists a partitioning of M into substrings
such that:

i) A substring of M is a sequence of 0 or more contiguous
<character representation>s of M and each <character
representation> of M is part of exactly one substring.

ii) If the i-th substring specifier of P is an arbitrary
character specifier, the i-th substring of M is any single
<character representation>.

iii) If the i-th substring specifier of P is an arbitrary string
specifier, then the i-th substring of M is any sequence of
0 or more <character representation>s.

iv) If the i-th substring specifier of P is neither an
arbitrary character specifier nor an arbitrary string specifier,
then the i-th substring of M is equal to that substring
specifier according to the collating sequence of
the <like predicate>, without the appending of <space>
characters to M, and has the same length as that substring
specifier.

v) The number of substrings of M is equal to the number of
substring specifiers of P.

(Emphasis added.)

This is pretty wordy, so let's break it down. Items ii and iii refer to the wildcards _ and %, respectively. If P does not contain any wildcards, then only item iv applies. This is the case of interest posed by the OP.

In this case, it compares each "substring" (individual characters) in M against each substring in P using the current collation.

Conclusions

The bottom line is that when comparing strings, = compares the entire string while LIKE compares one character at a time. Both comparisons use the current collation. This difference leads to different results in some cases, as evidenced in the first example in this post.

Which one should you use? Nobody can tell you that — you need to use the one that's correct for your use case. Don't prematurely optimize by switching comparison operators.

SQL 'like' vs '=' performance

See https://web.archive.org/web/20150209022016/http://myitforum.com/cs2/blogs/jnelson/archive/2007/11/16/108354.aspx

Quote from there:

the rules for index usage with LIKE
are loosely like this:

  • If your filter criteria uses equals =
    and the field is indexed, then most
    likely it will use an INDEX/CLUSTERED
    INDEX SEEK

  • If your filter criteria uses LIKE,
    with no wildcards (like if you had a
    parameter in a web report that COULD
    have a % but you instead use the full
    string), it is about as likely as #1
    to use the index. The increased cost
    is almost nothing.

  • If your filter criteria uses LIKE, but
    with a wildcard at the beginning (as
    in Name0 LIKE '%UTER') it's much less
    likely to use the index, but it still
    may at least perform an INDEX SCAN on
    a full or partial range of the index.

  • HOWEVER, if your filter criteria uses
    LIKE, but starts with a STRING FIRST
    and has wildcards somewhere AFTER that
    (as in Name0 LIKE 'COMP%ER'), then SQL
    may just use an INDEX SEEK to quickly
    find rows that have the same first
    starting characters, and then look
    through those rows for an exact match.


(Also keep in mind, the SQL engine
still might not use an index the way
you're expecting, depending on what
else is going on in your query and
what tables you're joining to. The
SQL engine reserves the right to
rewrite your query a little to get the
data in a way that it thinks is most
efficient and that may include an
INDEX SCAN instead of an INDEX SEEK)

Is there a combination of LIKE and IN in SQL?

There is no combination of LIKE & IN in SQL, much less in TSQL (SQL Server) or PLSQL (Oracle). Part of the reason for that is because Full Text Search (FTS) is the recommended alternative.

Both Oracle and SQL Server FTS implementations support the CONTAINS keyword, but the syntax is still slightly different:

Oracle:

WHERE CONTAINS(t.something, 'bla OR foo OR batz', 1) > 0

SQL Server:

WHERE CONTAINS(t.something, '"bla*" OR "foo*" OR "batz*"')

The column you are querying must be full-text indexed.

Reference:

  • Building Full-Text Search Applications with Oracle Text
  • Understanding SQL Server Full-Text

Difference between LIKE and ~ in Postgres

~ is the regular expression operator, and has the capabilities implied by that. You can specify a full range of regular expression wildcards and quantifiers; see the documentation for details. It is certainly more powerful than LIKE, and should be used when that power is needed, but they serve different purposes.

SQL efficiency - [=] vs [in] vs [like] vs [matches]

I will add to that also exists and subquery.

But the performance depends on the optimizer of the given SQL engine.

In oracle you have a lot of differences between IN and EXISTS, but not necessarily in SQL Server.

The other thing that you have to consider is the selectivity of the column that you use. Some cases show that IN is better.


But you have to remember that IN is non-sargable (non search argument able) so it will not use the index to resolve the query, the LIKE and = are sargable and support the index


The best ? You should spend some time to test it in your environment

Use '=' or LIKE to compare strings in SQL?

To see the performance difference, try this:

SELECT count(*)
FROM master..sysobjects as A
JOIN tempdb..sysobjects as B
on A.name = B.name

SELECT count(*)
FROM master..sysobjects as A
JOIN tempdb..sysobjects as B
on A.name LIKE B.name

Comparing strings with '=' is much faster.

sql: what is the difference between LIKE %...% and LIKE?

  1. In your First query it shows all config column data which contains code = LIST

  2. In your second query it shows all config column data which exact to code = LIST

LIKE supports wildcards. Usually it uses the % or _ character for the wildcard.

For know more about LIKE

What is the difference between the IN operator and = operator in SQL?

IN

will not generate an error if you have multiple results on the subquery. Allows to have more than one value in the result returned by the subquery.

=

will generate an error if you have more than one result on the subquery.

  • SQLFiddle Demo (IN vs =)

What is the difference between LIKE '[A-D]%' ; and BETWEEN 'A' AND 'D' ; in SQL Server

Let me assume you are talking about SQL Server, not MySQL.

SQL Server has limited extensions of LIKE that support character ranges. So, you are comparing:

where name like '[A-D]%' 
where name between 'A' and 'D'

The difference is simple. A name longer than "D" that starts with "D" matches the first condition but not the second. For instance, 'Dionysis' and 'Dracula' match the like but not the between.

The equivalent comparison would be:

where name >= 'A' and name < 'E'


Related Topics



Leave a reply



Submit