SQL String Comparison Speed 'Like' VS 'Patindex'

How about the performance between PATINDEX and CHARINDEX?

CHARINDEX has an 8000-character limit on the string you are looking for, not on the string you are searching.

That article is probably wrong unless you need to search for strings longer than 8000 (or 4000 for NVARCHAR) characters. It also uses the text datatype, which is deprecated. The article is dated 2007, so it would have been written against SQL Server 2005 or 2000; however, the SQL Server 2005 BOL does not mention an 8000 limit at all, and it's not something I've tried.

From the SQL Server 2008 BOL:

CHARINDEX ( expression1, expression2 [ , start_location ] )

expression1 is a character expression that contains the sequence to be found. expression1 is limited to 8000 characters.

expression2 is a character expression to be searched.

Note: the BOL entry for PATINDEX mentions no 8000 limit for SQL Server 2008 or 2005.

Finally, I'd use CHARINDEX when I don't need pattern matching, because I think it's more intuitive for straightforward searches, and it supports long search strings.
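As a minimal sketch of the difference (table and column names here are hypothetical), both functions return the 1-based position of a match, or 0 if there is none:

```sql
-- CHARINDEX takes a literal substring.
SELECT * FROM Customers WHERE CHARINDEX('foo', Notes) > 0;

-- PATINDEX takes a LIKE-style pattern (note the enclosing % wildcards).
SELECT * FROM Customers WHERE PATINDEX('%foo%', Notes) > 0;
```

For a plain substring search the two are interchangeable; PATINDEX only earns its keep when you actually need wildcards or character ranges in the pattern.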

Performance optimized way to test whether a varchar field has any digit (SQL Server 2012)

You can query using a ranged pattern expression:

To query for records with a digit:

where fieldName LIKE '%[0-9]%'

To query for records without a digit:

where fieldName NOT LIKE '%[0-9]%'

See here: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/like-transact-sql?view=sql-server-2017#arguments for more information.

T-SQL speed comparison between LEFT() vs. LIKE operator

Your best bet would be to measure the performance on real production data rather than trying to guess (or ask us). That's because performance can sometimes depend on the data you're processing; in this case that seems unlikely, but I don't know that for certain, which is why you should check.

If this is a query you will be doing a lot, you should consider another (indexed) column which contains the lowercased first letter of name and have it set by an insert/update trigger.

This will, at the cost of a minimal storage increase, make this query blindingly fast:

select * from table where name_first_char_lower = @firstletter
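One way to sketch this without hand-writing the trigger is a persisted computed column, which SQL Server maintains automatically and which can be indexed (an alternative to the trigger approach above; table and index names are illustrative):

```sql
-- Illustrative sketch: a persisted computed column kept in sync by the engine.
ALTER TABLE MyTable
    ADD name_first_char_lower AS LOWER(LEFT(name, 1)) PERSISTED;

-- Index it so the lookup becomes a simple seek.
CREATE INDEX IX_MyTable_name_first_char_lower
    ON MyTable (name_first_char_lower);

SELECT * FROM MyTable WHERE name_first_char_lower = @firstletter;
```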

That's because most databases are read far more often than they are written, and this amortises the cost of the calculation (done only for writes) across all reads.

It introduces redundant data, but that's acceptable for performance as long as you understand (and mitigate, as in this suggestion) the consequences and actually need the extra performance.

CHARINDEX vs LIKE search gives very different performance, why?

I will answer my own question, since the correct answer was hard to find and the SQL Server 2012 execution plan output pointed me to the problem. As you can see in the original question, everything looks OK on the surface. This was on SQL Server 2008.

When I ran the same query on 2012, I got a warning on the CHARINDEX query. The problem is that SQL Server had to do a type conversion: Address1 is VARCHAR, but the query contains N'1124', which is Unicode (NVARCHAR). If I change the query like so:

SELECT * 
FROM LOCAddress
WHERE (CAST(CHARINDEX(LOWER('1124'), LOWER([Address1])) AS int)) > 0

it then runs the same as the LIKE query. So the type conversion introduced by the Entity Framework query generator was causing this horrible hit in performance.
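The underlying issue can be seen without the CAST or LOWER noise; what matters is whether the literal's type matches the column's type (a sketch using the column from the question):

```sql
-- Address1 is VARCHAR. An NVARCHAR literal (N'...') has higher type precedence,
-- so SQL Server implicitly converts the column, which hurts the plan:
SELECT * FROM LOCAddress
WHERE CHARINDEX(N'1124', Address1) > 0;

-- A plain VARCHAR literal matches the column's type, so no conversion is needed:
SELECT * FROM LOCAddress
WHERE CHARINDEX('1124', Address1) > 0;
```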

Linq: SqlFunctions.PatIndex vs string.Contains for string comparisson

I believe version 1 runs faster, theoretically.

Reasons:

  1. Both versions eventually translate into SQL queries.
  2. Version 1 translates into a query whose WHERE clause uses the LIKE operator.
  3. Version 2 translates into a query whose WHERE clause calls the PATINDEX function.
  4. On a large data set, a function call in SQL takes longer to return results than the pure LIKE operator.
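Roughly, the two LINQ versions translate into predicates like these (table, column, and search term are illustrative, and the exact SQL Entity Framework emits may differ):

```sql
-- Version 1 (string.Contains) translates to a LIKE predicate:
SELECT * FROM People WHERE FullName LIKE '%smith%';

-- Version 2 (SqlFunctions.PatIndex) translates to a function call:
SELECT * FROM People WHERE PATINDEX('%smith%', FullName) > 0;
```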

Using PatIndex and CharIndex like the InStr function

You've got it reversed.

select patindex('%.com%', 'http://www.stackoverflow.com'); -- pattern first, then the string to search
select charindex('.com', 'http://www.stackoverflow.com');  -- substring first, then the string to search

Simple Explanation for PATINDEX

Are you asking for realistic use-cases? I can think of two real-life use-cases from work where PATINDEX() was my best option.

I had to import a text file and parse it for an INSERT INTO later on. But these files sometimes had numbers in this format: 00000-59. If you try CAST('00000-59' AS INT) you'll get an error. So I needed code that would parse 00000-59 to -59, but also 00000159 to 159, etc. The - could be anywhere, or it might not be there at all. This is what I did:

DECLARE @my_var VARCHAR(255) = '00000-59', @my_int INT

SET @my_var = STUFF(@my_var, 1, PATINDEX('%[^0]%', @my_var)-1, '')
SET @my_int = CAST(@my_var AS INT)

[^0] in this case means "any character that isn't a 0". So PATINDEX() tells me when the 0's end, regardless of whether that's because of a - or a number.

The second use-case I've had was checking whether an IBAN number was correct. In order to do that, any letters in the IBAN need to be changed to a corresponding number (A=10, B=11, etc...). I did something like this (incomplete but you get the idea):

SET @i = PATINDEX('%[^0-9]%', @IBAN)
WHILE @i <> 0 BEGIN
    SET @num = UNICODE(SUBSTRING(@IBAN, @i, 1)) - 55
    SET @IBAN = STUFF(@IBAN, @i, 1, CAST(@num AS VARCHAR(2)))
    SET @i = PATINDEX('%[^0-9]%', @IBAN)
END

So again, I'm not concerned with finding exactly the letter A or B etc. I'm just finding anything that isn't a number and converting it.

Performance of SUBSTRING vs LEFT in SQL Server

SQL Server is a database. You do not ask which string processing function is 'faster'. You ask 'which one can use an index?' and 'do I have the required index?'. It is all about data access, because disks are sloooooow; it is not about shifting CPU registers.

So, which one can use an index (i.e. which one is sargable)? In theory LEFT could use an index, but in practice it usually does not. SUBSTRING cannot. Instead of SUBSTRING, use Full-Text Search.

Design your data model to take advantage of sargable expressions, index accordingly. That's all there is to it, there is no magic bullet. Avoid scans.
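To make the sargability point concrete (table, column, and index are assumed to exist for illustration):

```sql
-- Sargable: a prefix LIKE can seek on an index over name.
SELECT * FROM MyTable WHERE name LIKE 'abc%';

-- Not sargable: wrapping the indexed column in a function forces a scan,
-- because the index stores name, not LEFT(name, 3) or SUBSTRING(name, 1, 3).
SELECT * FROM MyTable WHERE LEFT(name, 3) = 'abc';
SELECT * FROM MyTable WHERE SUBSTRING(name, 1, 3) = 'abc';
```

The first query and the last two return the same rows; only the first can use an index seek.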


