SQL Server Index - Any improvement for LIKE queries?

Only if you add full-text searching to those columns, and use the full-text query capabilities of SQL Server.

Otherwise, no, an index will not help.
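
A minimal full-text setup might look like the sketch below; the catalog name and the PK_Product key index are assumptions (full-text indexing requires a unique, single-column, non-nullable key index on the table):

CREATE FULLTEXT CATALOG ProductCatalog;

CREATE FULLTEXT INDEX ON dbo.Product (ProductName)
    KEY INDEX PK_Product ON ProductCatalog;

SELECT * FROM dbo.Product
WHERE CONTAINS(ProductName, 'furniture');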

Does Adding Indexes speed up String Wildcard % searches?

Creating a normal index will not help (*), but a full-text index will, though you would have to change your query to something like this:

SELECT * FROM dbo.Product WHERE CONTAINS(ProductName, 'furniture')

(* -- well, it can be slightly helpful, in that it can reduce a scan over every row and column in your table into a scan over merely every row and only the relevant columns. However, it will not achieve the orders of magnitude performance boost that we normally expect from indexes that turn scans into single seeks.)
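
To illustrate the footnote, a narrow nonclustered index (hypothetical name below) lets that scan touch only the indexed column instead of the whole table:

CREATE NONCLUSTERED INDEX IX_Product_ProductName
    ON dbo.Product (ProductName);

-- A leading wildcard still forces a scan, but of the small index, not the table:
SELECT ProductName FROM dbo.Product
WHERE ProductName LIKE '%furniture%';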

How to make LIKE '%Search%' faster in SQL Server

You are right: queries with a leading wildcard are awful for performance. To get around this, SQL Server has something called full-text search. You create a special FULLTEXT index for each of the columns you want to search, and then update your code to use the CONTAINS predicate:

SELECT
    p.CrmId,
    park.Name
FROM Property p
INNER JOIN Som som ON som.CrmId = p.SystemOfMeasurementId
LEFT JOIN Park park ON park.CrmId = p.ParkId
WHERE
    (
        CONTAINS(p.City, @search)
        OR CONTAINS(p.Address1, @search)
        OR CONTAINS(p.Address2, @search)
        OR CONTAINS(p.State, @search)
        OR CONTAINS(park.Name, @search)
        OR CONTAINS(p.ZipCode, @search)
    )
    AND (@usOnly = 0 OR p.CrmCountryId = @USA_COUNTRY_ID)

Unfortunately, all those OR conditions are still likely to make this pretty slow, and FULL TEXT wasn't intended as much for shorter strings like City or State, or for casting wide nets like this. You may find you'll do much better for this kind of search by integrating with a tool like Solr or ElasticSearch. In addition to writing a better and faster search, these tools will help you create sane rankings for returning results in an order that makes sense and is relevant to the input.

Another strategy is to create a computed column that concatenates your address and name text into a single column, and then put a single FULLTEXT index on that one field, with a single CONTAINS() call.
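
A sketch of that idea; the SearchText column, PK_Property key index, and PropertyCatalog names are assumptions, and note that a computed column can only combine columns of its own table, so park.Name would still need separate handling:

ALTER TABLE Property ADD SearchText AS
    CONCAT(City, ' ', Address1, ' ', Address2, ' ', State, ' ', ZipCode)
    PERSISTED;

CREATE FULLTEXT INDEX ON Property (SearchText)
    KEY INDEX PK_Property ON PropertyCatalog;

-- The six OR'ed predicates collapse into one:
SELECT p.CrmId
FROM Property p
WHERE CONTAINS(p.SearchText, @search);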

Improve performance of SQL Query with dynamic like

Take a look at my answer about using the LIKE operator here.

It can be quite performant if you use a few tricks.

You can gain a lot of speed if you play with collations; try this:

SELECT DISTINCT TOP 10 p.[Id], n.[LastName], n.[FirstName]
FROM [dbo].[people] p
INNER JOIN [dbo].[people_NAME] n ON n.[Id] = p.[Id]
WHERE EXISTS (
    SELECT 'x' x
    FROM [dbo].[people_NAME] n2
    WHERE n2.[Id] != p.[Id]
    AND lower(n2.[FirstName]) COLLATE Latin1_General_BIN
        LIKE '%' + lower(n.[FirstName]) + '%' COLLATE Latin1_General_BIN
)

As you can see, we are using a binary comparison instead of a string comparison, which is much more performant.

Pay attention: you are working with people's names, so you can run into issues with special Unicode characters, accents, and so on.
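
For example, a binary collation is accent-sensitive even after lower(), so accented variants will not match:

-- 'josé' and 'jose' compare as different under a binary collation:
SELECT CASE WHEN 'josé' COLLATE Latin1_General_BIN = 'jose' COLLATE Latin1_General_BIN
            THEN 'match' ELSE 'no match' END;  -- returns 'no match'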

Normally an EXISTS clause is better than an INNER JOIN, but you are also using a DISTINCT, which is effectively a GROUP BY on all columns, so why not use that?

You can switch to an INNER JOIN and use a GROUP BY instead of the DISTINCT; testing COUNT(*) > 1 will be (very slightly) more performant than testing WHERE n2.[Id] != p.[Id], especially if your TOP clause extracts many rows.

Try this:

SELECT TOP 10 p.[Id], n.[LastName], n.[FirstName]
FROM [dbo].[people] p
INNER JOIN [dbo].[people_NAME] n ON n.[Id] = p.[Id]
INNER JOIN [dbo].[people_NAME] n2 ON
    lower(n2.[FirstName]) COLLATE Latin1_General_BIN
    LIKE '%' + lower(n.[FirstName]) + '%' COLLATE Latin1_General_BIN
GROUP BY p.[Id], n.[LastName], n.[FirstName]
HAVING COUNT(*) > 1

Here we also match each name against itself, so every name will find at least one match.
But we only need names that match other names, so we keep just the rows with a match count greater than one (COUNT(*) = 1 means the name matched only itself).

EDIT: I ran all the tests using a table of 100,000 random names and found that, in this scenario, the normal LIKE comparison is about three times slower than the binary comparison.

Recommended index and query improvement

First of all, besides implementing a correct index strategy, you should follow some general tips for optimizing query execution time.

  • Avoid functions in the inner SELECT and JOIN statements. Functions (even when cached) should be executed for the smallest possible number of records, and that usually happens in the outermost SELECT (see the sketch after this list).
  • Avoid subqueries when possible; choose a JOIN instead.
  • Avoid non-numeric fields in WHERE clauses when possible: an index scan on an INT field is much, much faster than on a VARCHAR.
  • Avoid the WITH(NOLOCK) hint, since it also reads uncommitted data. It doesn't make the query faster, and you end up with a potentially dirty dataset.
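
A small illustration of the first tip (hypothetical log table): wrapping a column in a function makes the predicate non-sargable, so the function runs for every row; an equivalent range predicate can use an index seek instead:

-- Non-sargable: YEAR() runs on every row and defeats any index on [Timestamp]
SELECT COUNT(*) FROM dbo.[log] WHERE YEAR([Timestamp]) = 2023;

-- Sargable: a range predicate can seek on an index over [Timestamp]
SELECT COUNT(*) FROM dbo.[log]
WHERE [Timestamp] >= '20230101' AND [Timestamp] < '20240101';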

When trying to optimize a query, also keep in mind the logical order in which the query processor evaluates its clauses:

  1. FROM and JOIN
  2. WHERE
  3. GROUP BY and HAVING
  4. SELECT

So try to write your query to reduce the number of records returned by each of these blocks, in this order.

That being said, an index must be created according to the query that uses it, and you can find helpful hints by running the query with the actual execution plan included; SSMS often helps you a lot here.
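
If you prefer numbers to the graphical plan, STATISTICS IO and TIME report the reads and CPU of each statement you run:

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- run the query under test here; reads and CPU times appear in the Messages tab

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;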

In this case I'd add an index on the URL and Timestamp fields, in that order:

CREATE CLUSTERED INDEX idx_Log ON yourDatabase.dbo.[log] (URL, Timestamp)

Why this index doesn't improve query performance

Since you're fetching most of the rows in the tables, the indexes have to be covering (= contain every column your query needs from that table) to help you at all, and even then the improvement might not be much.

The reason the indexes don't really help is that you're reading most of the rows, and you have IrreleventFields in your query. Since the index contains only the index key + the clustered key, the rest of the fields must be fetched from the table (= the clustered index) using the clustered index key. That's called a key lookup, and it can be very costly, because it has to be done for every single row found in the index that matches your search criteria.

To make the index covering, you can add the "irrelevant" fields to the INCLUDE part of the index, if you want to test whether that improves the situation.
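
A sketch with hypothetical table, column, and index names; INCLUDE puts the extra columns in the index leaf level, so the query is answered from the index alone with no key lookup:

CREATE NONCLUSTERED INDEX IX_MyTable_SearchCol
    ON dbo.MyTable (SearchCol)
    INCLUDE (IrreleventField1, IrreleventField2);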


