SQL Server 'In' or 'Or' - Which Is Fastest


Both IN and OR (i.e. WHERE b IN (0, 3) versus WHERE b = 0 OR b = 3) will do a query for b = 0, followed by one for b = 3, then do a merge join on the two result sets, and finally filter out any duplicates.

With IN, duplicates don't really make sense, because b can't be both 0 and 3. But the fact is that IN will be converted to b = 0 OR b = 3, and with OR, duplicates do make sense: you could have b = 0 OR a = 3, and if you were to join the two separate result sets, you could end up with duplicates for each record that matched both criteria.

So duplicate filtering will always be done, regardless of whether you're using IN or OR. However, if you know from the outset that you will not have any duplicates - which is usually the case when you're using IN - then you can gain some performance by using UNION ALL, which doesn't filter out duplicates:

SELECT DISTINCT a
FROM mytable
WHERE b = 0

UNION ALL

SELECT DISTINCT a
FROM mytable
WHERE b = 3

What's faster, IN or OR?

"IN" will be translated to a series of "OR"s...if you look at the execution plan for a query with "IN", you'll see it has expanded it out.

Much cleaner to use "IN" in my opinion, especially in larger queries it makes it much more readable.
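
For example (a sketch only, using a hypothetical mytable), a filter written with IN typically shows up in the plan as the expanded OR form of the second query:

SELECT a FROM mytable WHERE b IN (0, 3, 5);

-- what the optimizer effectively evaluates
SELECT a FROM mytable WHERE b = 0 OR b = 3 OR b = 5;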

Why is 'IN' so much faster than '=' in a SQL SELECT?

The "IN" means "there might be more that one row returned in this subquery, please check them all" whereas "=" means "there will be only one line returned from subquery" otherwise it would be an error.

With that information, the optimizer builds different query plans. For the "=" query it executes the subquery first and then filters the custinfo table by that value.

For the "IN" query optimizer performs a join operation as if you've written following query

SELECT *
FROM custinfo cs
JOIN customers c
ON cs.idcust = c.cust_id
WHERE c.id = 1230;

This is why the execution times differ. Which one takes longer depends on your data's selectivity, indexes, partitioning and so on.
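
To make the comparison concrete, these are roughly the two forms being contrasted (a sketch only; the custinfo/customers names and columns are taken from the join example above):

-- "=" form: the subquery must return exactly one row,
-- so the optimizer runs it first and then filters custinfo
SELECT *
FROM custinfo
WHERE idcust = (SELECT cust_id FROM customers WHERE id = 1230);

-- "IN" form: the subquery may return many rows,
-- so the optimizer chooses a join-style plan instead
SELECT *
FROM custinfo
WHERE idcust IN (SELECT cust_id FROM customers WHERE id = 1230);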

Update: from the execution plans you've uploaded, I see the following:

  1. For the "=" query:
     1.1. It completely scans the MT_OPERATION_OUT table (FULL TABLE SCAN) and captures the result.
     1.2. Then it accesses another table on a remote DB, presumably scanning it too (REMOTE).
     1.3. It filters the data it got from the remote DB.

  2. For the "IN" query:
     2.1. It completely scans the MT_OPERATION_OUT table (FULL TABLE SCAN) and captures the result.
     2.2. It sorts what it got in the previous step (SORT UNIQUE).
     2.3. Then it accesses another table on a remote DB, presumably scanning it too (REMOTE).
     2.4. It performs a join (NESTED LOOPS).

So it seems to me that, for some reason, the database needs more time to filter the data from the remote DB than to join it using the "nested loops" method.

SQL: BETWEEN and IN (which is faster)

  • If your ids are always consecutive you should use BETWEEN.
  • If your ids may or may not be consecutive then use IN.

Performance shouldn't really be the deciding factor here. Having said that, BETWEEN seems to be faster in all examples that I have tested. For example:

Without indexes, checking a table with a million rows where every row has x = 1:


SELECT COUNT(*) FROM table1 WHERE x IN (1, 2, 3, 4, 5, 6);
Time taken: 0.55s

SELECT COUNT(*) FROM table1 WHERE x BETWEEN 1 AND 6;
Time taken: 0.54s

Without indexes, checking a table with a million rows where x has unique values:


SELECT COUNT(*) FROM table1 WHERE x IN (1, 2, 3, 4, 5, 6);
Time taken: 0.65s

SELECT COUNT(*) FROM table1 WHERE x BETWEEN 1 AND 6;
Time taken: 0.36s

A more realistic example, though, is when the x column is unique and indexed. In that case the performance of both queries becomes close to instant.


SELECT COUNT(*) FROM table2 WHERE x IN (1, 2, 3, 4, 5, 6);
Time taken: 0.00s

SELECT COUNT(*) FROM table2 WHERE x BETWEEN 1 AND 6;
Time taken: 0.00s

So I'd say concentrate on writing a clear SQL statement rather than worrying about minor differences in execution speed. And make sure that the table is correctly indexed because that will make the biggest difference.
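
As a minimal sketch of that indexing point (the index name is an assumption; table2 and x are from the examples above):

-- With an index like this in place, both the IN and BETWEEN
-- queries above turn into cheap index seeks
CREATE INDEX IX_table2_x ON table2 (x);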

Note: Tests were performed on SQL Server Express 2008 R2. Results may be different on other systems.

Fastest SQL Server protocol?

VIA. This is the fastest SQL Server protocol; it runs on dedicated hardware and has been used to set SQL Server benchmark records.

Note that the VIA protocol is deprecated by Microsoft and will be removed in a future version of Microsoft SQL Server. It is, however, supported in SQL Server 2008, SQL Server 2008 R2 and SQL Server 2012.

Shared Memory is next in performance, but it only works between a client and a server that can actually share memory, so it is local-only.

For remote connectivity on ordinary hardware, TCP is the way to go. Under normal operations, it has the same performance as Named Pipes. On slow or busy networks, it outperforms NP in robustness and speed, a fact documented in MSDN:

For named pipes, network communications are typically more interactive. A peer does not send data until another peer asks for it using a read command. A network read typically involves a series of peek named pipes messages before it starts to read the data. These can be very costly in a slow network and cause excessive network traffic, which in turn affects other network clients.

Named Pipes can also lead to client connection timeouts:

TCP/IP Sockets also support a backlog queue. This can provide a limited smoothing effect compared to named pipes that could lead to pipe-busy errors when you are trying to connect to SQL Server.

Unfortunately, the default client configuration tries NP first, and this can cause connectivity problems (for the reasons cited above). Forcing TCP in the client network configuration (or in the connection string, via tcp:servername) skips the NP connection attempt and goes straight to TCP, for a much better experience under load.

It's true that the same link quoted above goes on to praise NP for its ease of configuration (most likely referring to not needing to open the SQL TCP port in the firewall), but that is where BOL and I have different views.

Are Views or Functions faster in SQL?

Well, obviously views and SQL functions are different things.

Use a function where it needs to be clear to a future user (maybe yourself!) that the returned data requires certain parameters and does not make sense without them. It's a bit like forcing the user to include a WHERE clause.

In your example, you may want to force the user to filter by CustomerId or ReceiptId.
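
A minimal sketch of that idea, assuming a hypothetical dbo.Receipts table: an inline table-valued function forces the caller to supply the CustomerId.

CREATE FUNCTION dbo.fn_ReceiptsByCustomer (@CustomerId INT)
RETURNS TABLE
AS
RETURN
(
    SELECT r.ReceiptId, r.CustomerId, r.Total, r.CreatedOn
    FROM dbo.Receipts AS r
    WHERE r.CustomerId = @CustomerId   -- the caller cannot skip this filter
);
GO

-- Usage: the parameter is mandatory
SELECT * FROM dbo.fn_ReceiptsByCustomer(42);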

HOWEVER....

In this case, the view approach would probably be better.

  1. Functions, by design, do not use temporary tables; they use table variables instead, and table variables are generally slower than temp tables.
  2. The query you've included is really straightforward, with no surprises. The view would be the simplest and best approach here (see the sketch below).
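
For contrast with the function sketch above, a view leaves the filtering entirely to the caller (same hypothetical dbo.Receipts table):

CREATE VIEW dbo.vw_Receipts
AS
SELECT r.ReceiptId, r.CustomerId, r.Total, r.CreatedOn
FROM dbo.Receipts AS r;
GO

-- The caller supplies the filter
SELECT * FROM dbo.vw_Receipts WHERE CustomerId = 42;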

For 125M rows, I suggest either checking the execution plan during processing (include a WHERE clause for this), or dumping the data into a summary table that is updated periodically, or both. Check indexes all along the way.
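
A rough sketch of the periodic summary-table idea (all table and column names here are assumptions):

-- Rebuilt on a schedule (e.g. a SQL Agent job) instead of
-- scanning 125M detail rows for every report query
TRUNCATE TABLE dbo.ReceiptSummary;

INSERT INTO dbo.ReceiptSummary (CustomerId, ReceiptCount, TotalAmount)
SELECT CustomerId, COUNT(*), SUM(Total)
FROM dbo.Receipts
GROUP BY CustomerId;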

There is more (and better) discussion here: Test SQL Queries.

How to make LIKE '%Search%' faster in SQL Server

You are right: queries with a leading wildcard are awful for performance. To get around this, SQL Server has full-text search. You create a FULLTEXT index covering the columns you want to search (a sketch of that setup follows the query below), and then update your code to use the CONTAINS keyword:

SELECT
    p.CrmId,
    park.Name
FROM Property p
INNER JOIN Som som ON som.CrmId = p.SystemOfMeasurementId
LEFT JOIN Park park ON park.CrmId = p.ParkId
WHERE
(
    CONTAINS(p.City, @search)
    OR CONTAINS(p.Address1, @search)
    OR CONTAINS(p.Address2, @search)
    OR CONTAINS(p.State, @search)
    OR CONTAINS(park.Name, @search)
    OR CONTAINS(p.ZipCode, @search)
)
AND (@usOnly = 0 OR (p.CrmCountryId = @USA_COUNTRY_ID))
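
For reference, here is a hedged sketch of the index-creation side mentioned above (the catalog and key-index names are assumptions; Park.Name would need its own full-text index on the Park table):

-- One-time setup: PK_Property must be an existing unique,
-- non-nullable, single-column index on Property
CREATE FULLTEXT CATALOG PropertySearchCatalog;

CREATE FULLTEXT INDEX ON dbo.Property
    (City, Address1, Address2, State, ZipCode)
    KEY INDEX PK_Property
    ON PropertySearchCatalog
    WITH CHANGE_TRACKING AUTO;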

Unfortunately, all those OR conditions are still likely to make this pretty slow, and FULL TEXT wasn't intended as much for shorter strings like City or State, or for casting wide nets like this. You may find you'll do much better for this kind of search by integrating with a tool like Solr or ElasticSearch. In addition to writing a better and faster search, these tools will help you create sane rankings for returning results in an order that makes sense and is relevant to the input.

Another strategy is to create a computed column that concatenates your address and name text into a single column, and then create a single FULL TEXT index on that one field, with a single CONTAINS() call.

How can I perform a SQL 'NOT IN' query faster?

You can use a LEFT OUTER JOIN or a NOT EXISTS clause.

Left outer join:

select E.EmailAddress
from EMAIL E left outer join BLACKLIST B on (E.EmailAddress = B.EmailAddress)
where B.EmailAddress is null;

Not Exists:

select E.EmailAddress
from EMAIL E where not exists
(select EmailAddress from BLACKLIST B where B.EmailAddress = E.EmailAddress)

Both are quite generic SQL solutions (they don't depend on a specific DB engine). I would say the latter is a little more performant (not by much, though), but both are definitely more performant than the NOT IN version.
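
For comparison, this is the NOT IN form that both rewrites above replace (same tables):

SELECT E.EmailAddress
FROM EMAIL E
WHERE E.EmailAddress NOT IN (SELECT B.EmailAddress FROM BLACKLIST B);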

As commenters stated, you can also try creating an index on BLACKLIST(EmailAddress); that should help speed up the execution of your query.
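
A minimal sketch of that suggestion (the index name is an assumption):

-- Lets the anti-join probe the blacklist with an index seek
CREATE INDEX IX_BLACKLIST_EmailAddress ON BLACKLIST (EmailAddress);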

Which SQL query is faster and why?

A tiny database makes it difficult to determine which is better, but SQL Server Management Studio has functionality to compare the efficiency of statements to one another.

  1. Open Management Studio
  2. Click the "New Query" button
  3. Click to enable "Include Actual Execution Plan"
  4. Post all the queries into the active query window
  5. Click the "Execute" button
  6. Click the "Execution plan" tab (to the left of the results) when it appears

The query cost is shown as a percentage of the whole batch. So if the two example queries each show a cost of 50%, they are equivalent (because 100 / 2 = 50). When there is a difference, you can mouse over the SELECT operator to review the subtree cost, besides looking at the graphical layout of the execution plan.

Why is one faster than the other?

That depends on the database: the data types being joined (are they as narrow as they could be? "narrow" means taking fewer bytes to store), the indexes, and what the query is actually doing. Using different syntax can make all the difference.


