Where Does the Practice "Exists (Select 1 from ...)" Come From

Where does the practice exists (select 1 from ...) come from?

The main part of your question is - "where did this myth come from?"

So to answer that, I guess one of the first performance hints people learn with sql is that select * is inefficient in most situations. The fact that it isn't inefficient in this specific situation is hence somewhat counter intuitive. So its not surprising that people are skeptical about it. But some simple research or experiments should be enough to banish most myths. Although human history kinda shows that myths are quite hard to banish.

What does it mean by select 1 from table?

SELECT 1 FROM TABLE_NAME means, "Return 1 from the table". It is pretty unremarkable on its own, so normally it will be used with WHERE and often EXISTS (as @gbn notes, this is not necessarily best practice, it is, however, common enough to be noted, even if it isn't really meaningful (that said, I will use it because others use it and it is "more obvious" immediately. Of course, that might be a viscous chicken vs. egg issue, but I don't generally dwell)).

 SELECT * FROM TABLE1 T1 WHERE EXISTS (
SELECT 1 FROM TABLE2 T2 WHERE T1.ID= T2.ID
);

Basically, the above will return everything from table 1 which has a corresponding ID from table 2. (This is a contrived example, obviously, but I believe it conveys the idea. Personally, I would probably do the above as SELECT * FROM TABLE1 T1 WHERE ID IN (SELECT ID FROM TABLE2); as I view that as FAR more explicit to the reader unless there were a circumstantially compelling reason not to).

EDIT

There actually is one case which I forgot about until just now. In the case where you are trying to determine existence of a value in the database from an outside language, sometimes SELECT 1 FROM TABLE_NAME will be used. This does not offer significant benefit over selecting an individual column, but, depending on implementation, it may offer substantial gains over doing a SELECT *, simply because it is often the case that the more columns that the DB returns to a language, the larger the data structure, which in turn mean that more time will be taken.

IF EXISTS (SELECT 1...) vs IF EXITS (SELECT TOP 1 1...)

If you view the execution plan for these queries you can see that they are identical. Good coding practices would ask that you leave out the "TOP 1" but they should run identical either way.

Sample Image

Select ' ' from TableA

When checking whether something exists in a table, it is common to select an arbitrary value rather than an actual column, because it has an affect on the execution plan (if you select a real column, the execution plan takes that column into account and it can take a little longer, even though you don't use the column).

Most commonly, I have seen 1:

IF EXISTS (SELECT 1 FROM MyTable WHERE SomeColumn > 10)

If you just care whether there is any row, you can short circuit the query rather than getting all rows... although I suspect the EXISTS statement would stop as soon as any row was found anyway.

IF EXISTS (SELECT TOP 1 '' FROM TableA)

Subquery using Exists 1 or Exists *

No, SQL Server is smart and knows it is being used for an EXISTS, and returns NO DATA to the system.

Quoth Microsoft:
http://technet.microsoft.com/en-us/library/ms189259.aspx?ppud=4

The select list of a subquery
introduced by EXISTS almost always
consists of an asterisk (*). There is
no reason to list column names because
you are just testing whether rows that
meet the conditions specified in the
subquery exist.

To check yourself, try running the following:

SELECT whatever
FROM yourtable
WHERE EXISTS( SELECT 1/0
FROM someothertable
WHERE a_valid_clause )

If it was actually doing something with the SELECT list, it would throw a div by zero error. It doesn't.

EDIT: Note, the SQL Standard actually talks about this.

ANSI SQL 1992 Standard, pg 191 http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt

3) Case:

a) If the <select list> "*" is simply contained in a <subquery> that
is immediately contained in an <exists predicate>, then the <select list> is
equivalent to a <value expression>
that is an arbitrary <literal>.

Difference between if exists (select 1 from) and (select 0 from)

Your performance would be identical. When you use exists, SQL Server doesn't evaluate anything in the SELECT portion of the statement. It simply resolves to a boolean. This can be proven by doing 1 divided by 0. Normally this would throw an error, but inside an EXISTS, it runs without error.

With regards to your specific use, the expense in the EXISTS check is almost nothing. If you are running into performance issues, it's not the EXISTS part of your script.

Example:

IF EXISTS(SELECT 1/0 FROM <TABLE> WHERE 1=1) BEGIN SELECT 'In the exists' END

Create database if db not exist

Could you check the following script :

    IF NOT EXISTS(SELECT * FROM sys.databases WHERE name = 'DataBase')
BEGIN
CREATE DATABASE [DataBase]

END
GO
USE [DataBase]
GO
--You need to check if the table exists
IF NOT EXISTS (SELECT * FROM sysobjects WHERE name='TableName' and xtype='U')
BEGIN
CREATE TABLE TableName (
Id INT PRIMARY KEY IDENTITY (1, 1),
Name VARCHAR(100)
)
END

Performance of SQL EXISTS usage variants

The truth about the EXISTS clause is that the SELECT clause is not evaluated in an EXISTS clause - you could try:

SELECT * 
FROM tableA
WHERE EXISTS (SELECT 1/0
FROM tableB
WHERE tableA.x = tableB.y)

...and should expect a divide by zero error, but you won't because it's not evaluated. This is why my habit is to specify NULL in an EXISTS to demonstrate that the SELECT can be ignored:

SELECT * 
FROM tableA
WHERE EXISTS (SELECT NULL
FROM tableB
WHERE tableA.x = tableB.y)

All that matters in an EXISTS clause is the FROM and beyond clauses - WHERE, GROUP BY, HAVING, etc.

This question wasn't marked with a database in mind, and it should be because vendors handle things differently -- so test, and check the explain/execution plans to confirm. It is possible that behavior changes between versions...

How can I select from list of values in SQL Server

Simplest way to get the distinct values of a long list of comma delimited text would be to use a find an replace with UNION to get the distinct values.

SELECT 1
UNION SELECT 1
UNION SELECT 1
UNION SELECT 2
UNION SELECT 5
UNION SELECT 1
UNION SELECT 6

Applied to your long line of comma delimited text

  • Find and replace every comma with UNION SELECT
  • Add a SELECT in front of the statement

You now should have a working query

SQL WHERE ID IN (id1, id2, ..., idn)

Option 1 is the only good solution.

Why?

  • Option 2 does the same but you repeat the column name lots of times; additionally the SQL engine doesn't immediately know that you want to check if the value is one of the values in a fixed list. However, a good SQL engine could optimize it to have equal performance like with IN. There's still the readability issue though...

  • Option 3 is simply horrible performance-wise. It sends a query every loop and hammers the database with small queries. It also prevents it from using any optimizations for "value is one of those in a given list"



Related Topics



Leave a reply



Submit