How Do SQL Exists Statements Work

How does the EXISTS Clause work in SQL Server?

From documentation EXISTS:

Specifies a subquery to test for the existence of rows.

SELECT 1 
WHERE EXISTS ( SELECT 0 WHERE 1 = 1 )
-- there is row

SELECT 1
WHERE EXISTS ( SELECT 0 WHERE 1 = 0 )
-- no row returned by subquery

SELECT 1 WHERE EXISTS ( 1 )
-- not even valid query `1` is not subquery

Keep in mind that it checks rows not values so:

SELECT 1 
WHERE EXISTS ( SELECT NULL WHERE 1 = 1 )
-- will return 1

LiveDemo

EDIT:

This seems contradictory with the sentence " EXISTS clause evaluates for TRUE/FALSE" ?

EXISTS operator tests for the existence of rows and it returns TRUE/FALSE.

So if subquery returns:

╔══════════╗     ╔══════════╗     ╔══════════╗     ╔══════════╗
║ subquery ║ ║ subquery ║ ║ subquery ║ ║ subquery ║
╠══════════╣ ╠══════════╣ ╠══════════╣ ╠══════════╣
║ NULL ║ ║ 1 ║ ║ 0 ║ ║anything ║
╚══════════╝ ╚══════════╝ ╚══════════╝ ╚══════════╝

Then EXISTS (subquery) -> TRUE.

If subquery returns (no rows):

╔══════════╗
║ subquery ║
╚══════════╝

Then EXISTS (subquery) -> FALSE.

How does an SQL Exists statement even work if the syntax is wrong?

SQL Server doesn't require FROM in the SELECT statement.

For example, there is no syntax error in the following query:

SELECT 2 AS t
WHERE 0 = 0

It returns one row with column t and value 2.

You can write simple

SELECT 2 AS t

to get the same result.


Your query is the same as this:

SELECT *
FROM Customer_Tbl
WHERE EXISTS(
SELECT 2 AS Customer_Tbl
WHERE Customer_Tbl.Country = 'MEXICO'
);

Inner Customer_Tbl is an alias for the column with the constant 2. The Country in WHERE is a column of an outer table Customer_Tbl.

It is a good practice to use AS for aliases and fully qualify the columns with their table names.


When you try to run the inner part separately

SELECT 2 Customer_Tbl 
WHERE Country = 'MEXICO'

it fails not because there is no FROM, but because the parser doesn't know what is Country:

Msg 207, Level 16, State 1, Line 2 Invalid column name 'Country'.


For the sake of completeness, here is the syntax of SELECT statement in SQL Server from MSDN:

<SELECT statement> ::=    
[ WITH { [ XMLNAMESPACES ,] [ <common_table_expression> [,...n] ] } ]
<query_expression>
[ ORDER BY { order_by_expression | column_position [ ASC | DESC ] }
[ ,...n ] ]
[ <FOR Clause>]
[ OPTION ( <query_hint> [ ,...n ] ) ]
<query_expression> ::=
{ <query_specification> | ( <query_expression> ) }
[ { UNION [ ALL ] | EXCEPT | INTERSECT }
<query_specification> | ( <query_expression> ) [...n ] ]
<query_specification> ::=
SELECT [ ALL | DISTINCT ]
[TOP ( expression ) [PERCENT] [ WITH TIES ] ]
< select_list >
[ INTO new_table ]
[ FROM { <table_source> } [ ,...n ] ]
[ WHERE <search_condition> ]
[ <GROUP BY> ]
[ HAVING < search_condition > ]

Optional clauses are in square brackets [ ]. As you can see, pretty much all clauses are optional, including FROM, except the SELECT keyword itself and < select_list >.

EXISTS vs JOIN and use of EXISTS clause

EXISTS is used to return a boolean value, JOIN returns a whole other table

EXISTS is only used to test if a subquery returns results, and short circuits as soon as it does. JOIN is used to extend a result set by combining it with additional fields from another table to which there is a relation.

In your example, the queries are semantically equivalent.

In general, use EXISTS when:

  • You don't need to return data from the related table
  • You have dupes in the related table (JOIN can cause duplicate rows if values are repeated)
  • You want to check existence (use instead of LEFT OUTER JOIN...NULL condition)

If you have proper indexes, most of the time the EXISTS will perform identically to the JOIN. The exception is on very complicated subqueries, where it is normally quicker to use EXISTS.

If your JOIN key is not indexed, it may be quicker to use EXISTS but you will need to test for your specific circumstance.

JOIN syntax is easier to read and clearer normally as well.

Understanding NOT EXISTS in SQL

So, the original problem was to:

Retrieve the names of each employee who works on ALL the projects controlled by department 5.

The provided answer makes use of the equivalence of:

  1. All x such that f(x)
  2. No x such that not f(x)

To put that in English, the problem is equivalent to finding those employees for whom there is no project controlled by department 5 that the employee doesn't work on.

So, first find all the projects controlled by department 5, then remove from that any project that the employee works on. That's exactly what the provided answer is doing. If there is nothing left, then there is no project controlled by department 5 that the employee doesn't work on. So by the equivalance, the employee works on all the projects controlled by that department.

While this is technically correct, it can feel a little odd. Especially if it were the case that department 5 controls zero projects. If that were true, the query would return all the employees ... which might not be quite what was expected.

How do I use T-SQL's Exists keyword?

To answer your question about using the EXISTS keyword, here is an example query that uses an EXISTS predicate, based on the query as currently given in your question.


SELECT t.*
FROM tblTransaction t
WHERE EXISTS
(
SELECT 1
FROM tblTenantTransCode ttc
JOIN tblCheckbookCode cc
ON (cc.ID = ttc.CheckbookCode AND cc.Description='Rent Income')
WHERE ttc.ID = t.TransactionCode
)

Additional Details:

We all recognize that there are a variety of SQL statements that will return the result set that meets the specified requirements. And there are likely going to be differences in the observed performance of those queries. Performance is particularly dependent on the DBMS, the optimizer mode, the query plan, and the statistics (number of rows and data value distribution).

One advantage of the EXISTS is that it makes clear that we aren't interested returning any expressions from tables in the subquery. It serves to logically separate the subquery from the outer query, in a way that a JOIN does not.

Another advantage of using EXISTS is that avoids returning duplicate rows that would be (might be) returned if we were to instead use a JOIN.

An EXISTS predicate can be used to test for the existence of any related row in a child table, without requiring a join. As an example, the following query returns a set of all orders that have at least one associated line_item:


SELECT o.*
FROM order o
WHERE EXISTS
( SELECT 1
FROM line_item li
WHERE li.order_id = o.id
)

Note that the subquery doesn't need to find ALL matching line items, it only needs to find one row in order to satisfy the condition. (If we were to write this query as a JOIN, then we would return duplicate rows whenever an order had more than one line item.)

A NOT EXISTS predicate is also useful, for example, to return a set of orders that do not have any associated line_items.


SELECT o.*
FROM order o
WHERE NOT EXISTS
( SELECT 1
FROM line_item li
WHERE li.order_id = o.id
)

Of course, NOT EXISTS is just one alternative. An equivalent result set could be obtained using an OUTER join and an IS NULL test (assuming we have at least one expression available from the line_item table that is NOT NULL)


SELECT o.*
FROM order o
LEFT
JOIN line_item li ON (li.order_id = o.id)
WHERE li.id IS NULL

There seems to be a lot of discussion (relating to answers to the original question) about needing to use an IN predicate, or needing to use a JOIN.

Those constructs are alternatives, but aren't necessary. The required result set can be returned by a query without using an IN and without using a JOIN. The result set can be returned with a query that uses an EXISTS predicate. (Note that the title of the OP question did ask about how to use the EXISTS keyword.)

Here is another alternative query (this is not my first choice), but the result set returned does satisfy the specified requirements:



SELECT t.*
FROM tblTransaction t
WHERE EXISTS
(
SELECT 1
FROM tblTenantTransCode ttc
WHERE ttc.ID = t.TransactionCode
AND EXISTS
(
SELECT 1
FROM tblCheckbookCode cc
WHERE cc.ID = ttc.CheckbookCode
AND cc.Description = 'Rent Income'
)
)

Of primary importance, the query should return a correct result set, one that satisfies the specified requirements, given all possible sets of conditions.

Some of the queries presented as answers here do NOT return the requested result set, or if they do, they happen to do so by accident. Some of the queries will work if we pre-assume something about the data, such that some columns are UNIQUE and NOT NULL.

Performance differences

Sometimes a query with an EXISTS predicate will not perform as well as a query with a JOIN or an IN predicate. In some cases, it may perform better. (With the EXISTS predicate, the subquery only has to find one row that satisfies the condition, rather than finding ALL matching rows, as would be required by a JOIN.)

Performance of various query options is best gauged by observation.



Related Topics



Leave a reply



Submit