Real Life Example, When to Use Outer/Cross Apply in Sql

Real life example, when to use OUTER / CROSS APPLY in SQL

Some uses for APPLY are...

1) Top N per group queries (can be more efficient for some cardinalities)

SELECT pr.name,
pa.name
FROM sys.procedures pr
OUTER APPLY (SELECT TOP 2 *
FROM sys.parameters pa
WHERE pa.object_id = pr.object_id
ORDER BY pr.name) pa
ORDER BY pr.name,
pa.name

2) Calling a Table Valued Function for each row in the outer query

SELECT *
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle)

3) Reusing a column alias

SELECT number,
doubled_number,
doubled_number_plus_one
FROM master..spt_values
CROSS APPLY (SELECT 2 * CAST(number AS BIGINT)) CA1(doubled_number)
CROSS APPLY (SELECT doubled_number + 1) CA2(doubled_number_plus_one)

4) Unpivoting more than one group of columns

Assumes 1NF violating table structure....

CREATE TABLE T
(
Id INT PRIMARY KEY,

Foo1 INT, Foo2 INT, Foo3 INT,
Bar1 INT, Bar2 INT, Bar3 INT
);

Example using 2008+ VALUES syntax.

SELECT Id,
Foo,
Bar
FROM T
CROSS APPLY (VALUES(Foo1, Bar1),
(Foo2, Bar2),
(Foo3, Bar3)) V(Foo, Bar);

In 2005 UNION ALL can be used instead.

SELECT Id,
Foo,
Bar
FROM T
CROSS APPLY (SELECT Foo1, Bar1
UNION ALL
SELECT Foo2, Bar2
UNION ALL
SELECT Foo3, Bar3) V(Foo, Bar);

When should I use CROSS APPLY over INNER JOIN?

Can anyone give me a good example of when CROSS APPLY makes a difference in those cases where INNER JOIN will work as well?

See the article in my blog for detailed performance comparison:

  • INNER JOIN vs. CROSS APPLY

CROSS APPLY works better on things that have no simple JOIN condition.

This one selects 3 last records from t2 for each record from t1:

SELECT  t1.*, t2o.*
FROM t1
CROSS APPLY
(
SELECT TOP 3 *
FROM t2
WHERE t2.t1_id = t1.id
ORDER BY
t2.rank DESC
) t2o

It cannot be easily formulated with an INNER JOIN condition.

You could probably do something like that using CTE's and window function:

WITH    t2o AS
(
SELECT t2.*, ROW_NUMBER() OVER (PARTITION BY t1_id ORDER BY rank) AS rn
FROM t2
)
SELECT t1.*, t2o.*
FROM t1
INNER JOIN
t2o
ON t2o.t1_id = t1.id
AND t2o.rn <= 3

, but this is less readable and probably less efficient.

Update:

Just checked.

master is a table of about 20,000,000 records with a PRIMARY KEY on id.

This query:

WITH    q AS
(
SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS rn
FROM master
),
t AS
(
SELECT 1 AS id
UNION ALL
SELECT 2
)
SELECT *
FROM t
JOIN q
ON q.rn <= t.id

runs for almost 30 seconds, while this one:

WITH    t AS 
(
SELECT 1 AS id
UNION ALL
SELECT 2
)
SELECT *
FROM t
CROSS APPLY
(
SELECT TOP (t.id) m.*
FROM master m
ORDER BY
id
) q

is instant.

Where to use Outer Apply

A LEFT JOIN should be replaced with OUTER APPLY in the following situations.

1. If we want to join two tables based on TOP n results

Consider if we need to select Id and Name from Master and last two dates for each Id from Details table.

SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
LEFT JOIN
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
ORDER BY CAST(PERIOD AS DATE)DESC
)D
ON M.ID=D.ID

which forms the following result

x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | NULL | NULL |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x

This will bring wrong results ie, it will bring only latest two dates data from Details table irrespective of Id even though we join with Id. So the proper solution is using OUTER APPLY.

SELECT M.ID,M.NAME,D.PERIOD,D.QTY
FROM MASTER M
OUTER APPLY
(
SELECT TOP 2 ID, PERIOD,QTY
FROM DETAILS D
WHERE M.ID=D.ID
ORDER BY CAST(PERIOD AS DATE)DESC
)D

Here is the working : In LEFT JOIN , TOP 2 dates will be joined to the MASTER only after executing the query inside derived table D. In OUTER APPLY, it uses joining WHERE M.ID=D.ID inside the OUTER APPLY, so that each ID in Master will be joined with TOP 2 dates which will bring the following result.

x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-08 | 40 |
| 2 | B | 2014-01-06 | 30 |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x

2. When we need LEFT JOIN functionality using functions.

OUTER APPLY can be used as a replacement with LEFT JOIN when we need to get result from Master table and a function.

SELECT M.ID,M.NAME,C.PERIOD,C.QTY
FROM MASTER M
OUTER APPLY dbo.FnGetQty(M.ID) C

And the function goes here.

CREATE FUNCTION FnGetQty 
(
@Id INT
)
RETURNS TABLE
AS
RETURN
(
SELECT ID,PERIOD,QTY
FROM DETAILS
WHERE ID=@Id
)

which generated the following result

x------x---------x--------------x-------x
| Id | Name | PERIOD | QTY |
x------x---------x--------------x-------x
| 1 | A | 2014-01-13 | 10 |
| 1 | A | 2014-01-11 | 15 |
| 1 | A | 2014-01-12 | 20 |
| 2 | B | 2014-01-06 | 30 |
| 2 | B | 2014-01-08 | 40 |
| 3 | C | NULL | NULL |
x------x---------x--------------x-------x

3. Retain NULL values when unpivoting

Consider you have the below table

x------x-------------x--------------x
| Id | FROMDATE | TODATE |
x------x-------------x--------------x
| 1 | 2014-01-11 | 2014-01-13 |
| 1 | 2014-02-23 | 2014-02-27 |
| 2 | 2014-05-06 | 2014-05-30 |
| 3 | NULL | NULL |
x------x-------------x--------------x

When you use UNPIVOT to bring FROMDATE AND TODATE to one column, it will eliminate NULL values by default.

SELECT ID,DATES
FROM MYTABLE
UNPIVOT (DATES FOR COLS IN (FROMDATE,TODATE)) P

which generates the below result. Note that we have missed the record of Id number 3

  x------x-------------x
| Id | DATES |
x------x-------------x
| 1 | 2014-01-11 |
| 1 | 2014-01-13 |
| 1 | 2014-02-23 |
| 1 | 2014-02-27 |
| 2 | 2014-05-06 |
| 2 | 2014-05-30 |
x------x-------------x

In such cases an APPLY can be used(either CROSS APPLY or OUTER APPLY, which is interchangeable).

SELECT DISTINCT ID,DATES
FROM MYTABLE
OUTER APPLY(VALUES (FROMDATE),(TODATE))
COLUMNNAMES(DATES)

which forms the following result and retains Id where its value is 3

  x------x-------------x
| Id | DATES |
x------x-------------x
| 1 | 2014-01-11 |
| 1 | 2014-01-13 |
| 1 | 2014-02-23 |
| 1 | 2014-02-27 |
| 2 | 2014-05-06 |
| 2 | 2014-05-30 |
| 3 | NULL |
x------x-------------x

TOP and OUTER APPLY

I found out that the OUTER APPLY will be done on all the rows first, and then the TOP 5 will be returned.

Is there any difference between CROSS APPLY and OUTER APPLY when creating a cartesian product?

The difference comes into play when applied table or table-valued function has no records:

SELECT      First.Id AS FirstId, Second.Id AS SecondId
FROM First
OUTER APPLY (SELECT * FROM Second WHERE Second.Id = -1) Second
WHERE First.Name IN ('First253', 'First3304');

2 rows returned


SELECT First.Id AS FirstId, Second.Id AS SecondId
FROM First
CROSS APPLY (SELECT * FROM Second WHERE Second.Id = -1) Second
WHERE First.Name IN ('First253', 'First3304');

0 rows returned

In OP's own words:

Not the way you're doing it, because conceptually you're filtering with WHERE after the APPLY (although the plans show the engine optimizing by doing it first); but if you explicitly filter first and then APPLY like this:

SELECT      First.Id AS FirstId, FilteredSecond.Id AS SecondId
FROM First
CROSS APPLY (SELECT Id FROM Second WHERE Name IN ('xxx')) FilteredSecond
WHERE First.Name IN ('First253', 'First3304');

you'd see the difference because you'd get rows with NULLs with the OUTER but no rows with the CROSS.



Related Topics



Leave a reply



Submit