Why No Windowed Functions in Where Clauses

Why no windowed functions in where clauses?

why can't I use a windowed function in a where clause in SQL Server?

One answer, though not particularly informative, is because the spec says that you can't.

See the article by Itzik Ben Gan - Logical Query Processing: What It Is And What It Means to You and in particular the image here. Window functions are evaluated at the time of the SELECT on the result set remaining after all the WHERE/JOIN/GROUP BY/HAVING clauses have been dealt with (step 5.1).

really I'm looking for the reasoning behind not being able to use
windowing functions in where clauses.

The reason that they are not allowed in the WHERE clause is that it would create ambiguity. Stealing Itzik Ben Gan's example from High-Performance T-SQL Using Window Functions (p.25)

Suppose your table was

CREATE TABLE T1
(
col1 CHAR(1) PRIMARY KEY
)

INSERT INTO T1 VALUES('A'),('B'),('C'),('D'),('E'),('F')

And your query

SELECT col1
FROM T1
WHERE ROW_NUMBER() OVER (ORDER BY col1) <= 3
AND col1 > 'B'

What would be the right result? Would you expect that the col1 > 'B' predicate ran before or after the row numbering?

Window function is not supported in partition by clauses

You will need to materialize the values of your window functions before you perform any sort of filtering, partitioning, or conditional operations on that value.

There are a few ways to go about doing this, and the appropriate one for your use case will vary depending factor outside of this scope.

You may accomplish this using a view, CTE, temp table, or a table variable prior to attempting this partitioning operation. This is not an exhaustive list.

Windowed functions can only appear in the SELECT or ORDER BY clauses

Windowed functions are defined in the ANSI spec to logically execute after the processing of GROUP BY, HAVING, WHERE.

To be more specific they are allowed at steps 5.1 and 6 in the Logical Query Processing flow chart here .

I suppose they could have defined it another way and allowed GROUP BY, WHERE, HAVING to use window functions with the window being the logical result set at the start of that phase but suppose they had and we were allowed to construct queries such as

SELECT a, 
b,
NTILE(2) OVER (PARTITION BY a ORDER BY b) AS NtileForSelect
FROM YourTable
WHERE NTILE(2) OVER (PARTITION BY a ORDER BY b) > 1
GROUP BY a,
b,
NTILE(2) OVER (PARTITION BY a ORDER BY b)
HAVING NTILE(2) OVER (PARTITION BY a ORDER BY b) = 1

With four different logical windows in play good luck working out what the result of this would be! Also what if in the HAVING you actually wanted to filter by the expression from the GROUP BY level above rather than with the window of rows being the result after the GROUP BY?

The CTE version is more verbose but also more explicit and easier to follow.

WITH T1 AS
(
SELECT a,
b,
NTILE(2) OVER (PARTITION BY a ORDER BY b) AS NtileForWhere
FROM YourTable
), T2 AS
(
SELECT a,
b,
NTILE(2) OVER (PARTITION BY a ORDER BY b) AS NtileForGroupBy
FROM T1
WHERE NtileForWhere > 1
), T3 AS
(
SELECT a,
b,
NtileForGroupBy,
NTILE(2) OVER (PARTITION BY a ORDER BY b) AS NtileForHaving
FROM T2
GROUP BY a,b, NtileForGroupBy
)
SELECT a,
b,
NTILE(2) OVER (PARTITION BY a ORDER BY b) AS NtileForSelect
FROM T3
WHERE NtileForHaving = 1

As these are all defined in the SELECT statement and are aliased it is easily achievable to disambiguate results from different levels e.g. simply by switching WHERE NtileForHaving = 1 to NtileForGroupBy = 1

Filtering on window functions in SQL Server

Looking at the sample data it looks like you need to partition by JOB_ID only:

WITH mycte AS (
...
), cte2 as (
SELECT
ID, JOB_ID, STATUS, BASE_ID, PERCENTAGE,
SUM(percentage) OVER (PARTITION BY JOB_ID) AS PERCENTAGE_SUM
FROM mycte
)
SELECT *
FROM cte2
WHERE PERCENTAGE_SUM = 100

Window function is not allowed in where clause redshift

Just move your WHERE clause to the outer SELECT. Seqnum doesn't exists until the CTE runs but does exist when the result of the CTE is consumed.

UPDATE ...

After moving the where clause AndyP got a correlated subquery error coming from a WHERE clause not included in the posted query. As shown in this somewhat modified query:

WITH dates AS
(
SELECT (DATE_TRUNC('week',getdate () +INTERVAL '1 day')::DATE- 7*(ROW_NUMBER() OVER (ORDER BY TRUE) - 1) -INTERVAL '1 day')::DATE AS week_of
FROM (SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X)
)
SELECT dates.week_of,
'W' || CEILING(DATE_PART('week',dates.week_of +INTERVAL '1 day')) AS week_number,
COUNT(DISTINCT features.id) AS total
FROM dimensions.program features
JOIN dates ON features.last_update <= dates.week_of
WHERE features.version = (SELECT MAX(version)
FROM headers f2
WHERE features.id = f2.id
AND features.type = f2.type
AND f2.last_update <= dates.week_of)
AND features.type = 'type'
AND features.status = 'live'
GROUP BY dates.week_of
ORDER BY dates.week_of DESC;

This was an interesting replacement of a correlated query with a join due to the inequality in the correlated sub query. We thought others might be helped by posting the final solution. This works:

WITH dates AS
(
SELECT (DATE_TRUNC('week',getdate () +INTERVAL '1 day')::DATE- 7*(ROW_NUMBER() OVER (ORDER BY TRUE) - 1) -INTERVAL '1 day')::DATE AS week_of
FROM (SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X UNION ALL SELECT 1 AS X)
)
SELECT dates.week_of,
'W' || CEILING(DATE_PART('week',dates.week_of +INTERVAL '1 day')) AS week_number,
COUNT(DISTINCT features.carrier_id) AS total
FROM dimensions.program features
JOIN dates ON features.last_update <= dates.week_of
JOIN (SELECT MAX(MAX(version)) OVER(Partition by id, type Order by dates.weeks_of rows unbounded preceding) AS feature_version,
f2.id,
f2.type,
dates.week_of
FROM dimensions.headers f2
JOIN dates ON f2.last_update <= dates.week_of
GROUP BY f2.id,
f2.type,
dates.week_of) f2
ON features.id = f2.id
AND features.type = f2.type
AND f2.week_of = dates.week_of
AND features.version = f2.version
WHERE features.type = 'type'
AND features.status = 'live'
GROUP BY dates.week_of
ORDER BY dates.week_of DESC;

Needing to make a data segment that had all the possible Max(version) for all possible week_of values was the key. Hopefully having both of these queries posted will help other fix correlated subquery errors.

WINDOW function alternative in HAVING Clause

One method uses rank() in a subquery:

select *
from (
select
prod_id,
prod_name,
prod_date,
rank() over(partition by prod_id order by prod_date) rn
from dim_product
where prod_name in ('xxx', 'yyy', 'zzz')
) t
where rn = 1

As an alternative, you may also want to consider filtering with a correlated subquery:

select prod_id, prod_id, prod_date
from dim_product d
where d.prod_name in ('xxx', 'yyy', 'zzz') and d.prod_date = (
select min(d1.prod_date)
from dim_product d1
where d1.prod_id = d.prod_id and d1.prod_name in ('xxx', 'yyy', 'zzz')
)

For efficiency in the latter query, you want an index on (prod_name, prod_id, prod_date).



Related Topics



Leave a reply



Submit