Referencing Current Row in Filter Clause of Window Function

Referencing current row in FILTER clause of window function

You are not actually aggregating rows, so the new aggregate FILTER clause is not the right tool. A window function is a better fit, but a problem remains: the frame definition of a window cannot depend on values of the current row. With the ROWS clause, it can only count a given number of rows preceding or following.

To make that work, aggregate counts per day and LEFT JOIN to a full set of days in range. Then you can apply a window function:

SELECT t.*, ct.ct_last4days
FROM  (
   SELECT *, sum(ct) OVER (ORDER BY dt ROWS 3 PRECEDING) AS ct_last4days
   FROM  (
      SELECT generate_series(min(dt), max(dt), interval '1 day')::date AS dt
      FROM   tbl t1
      ) d
   LEFT   JOIN (SELECT dt, count(*) AS ct FROM tbl GROUP BY 1) t USING (dt)
   ) ct
JOIN   tbl t USING (dt);

Omitting ORDER BY dt in the window definition usually works, since the order is carried over from generate_series() in the subquery. But the SQL standard guarantees no particular order without an explicit ORDER BY, and it might break in more complex queries.

SQL Fiddle.
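
To make the aggregate-and-join step easier to follow, here is a minimal sketch of the intermediate result (before joining back to tbl). The sample rows and the id column are made up for illustration; only the table name tbl and the column dt come from the query above.

```sql
-- Hypothetical sample data: 2 rows on Jan 1, 1 row on Jan 2, 1 row on Jan 5.
WITH tbl(id, dt) AS (
   VALUES (1, date '2015-01-01')
        , (2, date '2015-01-01')
        , (3, date '2015-01-02')
        , (4, date '2015-01-05')
   )
SELECT dt
     , COALESCE(c.ct, 0) AS ct  -- days without rows show up as 0
     , COALESCE(sum(c.ct) OVER (ORDER BY dt ROWS 3 PRECEDING), 0) AS ct_last4days
FROM  (
   -- one row per calendar day between the first and the last dt
   SELECT generate_series(min(dt), max(dt), interval '1 day')::date AS dt
   FROM   tbl
   ) d
LEFT   JOIN (SELECT dt, count(*) AS ct FROM tbl GROUP BY 1) c USING (dt)
ORDER  BY dt;

-- Expected result:
--  dt         | ct | ct_last4days
--  2015-01-01 |  2 | 2
--  2015-01-02 |  1 | 3
--  2015-01-03 |  0 | 3
--  2015-01-04 |  0 | 3
--  2015-01-05 |  1 | 2
```

Because the generated series has exactly one row per day, ROWS 3 PRECEDING covers the current day plus the three days before it, even where tbl has gaps.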

Related:

  • Select finishes where athlete didn't finish first for the past 3 events
  • PostgreSQL: running count of rows for a query 'by minute'
  • PostgreSQL unnest() with element number

Window functions filter through current row

Can I use a frame and a filter?

You can. But each has restrictions:

  • The expression in the FILTER clause only sees the row it is currently fetching values from. There is no way to reference the row for which the window function computes its value. So I don't see a way to formulate a filter that depends on that row, short of a huge, expensive cross join, because the same row feeds many different computations. Or we are back to LATERAL subqueries, which can reference the parent row.

  • The frame definition on the other hand does not allow variables at all. It demands a fixed number, as discussed in the related answer you referenced:

    • Referencing current row in FILTER clause of window function

These restrictions make your particular query hard to implement. This should be correct now:

SELECT *
FROM  (
   SELECT record_id, security_id, date, price
        , CASE WHEN do_calc THEN max(earnings) OVER w1 END AS peak_earnings
        , CASE WHEN do_calc THEN min(earnings) OVER w1 END AS minimum_earnings
        , CASE WHEN do_calc THEN price / NULLIF(max(earnings) OVER w1, 0) END AS price_to_peak_earnings
        , CASE WHEN do_calc THEN price / NULLIF(min(earnings) OVER w1, 0) END AS price_to_minimum_earnings
   FROM  (
      SELECT *, (date - 365) >= min_date AND s.record_id IS NOT NULL AS do_calc
      FROM  (
         SELECT security_id, min_date
              , generate_series(min_date, max_date, interval '1 day')::date AS date
         FROM  (
            SELECT security_id, min(date) AS min_date, max(date) AS max_date
            FROM   security_data
            GROUP  BY 1
            ) minmax
         ) d
      LEFT   JOIN security_data s USING (security_id, date)
      ) sub1
   WINDOW w1 AS (PARTITION BY security_id ORDER BY date ROWS BETWEEN 365 PRECEDING AND 1 PRECEDING)
   ) sub2
WHERE  record_id IS NOT NULL
ORDER  BY 1, 2;

SQL Fiddle.
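
For comparison, the LATERAL alternative mentioned above could look roughly like the following. This is only a sketch: it assumes the same security_data columns used in the query (record_id, security_id, date, price, earnings) and that date is of type date. It selects by calendar range rather than by row count, and it does not reproduce the do_calc condition, so results can differ at the edges.

```sql
-- Sketch of a LATERAL subquery that references the parent row (s).
SELECT s.record_id, s.security_id, s.date, s.price
     , e.peak_earnings
     , e.minimum_earnings
     , s.price / NULLIF(e.peak_earnings, 0)    AS price_to_peak_earnings
     , s.price / NULLIF(e.minimum_earnings, 0) AS price_to_minimum_earnings
FROM   security_data s
LEFT   JOIN LATERAL (
   -- aggregate over the 365 days before the parent row, excluding its own date
   SELECT max(p.earnings) AS peak_earnings
        , min(p.earnings) AS minimum_earnings
   FROM   security_data p
   WHERE  p.security_id = s.security_id
   AND    p.date >= s.date - 365
   AND    p.date <  s.date
   ) e ON true
ORDER  BY s.security_id, s.date;
```

The subquery runs once per row of security_data, which is why the text calls this route potentially expensive; an index on (security_id, date) would likely help.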

Notes

  • Nothing in the question says that every security_id would have rows for the same days. Calculating min / max date per security_id in subquery minmax gives us the minimum time frame.

  • The time frame for calculations is exactly the 365 days preceding the date of the current row, not including the current row (ROWS BETWEEN 365 PRECEDING AND 1 PRECEDING). It's typically more useful to exclude the current row from aggregations that are to be compared with the current row.

    I adapted the condition for calculations to the same time frame to avoid corner case oddities: (date - 365) >= min_date

  • In the fiddle, where you added 1 row for every 1st of Jan, you can see the effect of leap years contrasting with a fixed number of 365 days. The window frame is empty after leap years (2001, 2005, ...).

  • I use subqueries throughout, which is typically a bit faster than CTEs.

  • To be sure, we need to include ORDER BY in the window definition. I updated my old answer you linked to accordingly:

    • Referencing current row in FILTER clause of window function

  • I use w1 as window name, for the "1 year" period. You might add w2, etc., each with any number of days. You could adapt to leap years after all if you need to. You might even generate the whole query depending on the current date ...

Filter clause in aggregate window not discarding rows as expected

The over clause takes precedence over the filter clause: the frame last_2 (i.e. the current row and the one before it) is determined first, and the filter is then applied to those rows, which leaves only one row (the even one).

What you are looking for instead is this:

sum(case when num % 2 = 0 then num else 0 end) over last_2
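
A small self-contained sketch shows how the two expressions behave side by side. The sample values and the definition of last_2 are assumptions for illustration (the current row and the one before it, as described above); the original table is not shown here.

```sql
with t(num) as (values (1), (2), (3), (4), (5))
select
  num,
  -- frame first, then filter: only even rows inside the frame are summed
  sum(num) filter (where num % 2 = 0) over last_2 as filtered,
  -- same frame, odd rows contribute 0 instead of being skipped
  sum(case when num % 2 = 0 then num else 0 end) over last_2 as cased
from t
window last_2 as (order by num rows 1 preceding);

-- Expected result:
--  num | filtered | cased
--    1 |   (null) |     0
--    2 |        2 |     2
--    3 |        2 |     2
--    4 |        4 |     4
--    5 |        4 |     4
```

The two columns differ only where no row in the frame passes the condition: the filter variant yields null, the case variant yields 0.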

Window running function except current row

Yes, you can. This does the trick:

with
  t(i,x,y) as (
    values
      (1,1,1),(2,1,3),(3,1,2),
      (4,2,4),(5,2,2),(6,2,8)
  )
select
  t.*,
  sum(y) over w as sum,
  max(y) over w as max,
  count(*) filter (where y > 2) over w as cnt
from t
window w as (partition by x order by i
             rows between unbounded preceding and 1 preceding);

The frame clause restricts the window frame to just the rows you are interested in: here, all rows of the partition before the current one.

Note that in the sum column you'll get null rather than 0 because of the frame clause: the first row in each partition has no row before it, so its frame is empty. You can coalesce() this away if needed.
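
As a minimal sketch of the coalesce() variant, reusing the sample values from the query above (the column name sum_before is made up for illustration):

```sql
with
  t(i,x,y) as (
    values
      (1,1,1),(2,1,3),(3,1,2),
      (4,2,4),(5,2,2),(6,2,8)
  )
select
  t.*,
  -- an empty frame makes sum() return null; coalesce() turns that into 0
  coalesce(sum(y) over w, 0) as sum_before
from t
window w as (partition by x order by i
             rows between unbounded preceding and 1 preceding);

-- Expected result:
--  i | x | y | sum_before
--  1 | 1 | 1 |          0
--  2 | 1 | 3 |          1
--  3 | 1 | 2 |          4
--  4 | 2 | 4 |          0
--  5 | 2 | 2 |          4
--  6 | 2 | 8 |          6
```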

SQLFiddle
