PostgreSQL window function: partition by comparison
Using several different window functions and two subqueries, this should work decently fast:
WITH events(id, event, ts) AS (
VALUES
(1, 12, '2014-03-19 08:00:00'::timestamp)
,(2, 12, '2014-03-19 08:30:00')
,(3, 13, '2014-03-19 09:00:00')
,(4, 13, '2014-03-19 09:30:00')
,(5, 12, '2014-03-19 10:00:00')
)
SELECT first_value(pre_id) OVER (PARTITION BY grp ORDER BY ts) AS pre_id
, id, ts
, first_value(post_id) OVER (PARTITION BY grp ORDER BY ts DESC) AS post_id
FROM (
SELECT *, count(step) OVER w AS grp
FROM (
SELECT id, ts
, NULLIF(lag(event) OVER w, event) AS step
, lag(id) OVER w AS pre_id
, lead(id) OVER w AS post_id
FROM events
WINDOW w AS (ORDER BY ts)
) sub1
WINDOW w AS (ORDER BY ts)
) sub2
ORDER BY ts;
Using ts
as name for the timestamp column.
Assuming ts
to be unique - and indexed (a unique constraint does that automatically).
In a test with a real life table with 50k rows it only needed a single index scan. So, should be decently fast even with big tables. In comparison, your query with join / distinct did not finish after a minute (as expected).
Even an optimized version, dealing with one cross join at a time (the left join with hardly a limiting condition is effectively a limited cross join) did not finish after a minute.
For best performance with a big table, tune your memory settings, in particular for work_mem
(for big sort operations). Consider setting it (much) higher for your session temporarily if you can spare the RAM. Read more here and here.
How?
In subquery
sub1
look at the event from the previous row and only keep that if it has changed, thus marking the first element of a new group. At the same time, get theid
of the previous and the next row (pre_id
,post_id
).In subquery
sub2
,count()
only counts non-null values. The resultinggrp
marks peers in blocks of consecutive same events.In the final
SELECT
, take the firstpre_id
and the lastpost_id
per group for each row to arrive at the desired result.
Actually, this should be even faster in the outerSELECT
:last_value(post_id) OVER (PARTITION BY grp ORDER BY ts
RANGE BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) AS post_id... since the sort order of the window agrees with the window for
pre_id
, so only a single sort is needed. A quick test seems to confirm it. More about this frame definition.
SQL Fiddle.
PostgreSQL Window Function ordering
partition over x the ordering is not working the way I expect
It is working perfectly fine. When you partition by x first 1
and last 1
are in the same group.
Window Functions:
The PARTITION BY list within OVER specifies dividing the rows into groups, or partitions, that share the same values of the PARTITION BY expression(s). For each row, the window function is computed across the rows that fall into the same partition as the current row.
To get result you want you could use (classic example of gaps and islands problem):
SELECT *, ROW_NUMBER() OVER (ORDER BY y) -
ROW_NUMBER() OVER (PARTITION BY x ORDER BY y) + 1 AS group_id
FROM tab
ORDER BY group_id
LiveDemo
Output:
╔═══╦═══╦══════════╗
║ x ║ y ║ group_id ║
╠═══╬═══╬══════════╣
║ 1 ║ 1 ║ 1 ║
║ 2 ║ 2 ║ 2 ║
║ 2 ║ 3 ║ 2 ║
║ 1 ║ 4 ║ 3 ║
╚═══╩═══╩══════════╝
Warning:
This solution is not general.
EDIT:
More general solution is to utilize LAG
to get previous value and windowed SUM
:
WITH cte AS
(
SELECT t1.x, t1.y, LAG(x) OVER(ORDER BY y) AS x_prev
FROM tab t1
)
SELECT x,y, SUM( CASE WHEN x = COALESCE(x_prev,x) THEN 0 ELSE 1 END)
OVER(ORDER BY y) + 1 AS group_id
FROM cte
ORDER BY group_id;
LiveDemo2
PostgreSQL - how do I know which partition I'm in when using window functions?
We can try using DENSE_RANK
and ROW_NUMBER
:
WITH yourTable AS (
SELECT 'A' AS col1, 1 AS col2 UNION ALL
SELECT 'A', 2 UNION ALL
SELECT 'A', 3 UNION ALL
SELECT 'B', 1 UNION ALL
SELECT 'B', 2 UNION ALL
SELECT 'B', 3
)
SELECT *,
DENSE_RANK() OVER (ORDER BY col1) partition1,
ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) partition2
FROM yourTable
ORDER BY
col1, col2;
Demo
The general pattern here, for an arbitrary number of partitions, would be to use DENSE_RANK
for the first N-1 partitions, and then ROW_NUMBER
at the last partition. ROW_NUMBER
would ensure that the last partition has 1,2,3,... as the sequence.
Can I use one PARTITION definition for multiple window function calls?
You can try to use WINDOW
Clause, The optional WINDOW clause has the general form
WINDOW window_name AS ( window_definition ) [, ...]
Then use OVER
window_name for your aggregate function, it might be more elegant
select id,weekly,
sum(totalsteps) over w as total_steps,
sum(totaldistance) over w as total_distance,
sum(veryactiveminutes) over w as total_veryactive,
sum(fairlyactiveminutes) over w as total_fairlyactive,
sum(lightlyactiveminutes) over w as total_lightlyactive,
sum(totalsteps) over w as total_steps,
sum(totaldistance) over w as total_distance,
sum(veryactivedistance) over w as total_veryactivedistance,
sum(moderatelyactivedistance) over w as total_moderatelyactivedistance,
sum(lightactivedistance) over w as total_lightactivedistance,
sum(sedentaryactivedistance) over w as total_sedentaryactivedistance,
sum(calories) over w as total_calories,
sum(totalminutesover w as leep) over w as total_over w as leep,
sum(totaltimeinbed) over w as total_inbed
from (
select *, date_trunc('week', activitydate) as weekly
from activitysleep_merged
) WINDOW w AS ( PARTITION BY id, weekly );
more detail we can refer WINDOW Clause
SQLfiddle
Partition By with Order By Clause in PostgreSQL
Update
This behavior is documented here:
4.2.8. Window Function Calls
[..]
The default framing option isRANGE UNBOUNDED PRECEDING
, which is
the same asRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. With
ORDER BY
, this sets the frame to be all rows from the partition
start up through the current row's lastORDER BY
peer. Without
ORDER BY
, this means all rows of the partition are included in the
window frame, since all rows become peers of the current row.
That means:
In absence of a frame_clause – RANGE UNBOUNDED PRECEDING
is used by default. That includes:
- All rows "preceding" the current row according to the
ORDER BY
clause - The current row
- All rows which have the same values in the
ORDER BY
columns as the current row
In absence of an ORDER BY
clause – ORDER BY NULL
is assumed (though I'm guessing again). Thus the frame will include all rows from the partition, because the values in the ORDER BY
column(s) are the same (which is always NULL
) in every row.
Original answer:
Disclaimer: The following is more a guess than a qualified answer. I didn't find any documentation, which can confirm what I write. At the same time I don't think that currently given answers correctly explain the behavior.
The reason for the diffrence in the results is not directly the ORDER BY clause, since a + b + c
is the same as c + b + a
. The reason is (and that is my guess) that the ORDER BY clause implicitly defines the frame_clause as
rows between unbounded preceding and current row
Try the following query:
select *
, sum(val) over (partition by user_id) as res
, sum(val) over (partition by user_id order by ts) as res_order_by
, sum(val) over (
partition by user_id
order by ts
rows between unbounded preceding and current row
) as res_order_by_unbounded_preceding
, sum(val) over (
partition by user_id
-- order by ts
rows between unbounded preceding and current row
) as res_preceding
, sum(val) over (
partition by user_id
-- order by ts
rows between current row and unbounded following
) as res_following
, sum(val) over (
partition by user_id
order by ts
rows between unbounded preceding and unbounded following
) as res_orderby_preceding_following
from example_table;
db<>fiddle
You will see, that you can get a cumulative sum without an ORDER BY clause aswell as get a "full" sum with the ORDER BY clause.
Related Topics
Sum Columns with Null Values in Oracle
Checking for Time Range Overlap, the Watchman Problem [Sql]
Access SQL Using Top 5 Returning More Than 5 Results
SQL Server Format Decimal Places with Commas
Split a Varchar in Db2 to Retrieve a Value Inside
Flatten Adjacency List Hierarchy to a List of All Paths
Fill Null Values with Last Non-Null Amount - Oracle SQL
SQL to Transpose Row Pairs to Columns in Ms Access Database
MySQL Question - How to Handle Multiple Types of Users - One Table or Multiple
Upgrading a Varchar Column to Enum Type in Postgresql
Count Consecutive Duplicate Values in SQL
Nolock VS. Transaction Isolation Level
Multiple Yet Mutually Exclusive Foreign Keys - Is This the Way to Go
How to Self Join Recursively in SQL