Postgres window function and group by exception
You are not, in fact, using aggregate functions. You are using window functions. That's why PostgreSQL demands sp.payout and s.buyin to be included in the GROUP BY clause.
By appending an OVER clause, the aggregate function sum() is turned into a window function, which aggregates values per partition while keeping all rows.
You can combine window functions and aggregate functions. Aggregations are applied first. I did not understand from your description how you want to handle multiple payouts / buyins per event. As a guess, I calculate a sum of them per event. Now I can remove sp.payout and s.buyin from the GROUP BY clause and get one row per player and event:
SELECT p.name
, e.event_id
, e.date
, sum(sum(sp.payout)) OVER w
- sum(sum(s.buyin )) OVER w AS "Profit/Loss"
FROM player p
JOIN result r ON r.player_id = p.player_id
JOIN game g ON g.game_id = r.game_id
JOIN event e ON e.event_id = g.event_id
JOIN structure s ON s.structure_id = g.structure_id
JOIN structure_payout sp ON sp.structure_id = g.structure_id
AND sp.position = r.position
WHERE p.player_id = 17
GROUP BY e.event_id
WINDOW w AS (ORDER BY e.date, e.event_id)
ORDER BY e.date, e.event_id;
In this expression: sum(sum(sp.payout)) OVER w, the outer sum() is a window function, the inner sum() is an aggregate function.
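To see the nesting in isolation, here is a minimal sketch on a single hypothetical table (payout_demo and its columns are made up for illustration and are not part of the schema in the question). The inner sum() collapses the rows of each event; the outer sum() ... OVER then accumulates those per-event totals:

```sql
-- Hypothetical demo table, not the schema from the question.
CREATE TEMP TABLE payout_demo (
    event_id   int,
    event_date date,
    payout     numeric,
    buyin      numeric
);

INSERT INTO payout_demo VALUES
    (1, '2024-01-01', 100,  50),
    (1, '2024-01-01',   0,  50),
    (2, '2024-01-01', 200,  50),
    (3, '2024-01-02',   0, 100);

SELECT event_id
     , event_date
     , sum(payout) - sum(buyin)              AS event_profit    -- aggregate: one value per event
     , sum(sum(payout) - sum(buyin)) OVER w  AS running_profit  -- window function over the aggregate
FROM   payout_demo
GROUP  BY event_id, event_date
WINDOW w AS (ORDER BY event_date, event_id)
ORDER  BY event_date, event_id;
```

With this sample data, event_profit should come out as 0, 150, -100 and running_profit as 0, 150, 50.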
Assuming p.player_id and e.event_id are PRIMARY KEY in their respective tables.
I added e.event_id to the ORDER BY of the WINDOW clause to arrive at a deterministic sort order. (There could be multiple events on the same date.) Also included event_id in the result to distinguish multiple events per day.
While the query restricts to a single player (WHERE p.player_id = 17), we don't need to add p.name or p.player_id to GROUP BY and ORDER BY. If one of the joins would multiply rows unduly, the resulting sum would be incorrect (partly or completely multiplied). Grouping by p.name could not repair the query then.
I also removed e.date from the GROUP BY clause. The primary key e.event_id covers all columns of the input row since PostgreSQL 9.1.
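As a small illustration of that rule, the following sketch uses two hypothetical tables (event_demo and game_demo are invented names). Because event_id is declared as the primary key, the other columns of event_demo may appear in the SELECT list without being listed in GROUP BY:

```sql
-- Hypothetical tables, for illustration only.
CREATE TEMP TABLE event_demo (
    event_id   int PRIMARY KEY,
    event_date date,
    title      text
);

CREATE TEMP TABLE game_demo (
    game_id  int PRIMARY KEY,
    event_id int REFERENCES event_demo
);

SELECT e.event_id
     , e.event_date              -- not in GROUP BY, but functionally dependent on the PK
     , e.title                   -- same here
     , count(g.game_id) AS games
FROM   event_demo e
LEFT   JOIN game_demo g ON g.event_id = e.event_id
GROUP  BY e.event_id;
```

Without the PRIMARY KEY constraint on event_id, PostgreSQL rejects this query and asks for e.event_date and e.title in GROUP BY.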
If you change the query to return multiple players at once, adapt:
...
WHERE p.player_id < 17 -- example - multiple players
GROUP BY p.name, p.player_id, e.date, e.event_id -- e.date and p.name redundant
WINDOW w AS (ORDER BY p.name, p.player_id, e.date, e.event_id)
ORDER BY p.name, p.player_id, e.date, e.event_id;
Unless p.name is defined unique (?), group and order by player_id additionally to get correct results in a deterministic sort order.
I only kept e.date and p.name in GROUP BY to have identical sort order in all clauses, hoping for a performance benefit. Else, you can remove the columns there. (Similar for just e.date in the first query.)
Window function with same result in subgroup
Throw in GROUP BY file, path:
WITH ranked_messages AS (
SELECT path
, row_number() OVER (PARTITION BY file ORDER BY max(created) DESC) AS rating_in_section
FROM files
GROUP BY file, path
)
SELECT path
FROM ranked_messages
WHERE rating_in_section > 1
GROUP BY path
ORDER BY path DESC;
Assuming you want to work with max(created), i.e. the latest timestamp per group.
Related:
- PostgreSQL - Referencing another aggregate column in a window function
- Postgres window function and group by exception
PostgreSQL - Referencing another aggregate column in a window function
I think what you are after is a window function over an aggregate function, which is totally possible since window functions are applied after the aggregation:
SELECT name
, group_name
, number1
, number2
, (number1 * number2) - SUM(number3) AS total_difference
, SUM((number1 * number2) - SUM(number3)) OVER (PARTITION BY group_name) AS grand_total
FROM t
GROUP BY name, group_name, number1, number2;
Repeat the aggregate function inside the window function. The alternative is a subquery like @Gordon posted. Note, however, that the first query in his post does not currently match the second.
Related answer with more explanation:
- Postgres window function and group by exception
PostgreSQL window function over blocks of continuous IDs
In Postgres you can create a custom aggregate. Example:
create or replace function first_in_series_func(int[], int)
returns int[] language sql immutable
as $$
select case
when $1[2] is distinct from $2- 1 then array[$2, $2]
else array[$1[1], $2] end;
$$;
create or replace function first_in_series_final(int[])
returns int language sql immutable
as $$
select $1[1]
$$;
create aggregate first_in_series(int) (
sfunc = first_in_series_func,
finalfunc = first_in_series_final,
stype = int[]
);
Read in the docs: User-Defined Aggregates
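One possible way to call the aggregate is as a window function, so that every row reports the first ID of its block of consecutive IDs. The sketch relies on the first_in_series definitions above; the table series_demo is a made-up example, not taken from the question:

```sql
-- Hypothetical sample data: three blocks of consecutive IDs.
CREATE TEMP TABLE series_demo (id int);
INSERT INTO series_demo VALUES (1), (2), (3), (5), (6), (9);

SELECT id
     , first_in_series(id) OVER (ORDER BY id) AS block_start
FROM   series_demo
ORDER  BY id;
```

For these values, block_start should be 1, 1, 1, 5, 5, 9. The ORDER BY inside OVER matters: the state function compares each ID with the previous one, so the rows have to arrive in ascending order.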
must appear in the GROUP BY clause or be used in an aggregate function
Yes, this is a common aggregation problem. Before SQL3 (1999), the selected fields had to appear in the GROUP BY clause[*].
To work around this issue, you must calculate the aggregate in a sub-query and then join it back to the original table to get the additional columns you need to show:
SELECT m.cname, m.wmname, t.mx
FROM (
SELECT cname, MAX(avg) AS mx
FROM makerar
GROUP BY cname
) t JOIN makerar m ON m.cname = t.cname AND t.mx = m.avg
;
cname | wmname | mx
--------+--------+------------------------
canada | zoro | 2.0000000000000000
spain | usopp | 5.0000000000000000
But you may also use window functions, which looks simpler:
SELECT cname, wmname, MAX(avg) OVER (PARTITION BY cname) AS mx
FROM makerar
;
The only thing with this method is that it will show all records (window functions do not group). But it will show the correct (i.e. maxed at cname level) MAX for the country in each row, so it's up to you:
cname | wmname | mx
--------+--------+------------------------
canada | zoro | 2.0000000000000000
spain | luffy | 5.0000000000000000
spain | usopp | 5.0000000000000000
The solution, arguably less elegant, to show only the (cname, wmname) tuples matching the max value, is:
SELECT DISTINCT /* distinct here matters, because maybe there are various tuples for the same max value */
m.cname, m.wmname, t.avg AS mx
FROM (
SELECT cname, wmname, avg, ROW_NUMBER() OVER (PARTITION BY cname ORDER BY avg DESC) AS rn
FROM makerar
) t JOIN makerar m ON m.cname = t.cname AND m.wmname = t.wmname AND t.rn = 1
;
cname | wmname | mx
--------+--------+------------------------
canada | zoro | 2.0000000000000000
spain | usopp | 5.0000000000000000
[*]: Interestingly enough, even though the spec sort of allows selecting non-grouped fields, major engines seem to not really like it. Oracle and SQL Server just don't allow this at all. MySQL used to allow it by default, but since 5.7 the ONLY_FULL_GROUP_BY SQL mode is enabled by default, so the administrator needs to disable it manually in the server configuration for this to be permitted.
Postgres reporting window function call requires an OVER clause in a query that has an OVER clause
Each window function must have its own OVER clause. The problem with your first query is that the first_value(first_name) window function does not have an OVER clause.
And the problem with your second query is that you have an OVER clause that is not preceded by a window function.
Try this:
SELECT e_id,
first_value(first_name) OVER (PARTITION BY e_id),
first_value(last_name) OVER (PARTITION BY e_id)
FROM me
Custom sequence or windows function in Postgresql
I am assuming that you want the rows to be sorted by the numeric part of the Value column. If we call the table t, here's a query that does what you want using window functions:
SELECT "group", code, string_agg(value, '+')
FROM
(SELECT *, (row_number() OVER (PARTITION BY "group", code ORDER BY n) - 1) / CASE code WHEN 2 THEN 2 ELSE 1 END AS code_group
FROM (SELECT *,substr(value,2)::integer AS n FROM t) t1
) t2
GROUP BY "group", code, code_group
ORDER BY min(n);
The idea is to first extract (in a subquery) the numeric part of value so that we can use it later on as a sort key. Then we use the following complex expression:
(row_number() OVER (PARTITION BY "group", code ORDER BY n) - 1) / CASE code WHEN 2 THEN 2 ELSE 1 END
This basically assigns to each row within a group (grouped by group, code) an increasing number starting with 0. So the first row is 0, the next one is 1 and so on. But there is an exception to that. If code is 2, then we use the following numbering scheme: 0, 0, 1, 1, 2, 2, .... This is accomplished by dividing the zero-based row number by 2 using integer division. I call this number the code_group. In the last step, we group by group, code (like before) but also by code_group, so that pairs of consecutive rows with code=2 collapse into one.
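The numbering trick can be tried without any table. This sketch uses generate_series() as a stand-in for the sorted rows of one (group, code) partition with code = 2; the column names zero_based and code_group are only labels for this example:

```sql
SELECT n
     , row_number() OVER (ORDER BY n) - 1        AS zero_based  -- 0, 1, 2, ...
     , (row_number() OVER (ORDER BY n) - 1) / 2  AS code_group  -- integer division pairs up rows
FROM   generate_series(1, 6) AS n
ORDER  BY n;
```

Both operands are integers, so the division truncates and code_group should come out as 0, 0, 1, 1, 2, 2.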