Aggregate columns with additional (distinct) filters
The aggregate FILTER clause in Postgres 9.4 or newer is shorter and faster:
SELECT u.name
, count(*) FILTER (WHERE g.winner_id > 0) AS played
, count(*) FILTER (WHERE g.winner_id = u.id) AS won
, count(*) FILTER (WHERE g.winner_id <> u.id) AS lost
FROM games g
JOIN users u ON u.id IN (g.player_1_id, g.player_2_id)
GROUP BY u.name;
- The manual
- Postgres Wiki
- Depesz blog post
In Postgres 9.3 (or any version) this is still shorter and faster than nested sub-selects or CASE expressions:
SELECT u.name
, count(g.winner_id > 0 OR NULL) AS played
, count(g.winner_id = u.id OR NULL) AS won
, count(g.winner_id <> u.id OR NULL) AS lost
FROM games g
JOIN users u ON u.id IN (g.player_1_id, g.player_2_id)
GROUP BY u.name;
Details:
- For absolute performance, is SUM faster or COUNT?
SQL Filter two aggregate functions with different conditions
You can use conditional aggregation:
SELECT airline_name,
(AVG(CASE WHEN fl_date BETWEEN '2017-07-24' and '2017-07-31' THEN arr_delay_new END) -
AVG(CASE WHEN fl_date BETWEEN '2017-07-01' and '2017-07-23' THEN arr_delay_new END)
) as AVG_DIFF
FROM Flight_delays F JOIN
Airlines A
ON A.airline_id = F.airline_id
GROUP BY airline_name;
This assumes that arr_delay_new has a type that can be averaged. Some databases are reluctant to do averages on date/times directly.
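The CASE expressions above have no ELSE branch on purpose: rows outside the date range become NULL, and AVG ignores NULLs. A small sketch of the difference, assuming SQLite via Python's `sqlite3` and a simplified, hypothetical single-table version of the data (dates stored as ISO text, which compare correctly as strings):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flight_delays (fl_date TEXT, arr_delay_new REAL);
    INSERT INTO flight_delays VALUES
        ('2017-07-10', 10.0), ('2017-07-20', 20.0),
        ('2017-07-25', 30.0), ('2017-07-30', 50.0);
""")

late_avg, early_avg, late_avg_else_0 = conn.execute("""
    SELECT AVG(CASE WHEN fl_date BETWEEN '2017-07-24' AND '2017-07-31' THEN arr_delay_new END)
         , AVG(CASE WHEN fl_date BETWEEN '2017-07-01' AND '2017-07-23' THEN arr_delay_new END)
         -- With ELSE 0 the non-matching rows are averaged in as zeros:
         , AVG(CASE WHEN fl_date BETWEEN '2017-07-24' AND '2017-07-31' THEN arr_delay_new ELSE 0 END)
    FROM flight_delays
""").fetchone()

avg_diff = late_avg - early_avg
print(late_avg, early_avg, avg_diff, late_avg_else_0)
```

Without ELSE, the late average is taken over the two late rows only; with `ELSE 0` it is diluted across all four rows.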
Combine two queries to count distinct strings with different filters
Much faster and simpler with conditional aggregates using the aggregate FILTER clause:
SELECT source
, count(DISTINCT sku) FILTER (WHERE product_gap = 'yes') AS yes_gap
, count(DISTINCT sku) FILTER (WHERE product_gap = 'no') AS no_gap
FROM product_gaps
WHERE ingestion_date <= '2021-05-25'
GROUP BY source;
See:
- Aggregate columns with additional (distinct) filters
Aside 1: DISTINCT is a key word, not a function. Don't add parentheses for the single column. distinct(sku) is short notation for DISTINCT ROW(sku). It happens to work because Postgres strips the ROW wrapper for a single column, but it's just noise.
Aside 2: product_gap should probably be boolean.
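Where the FILTER clause is not available, a conditional distinct count can be written by moving the condition into a CASE inside `count(DISTINCT ...)`. This is a sketch of that portable equivalent, not the query above itself; it assumes SQLite via Python's `sqlite3`, and the sample rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_gaps (source TEXT, sku TEXT, product_gap TEXT, ingestion_date TEXT);
    INSERT INTO product_gaps VALUES
        ('a', 's1', 'yes', '2021-05-01'),
        ('a', 's1', 'yes', '2021-05-02'),  -- duplicate sku, counted once
        ('a', 's2', 'no',  '2021-05-03'),
        ('a', 's3', 'yes', '2021-06-01'),  -- removed by the WHERE clause
        ('b', 's1', 'no',  '2021-05-10');
""")

rows = conn.execute("""
    SELECT source
         , count(DISTINCT CASE WHEN product_gap = 'yes' THEN sku END) AS yes_gap
         , count(DISTINCT CASE WHEN product_gap = 'no'  THEN sku END) AS no_gap
    FROM   product_gaps
    WHERE  ingestion_date <= '2021-05-25'
    GROUP  BY source
    ORDER  BY source
""").fetchall()

print(rows)
```

Rows that fail the condition turn into NULL, which `count(DISTINCT ...)` skips, so a group with no match reports 0.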
Aggregating a table by multiple different column filters
Use conditional aggregation:
select user_id, max(value) as max_value,
min(case when event_type = 'click' then value end) as min_click_value
from my_table
group by user_id;
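A quick check of how the unconditional and the conditional aggregate behave side by side, assuming SQLite via Python's `sqlite3` and a few invented rows for my_table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (user_id INTEGER, event_type TEXT, value INTEGER);
    INSERT INTO my_table VALUES
        (1, 'click', 5), (1, 'view', 2), (1, 'click', 9),
        (2, 'view', 7);  -- user 2 has no click events
""")

rows = conn.execute("""
    select user_id, max(value) as max_value,
           min(case when event_type = 'click' then value end) as min_click_value
    from my_table
    group by user_id
    order by user_id
""").fetchall()

print(rows)
```

For a user without any click rows the conditional minimum is NULL (None in Python) rather than 0, because the CASE has no ELSE branch.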
Can we use same aggregate function more than once on same table field or column using Different filter conditions?
You can use an aggregate function with a CASE expression:
SELECT Date1,
CC,
BU,
SUM(case when mode = '011' then Amount end) Mode011,
SUM(case when mode = '012' then Amount end) Mode012,
SUM(case when mode = '013' then Amount end) Mode013,
SUM(case when mode = '014' then Amount end) Mode014
FROM MainTable
GROUP BY CC,BU,Date1;
Or, in SQL Server, you can use the PIVOT operator:
select date1, CC, BU,
[011] Mode011,
[012] Mode012,
[013] Mode013,
[014] Mode014
from
(
select date1, CC, BU, mode, amount
from maintable
) src
pivot
(
sum(amount)
for mode in ([011], [012], [013], [014])
) piv
Get conditional count and conditional DISTINCT count in a single SELECT
Use the aggregate FILTER clause. Then you can combine your count with DISTINCT:
SELECT s.logged_on::date AS login_date
, count(*) FILTER (WHERE s.device = 'mobile') AS mobile_count
, count(DISTINCT user_id) FILTER (WHERE s.device = 'web') AS web_count
FROM session_log s
JOIN standard_users su USING (user_id)
GROUP BY login_date;
See:
- Aggregate columns with additional (distinct) filters
I also simplified your twisted formulation with LEFT JOIN and then IS NOT NULL. Boils down to a plain JOIN.
If referential integrity between session_log.user_id and standard_users.user_id is enforced with a FK constraint, and standard_users.user_id is defined UNIQUE or PK - as seems reasonable - you can drop the JOIN completely:
SELECT logged_on::date AS login_date
, count(*) FILTER (WHERE device = 'mobile') AS mobile_count
, count(DISTINCT user_id) FILTER (WHERE device = 'web') AS web_count
FROM session_log
GROUP BY 1;
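To see that the JOIN adds nothing once the foreign key is enforced, the two variants can be compared directly. A sketch assuming SQLite via Python's `sqlite3`, with CASE-based aggregates standing in for FILTER, `date()` standing in for the `::date` cast, and invented sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE standard_users (user_id INTEGER PRIMARY KEY);
    CREATE TABLE session_log (
        user_id   INTEGER NOT NULL REFERENCES standard_users (user_id),
        logged_on TEXT,
        device    TEXT
    );
    INSERT INTO standard_users VALUES (1), (2);
    INSERT INTO session_log VALUES
        (1, '2024-01-01 08:00', 'mobile'),
        (1, '2024-01-01 09:00', 'web'),
        (1, '2024-01-01 10:00', 'web'),
        (2, '2024-01-01 11:00', 'web'),
        (2, '2024-01-02 12:00', 'mobile');
""")

aggregates = """
    SELECT date(s.logged_on) AS login_date
         , count(CASE WHEN s.device = 'mobile' THEN 1 END)                  AS mobile_count
         , count(DISTINCT CASE WHEN s.device = 'web' THEN s.user_id END)   AS web_count
"""

with_join = conn.execute(aggregates + """
    FROM session_log s
    JOIN standard_users su USING (user_id)
    GROUP BY 1 ORDER BY 1
""").fetchall()

without_join = conn.execute(aggregates + """
    FROM session_log s
    GROUP BY 1 ORDER BY 1
""").fetchall()

print(with_join == without_join, without_join)
```

Every session row matches exactly one user, so the join neither removes nor duplicates rows and both queries return the same counts.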
Athena array aggregate and filter multiple columns on condition
You should be able to do something like this:
SELECT
uuid,
SUM(fee.price) AS total_fee,
SUM(fee.price) FILTER (WHERE fee.feetype = 'discount') AS total_discount,
ARBITRARY(fee.title) FILTER (WHERE fee.feetype = 'discount') AS discount_type
FROM …
GROUP BY uuid
(I'm assuming the data column in your example is the same as the fee column in your query.)
Aggregate functions support a FILTER clause that selects the rows to include in the aggregation. This can also be achieved by e.g. SUM(IF(fee.feetype = 'discount', fee.price, 0)), which is more compact but not as elegant.
The ARBITRARY aggregate function picks an arbitrary value from the group. I don't know if that's appropriate in your case, but I assume that there will only be one discount row per group. If there is more than one, you might want to use ARRAY_AGG with the DISTINCT clause (e.g. ARRAY_AGG(DISTINCT fee.title)) to get them all.
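One difference between the two forms is worth knowing: for a group with no matching rows, a filtered SUM yields NULL, while the `IF(..., 0)` form yields 0. The sketch below illustrates this with CASE expressions, assuming SQLite via Python's `sqlite3` rather than Athena; a CASE without ELSE plays the role of FILTER, and `ELSE 0` plays the role of `IF(..., 0)`. The flat fees table and its rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fees (uuid TEXT, feetype TEXT, price INTEGER);
    INSERT INTO fees VALUES
        ('u1', 'base', 100), ('u1', 'discount', -10),
        ('u2', 'base', 50);  -- u2 has no discount row
""")

rows = conn.execute("""
    SELECT uuid
         , SUM(price)                                                AS total_fee
         , SUM(CASE WHEN feetype = 'discount' THEN price END)        AS discount_null
         , SUM(CASE WHEN feetype = 'discount' THEN price ELSE 0 END) AS discount_zero
    FROM fees
    GROUP BY uuid
    ORDER BY uuid
""").fetchall()

print(rows)
```

If a 0 is wanted for groups without a discount, wrapping the filtered sum in COALESCE(..., 0) gives the same result as the `ELSE 0` form.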