Aggregate Columns With Additional (Distinct) Filters


The aggregate FILTER clause in Postgres 9.4 or newer is shorter and faster:

SELECT u.name
, count(*) FILTER (WHERE g.winner_id > 0) AS played
, count(*) FILTER (WHERE g.winner_id = u.id) AS won
, count(*) FILTER (WHERE g.winner_id <> u.id) AS lost
FROM games g
JOIN users u ON u.id IN (g.player_1_id, g.player_2_id)
GROUP BY u.name;

See:

  • The manual
  • Postgres Wiki
  • Depesz blog post

In Postgres 9.3 or older (it works in any version), count(expression OR NULL) is still shorter and faster than nested sub-selects or CASE expressions:

SELECT u.name
, count(g.winner_id > 0 OR NULL) AS played
, count(g.winner_id = u.id OR NULL) AS won
, count(g.winner_id <> u.id OR NULL) AS lost
FROM games g
JOIN users u ON u.id IN (g.player_1_id, g.player_2_id)
GROUP BY u.name;

Details:

  • For absolute performance, is SUM faster or COUNT?
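Why count(condition OR NULL) works: TRUE OR NULL is TRUE, FALSE OR NULL is NULL, and count(expression) skips NULL values. Here is a minimal sketch of that behaviour, using Python's built-in sqlite3 module as a stand-in for Postgres (an assumption for illustration only; SQLite stores booleans as 0/1 but evaluates OR with the same three-valued logic). The single-column games table and its rows are invented, and user 1 is hard-coded.

```python
import sqlite3


def count_results(winner_ids):
    """Count games for user 1 with the `OR NULL` trick.

    winner_ids: winner_id of each game user 1 took part in (None = no winner).
    Returns (played, won, lost, won_without_trick).
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE games (winner_id INTEGER)")
    con.executemany("INSERT INTO games VALUES (?)", [(w,) for w in winner_ids])
    row = con.execute(
        """
        SELECT count(winner_id > 0 OR NULL)  AS played
             , count(winner_id = 1 OR NULL)  AS won
             , count(winner_id <> 1 OR NULL) AS lost
             -- Without OR NULL, FALSE is a non-NULL value and gets counted too.
             , count(winner_id = 1)          AS won_without_trick
        FROM games
        """
    ).fetchone()
    con.close()
    return row


print(count_results([1, 1, 2, None]))
```

With the sample input, the last column counts every game that has a winner, not only the games user 1 won, which is exactly what OR NULL prevents.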

SQL Filter two aggregate functions with different conditions

You can use conditional aggregation:

SELECT airline_name,
(AVG(CASE WHEN fl_date BETWEEN '2017-07-24' and '2017-07-31' THEN arr_delay_new END) -
AVG(CASE WHEN fl_date BETWEEN '2017-07-01' and '2017-07-23' THEN arr_delay_new END)
) as AVG_DIFF
FROM Flight_delays F JOIN
Airlines A
ON A.airline_id = F.airline_id
GROUP BY airline_name;

This assumes that arr_delay_new has a type that can be averaged. Some databases are reluctant to do averages on date/times directly.
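The CASE expressions deliberately have no ELSE branch: rows outside the date range become NULL, and AVG ignores NULL. Adding ELSE 0 would pull both averages toward zero. Here is a small sketch of the difference, run against SQLite through Python's sqlite3 module (an assumption; the question's database is not stated here). The simplified single-airline table, the arr_delay column name, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample: two flights in each date range.
sample_delays = [
    ("2017-07-10", 10.0),
    ("2017-07-20", 20.0),
    ("2017-07-25", 40.0),
    ("2017-07-30", 60.0),
]


def avg_diff(delays):
    """Return (difference with NULLs ignored, difference skewed by ELSE 0)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE flight_delays (fl_date TEXT, arr_delay REAL)")
    con.executemany("INSERT INTO flight_delays VALUES (?, ?)", delays)
    row = con.execute(
        """
        SELECT AVG(CASE WHEN fl_date BETWEEN '2017-07-24' AND '2017-07-31' THEN arr_delay END)
             - AVG(CASE WHEN fl_date BETWEEN '2017-07-01' AND '2017-07-23' THEN arr_delay END)
               AS avg_diff
             -- ELSE 0 turns out-of-range rows into zeros that AVG does count.
             , AVG(CASE WHEN fl_date BETWEEN '2017-07-24' AND '2017-07-31' THEN arr_delay ELSE 0 END)
             - AVG(CASE WHEN fl_date BETWEEN '2017-07-01' AND '2017-07-23' THEN arr_delay ELSE 0 END)
               AS skewed_diff
        FROM flight_delays
        """
    ).fetchone()
    con.close()
    return row


print(avg_diff(sample_delays))
```

For the sample rows, the first value is 50 - 15 = 35, while the ELSE 0 variant yields 25 - 7.5 = 17.5 because each average is taken over all four rows.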

Combine two queries to count distinct strings with different filters

Much faster and simpler with conditional aggregates using the aggregate FILTER clause:

SELECT source
, count(DISTINCT sku) FILTER (WHERE product_gap = 'yes') AS yes_gap
, count(DISTINCT sku) FILTER (WHERE product_gap = 'no') AS no_gap
FROM product_gaps
WHERE ingestion_date <= '2021-05-25'
GROUP BY source;

See:

  • Aggregate columns with additional (distinct) filters

Aside 1: DISTINCT is a key word, not a function. Don't add parentheses for the single column. distinct(sku) is short notation for DISTINCT ROW(sku). It happens to work because Postgres strips the ROW wrapper for a single column, but it's just noise.

Aside 2: product_gap should probably be boolean.
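Where the FILTER clause is not available, the same per-condition distinct counts can be written with CASE inside count(DISTINCT ...). Here is a minimal sketch of that rewrite, run against SQLite through Python's sqlite3 module (an assumption for illustration; the answer itself targets Postgres). The sample rows are invented and the ingestion_date filter is left out for brevity.

```python
import sqlite3


def gap_counts(rows):
    """rows: (source, sku, product_gap) tuples.

    Returns {source: (yes_gap, no_gap)} with distinct SKUs per condition.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE product_gaps (source TEXT, sku TEXT, product_gap TEXT)")
    con.executemany("INSERT INTO product_gaps VALUES (?, ?, ?)", rows)
    result = {
        source: (yes_gap, no_gap)
        for source, yes_gap, no_gap in con.execute(
            """
            SELECT source
                 -- CASE without ELSE yields NULL, which count(DISTINCT ...) skips.
                 , count(DISTINCT CASE WHEN product_gap = 'yes' THEN sku END) AS yes_gap
                 , count(DISTINCT CASE WHEN product_gap = 'no'  THEN sku END) AS no_gap
            FROM product_gaps
            GROUP BY source
            """
        )
    }
    con.close()
    return result


sample_gaps = [
    ("a", "s1", "yes"),
    ("a", "s1", "yes"),  # duplicate SKU, counted once
    ("a", "s1", "no"),   # same SKU may also count under 'no'
    ("a", "s2", "no"),
    ("b", "s3", "yes"),
]
print(gap_counts(sample_gaps))
```

Each conditional count is evaluated independently, so a SKU that appears with both 'yes' and 'no' contributes to both columns.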

Aggregating a table by multiple different column filters

Use conditional aggregation:

select user_id, max(value) as max_value,
min(case when event_type = 'click' then value end) as min_click_value
from my_table
group by user_id;

Can we use the same aggregate function more than once on the same table field or column using different filter conditions?

You can use an aggregate function with a CASE:

SELECT Date1,
CC,
BU,
SUM(case when mode = '011' then Amount end) Mode011,
SUM(case when mode = '012' then Amount end) Mode012,
SUM(case when mode = '013' then Amount end) Mode013,
SUM(case when mode = '014' then Amount end) Mode014
FROM MainTable
GROUP BY CC,BU,Date1;

Or you can use the PIVOT function:

select date1, CC, BU,
[011] Mode011,
[012] Mode012,
[013] Mode013,
[014] Mode014
from
(
select date1, CC, BU, mode, amount
from maintable
) src
pivot
(
sum(amount)
for mode in ([011], [012], [013], [014])
) piv

Get conditional count and conditional DISTINCT count in a single SELECT

Use the aggregate FILTER clause. Then you can combine your count with DISTINCT:

SELECT s.logged_on::date AS login_date
, count(*) FILTER (WHERE s.device = 'mobile') AS mobile_count
, count(DISTINCT user_id) FILTER (WHERE s.device = 'web') AS web_count
FROM session_log s
JOIN standard_users su USING (user_id)
GROUP BY login_date;

See:

  • Aggregate columns with additional (distinct) filters

I also simplified your twisted formulation with LEFT JOIN followed by IS NOT NULL. It boils down to a plain JOIN.

If referential integrity between session_log.user_id and standard_users.user_id is enforced with a FK constraint, and standard_users.user_id is defined UNIQUE or PK - as seems reasonable - you can drop the JOIN completely:

SELECT logged_on::date AS login_date
, count(*) FILTER (WHERE device = 'mobile') AS mobile_count
, count(DISTINCT user_id) FILTER (WHERE device = 'web') AS web_count
FROM session_log
GROUP BY 1;

Athena array aggregate and filter multiple columns on condition

You should be able to do something like this:

SELECT
uuid,
SUM(fee.price) AS total_fee,
SUM(fee.price) FILTER (WHERE fee.feetype = 'discount') AS total_discount,
ARBITRARY(fee.title) FILTER (WHERE fee.feetype = 'discount') AS discount_type
FROM …
GROUP BY uuid

(I'm assuming the data column in your example is the same as the fee column in your query).

Aggregate functions support a FILTER clause that selects the rows to include in the aggregation. This can also be achieved with e.g. SUM(IF(fee.feetype = 'discount', fee.price, 0)), which is more compact but not as elegant.

The ARBITRARY aggregate function picks an arbitrary value from the group. I don't know if that's appropriate in your case, but I assume that there will only be one discount row per group. If there is more than one, you might want to use ARRAY_AGG with the DISTINCT clause (e.g. ARRAY_AGG(DISTINCT fee.title)) to get them all.
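One difference between the two spellings shows up when a group has no discount row: the FILTER form sums over no rows and returns NULL, while the IF(..., 0) form returns 0. The sketch below reproduces this with CASE expressions in SQLite through Python's sqlite3 module (an assumption; it is not Athena). CASE without ELSE plays the role of FILTER and CASE ... ELSE 0 plays the role of IF(..., 0). The flat fee table and its rows are hypothetical.

```python
import sqlite3


def discount_totals(fees):
    """fees: (uuid, feetype, price) tuples.

    Returns {uuid: (total_fee, discount_or_null, discount_or_zero)}.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE fee (uuid TEXT, feetype TEXT, price INTEGER)")
    con.executemany("INSERT INTO fee VALUES (?, ?, ?)", fees)
    result = {
        uuid: (total, disc_null, disc_zero)
        for uuid, total, disc_null, disc_zero in con.execute(
            """
            SELECT uuid
                 , SUM(price) AS total_fee
                 -- Like FILTER: non-matching rows are skipped, so no match means NULL.
                 , SUM(CASE WHEN feetype = 'discount' THEN price END) AS discount_or_null
                 -- Like IF(..., 0): non-matching rows add 0, so no match means 0.
                 , SUM(CASE WHEN feetype = 'discount' THEN price ELSE 0 END) AS discount_or_zero
            FROM fee
            GROUP BY uuid
            """
        )
    }
    con.close()
    return result


sample_fees = [
    ("u1", "shipping", 5),
    ("u1", "discount", -2),
    ("u2", "shipping", 7),  # u2 has no discount row
]
print(discount_totals(sample_fees))
```

If a 0 is preferred over NULL with the FILTER form, wrapping the aggregate in COALESCE(..., 0) is one way to get it.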


