Nesting Aggregate Functions - Sql

Nested aggregate functions, Max(Avg()), in SQL

Suppose you have a table like this:

EmployeeId DepartmentId Salary
1 1 10
2 1 30
3 2 30
4 2 40
5 2 20
6 3 40
7 3 50

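After grouping by DepartmentId, the averages are:

DepartmentId    AVG(Salary)
1               (10+30)/2 = 20
2               (30+40+20)/3 = 30
3               (40+50)/2 = 45

So the query below returns 45, the maximum average salary (the one for DepartmentId 3). Note that it returns only the value, not the department it belongs to: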

SELECT MAX(x.avg_salary)
FROM (SELECT AVG(salary) AS avg_salary
      FROM employees
      GROUP BY department_id) x;
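
If you want to check the numbers yourself, here is a minimal sketch that loads the sample rows into an in-memory SQLite database and runs the same subquery pattern. The table name employees and the snake_case column names are assumptions taken from the query rather than from the sample table, and SQLite stands in for whatever DBMS you actually use.

```python
import sqlite3

# In-memory database holding the sample rows from the table above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "employee_id INTEGER PRIMARY KEY, department_id INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 30), (3, 2, 30), (4, 2, 40),
     (5, 2, 20), (6, 3, 40), (7, 3, 50)],
)

# Step 1: the inner aggregation, one average per department.
per_department = conn.execute(
    "SELECT department_id, AVG(salary) "
    "FROM employees GROUP BY department_id ORDER BY department_id"
).fetchall()

# Step 2: the outer aggregation over the derived table.
max_avg = conn.execute(
    "SELECT MAX(x.avg_salary) "
    "FROM (SELECT AVG(salary) AS avg_salary "
    "      FROM employees GROUP BY department_id) x"
).fetchone()[0]

print(per_department)
print(max_avg)
```

The derived table x plays the role of the "after grouping" table, and MAX() is simply applied to it as a second, separate aggregation.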

When can aggregate functions be nested in standard SQL?

You already noticed the difference yourself: It's all about the window. COUNT(*) without an OVER clause for instance is an aggregation function. COUNT(*) with an OVER clause is a window function.

By using aggregation functions you condense the original rows you get after the FROM clause and WHERE clause are applied to either the specified group in GROUP BY or to one row in the absence of a GROUP BY clause.

Window functions, aka analytic functions, are applied afterwards. They don't change the number of result rows, but merely add information by looking at all or some rows (the window) of the selected data.

In

SELECT
options.id,
options.option_text,
COUNT(votes.option_id) as vote_count,
COUNT(votes.option_id) / SUM(COUNT(votes.option_id)) OVER() * 100.0 as vote_percentage
FROM options
LEFT JOIN votes on options.id = votes.option_id
GROUP BY options.id;

we first join votes to options and then count the votes per option by aggregating the joined rows down to one result row per option (GROUP BY options.id). We count on a non-nullable column in the votes table (COUNT(votes.option_id)), so we get a zero count in case there are no votes, because in an outer joined row this column is set to null.

After aggregating all rows, and thus getting one row per option, we apply a window function (SUM() OVER) on this result set. We apply the analytic SUM on the vote count (SUM(COUNT(votes.option_id))) by looking at the whole result set (empty OVER clause), thus getting the same total vote count in every row. We use this value for a calculation: the option's vote count divided by the total vote count times 100, which is the option's percentage of total votes.
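
To see the two phases at work, here is a small sketch against an in-memory SQLite database (window functions need SQLite 3.25 or later). The sample options and votes are made up for illustration. One deliberate change from the query above: the count is multiplied by 100.0 before dividing, because SQLite would otherwise perform integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE options (id INTEGER PRIMARY KEY, option_text TEXT);
    CREATE TABLE votes (
        id INTEGER PRIMARY KEY,
        option_id INTEGER NOT NULL REFERENCES options(id)
    );
    -- Hypothetical sample data: 'blue' gets no votes at all.
    INSERT INTO options VALUES (1, 'red'), (2, 'green'), (3, 'blue');
    INSERT INTO votes (option_id) VALUES (1), (1), (1), (2);
""")

rows = conn.execute("""
    SELECT options.id,
           options.option_text,
           COUNT(votes.option_id) AS vote_count,
           -- COUNT() is the aggregate; SUM(...) OVER () is the window
           -- function that runs on the already grouped rows.
           COUNT(votes.option_id) * 100.0
               / SUM(COUNT(votes.option_id)) OVER () AS vote_percentage
    FROM options
    LEFT JOIN votes ON options.id = votes.option_id
    GROUP BY options.id
    ORDER BY options.id
""").fetchall()

for row in rows:
    print(row)
```

The option without votes still shows up with a count of zero, which is the effect of counting votes.option_id instead of using COUNT(*).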

The PostgreSQL query is very similar. We select the number of posts per date (COUNT(created_at) is nothing else than a mere COUNT(*)) along with a running total of these counts (by using a window that looks at all rows up to the current row).
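
The PostgreSQL query itself is not reproduced here, so the following is only a sketch of the pattern it describes, run on SQLite (3.25 or later) instead of PostgreSQL. The table name posts and the assumption that created_at holds plain dates are mine, not from the original question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table; created_at stores date-only strings here.
    CREATE TABLE posts (id INTEGER PRIMARY KEY, created_at TEXT NOT NULL);
    INSERT INTO posts (created_at) VALUES
        ('2024-01-01'), ('2024-01-01'),
        ('2024-01-02'),
        ('2024-01-03'), ('2024-01-03'), ('2024-01-03');
""")

running = conn.execute("""
    SELECT created_at,
           COUNT(created_at) AS posts_on_day,
           -- With ORDER BY in the OVER clause, the default window covers
           -- all rows up to the current one: a running total.
           SUM(COUNT(created_at)) OVER (ORDER BY created_at) AS running_total
    FROM posts
    GROUP BY created_at
    ORDER BY created_at
""").fetchall()

for row in running:
    print(row)
```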

So, while this looks like we are nesting two aggregate functions, this is not really the case, because SUM OVER is not considered an aggregate function but an analytic/window function.

Oracle does allow applying an aggregate function directly on another, thus invoking a final aggregation on a previous grouped by aggregation. This allows us to get one result row of, say, the average of sums without having to write a subquery for this. This is not compliant with the SQL standard, however, and very unpopular even among Oracle developers at that.

How to resolve nested aggregate function error?

The goal: "looking to have each bucket shown as a % of the total".

Combining COUNT(...) with windowed SUM() OVER() to get the total across all groups:

SELECT Case_Age_Category, 
DIV0(COUNT(Case_Age_Category), SUM(COUNT(Case_Age_Category)) OVER()) as Volume
FROM
(
-- ...
) sub
WHERE CLOSED_AT >= '2021-01-01 00:00:00.000'
GROUP BY Case_Age_Category
ORDER BY Case_Age_Category ASC


Nested aggregate function in SELECT list

Since you have not defined any columns to group by, SUM() returns a single value, so taking MAX() of it gives that same value.

UPDATE

An error was thrown because MAX(SUM(e.Empid)) requires the results of two grouped selects, not just one.

Why GROUP BY is needed for a nested aggregate function in Oracle

Well, the error message you get says it all: ORA-00978: nested group function without GROUP BY

If you check the documentation, there is no explicit mention of this limitation, but on carefully reading the description of the functionality you should realize that the GROUP BY clause is required for the usage of nested aggregate functions.

You can nest aggregate functions ...
This calculation evaluates the inner aggregate (MAX(salary)) for each group defined by the GROUP BY clause ...,
and aggregates the results again.

So you have two workarounds to simulate the nested aggregation without GROUP BY

A) Add constant GROUP BY

select max(count(*)) from tab group by 42;

Note that you must use NVL if you require a zero result on an empty table (which is what case B returns):

select nvl(max(count(*)),0) max_cnt from tab group by 42;

B) Split in two Subqueries

with tab2 as (
select count(*) cnt from tab)
select max(cnt) max_cnt
from tab2
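
The empty-table difference between the two workarounds is easy to miss, so here is a rough sketch of it. It runs on SQLite, not Oracle, and SQLite does not accept max(count(*)) at all, so workaround A is emulated with a subquery. COALESCE stands in for NVL, and the constant is a string because SQLite treats a bare integer in GROUP BY as a result-column position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER)")  # deliberately left empty

# Like A: a constant GROUP BY yields no groups on an empty table,
# so the outer MAX() sees no rows and returns NULL.
grouped = conn.execute(
    "SELECT MAX(cnt) FROM (SELECT COUNT(*) AS cnt FROM tab GROUP BY 'x')"
).fetchone()[0]

# Like A with NVL: the NULL is replaced by zero.
grouped_with_default = conn.execute(
    "SELECT COALESCE(MAX(cnt), 0) "
    "FROM (SELECT COUNT(*) AS cnt FROM tab GROUP BY 'x')"
).fetchone()[0]

# Like B: without GROUP BY the inner query always returns one row (count 0).
ungrouped = conn.execute(
    "SELECT MAX(cnt) FROM (SELECT COUNT(*) AS cnt FROM tab)"
).fetchone()[0]

print(grouped, grouped_with_default, ungrouped)
```

Here grouped comes back as None (NULL), while the other two come back as 0.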

Nested aggregate function in PostgreSQL

Many things in your question are unclear. Based on what I understood from your current query, try this:

with cte as (
SELECT ag.article_group_id,
ag.article_group,
au.author_id,
au.author_name,
avg(gr.total) as avg_total
FROM article_group ag
LEFT JOIN article ar on ar.article_group_id=ag.article_group_id
LEFT JOIN article_to_author ata ON ar.article_id = ata.article_id
LEFT JOIN author au ON ata.author_id = au.author_id
LEFT JOIN grade gr ON ar.article_id = gr.article_id
GROUP BY ag.article_group_id, ag.article_group, au.author_id, au.author_name
)
SELECT article_group_id,
article_group,
array_agg('[' || author_id || ',' || author_name || ',' || avg_total || ']')
FROM cte
GROUP BY article_group_id, article_group

You can change whatever you want in array_agg


Nested Aggregate function in having clause MySql

As you only wanted the ids, you don't need to join the customers table:

SELECT customers_customer_id, COUNT(*) count_r
FROM orders
GROUP BY customers_customer_id
HAVING count_r = (
    SELECT MAX(count_r)
    FROM (SELECT COUNT(*) count_r
          FROM `orders`
          GROUP BY customers_customer_id) t1)

customers_customer_id | count_r
--------------------: | ------:
1 | 2
2 | 2
3 | 2


To select the wanted columns from orders and include only the ids that have the highest number of orders, I need two nested queries to get the highest number of orders:

  1. Select the number of orders for every customers_customer_id.
  2. As I need only the highest count, select the MAX of all the counts gathered in the first SELECT.
  3. Use this value to keep only the customers_customer_ids that have the highest number of orders.

The HAVING is needed because I have to compare the maximum against the column count_r, which I cannot do in a WHERE clause.
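
The three steps above can be tried out with the following sketch. It uses an in-memory SQLite database instead of MySQL, the sample orders are invented, and HAVING repeats COUNT(*) rather than the alias so the statement does not depend on alias resolution rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customers_customer_id INTEGER NOT NULL
    );
    -- Hypothetical data: customers 1-3 have two orders, customer 4 has one.
    INSERT INTO orders (customers_customer_id)
    VALUES (1), (1), (2), (2), (3), (3), (4);
""")

top_customers = conn.execute("""
    SELECT customers_customer_id, COUNT(*) AS count_r
    FROM orders
    GROUP BY customers_customer_id
    HAVING COUNT(*) = (
        -- Step 2: the highest of the per-customer counts ...
        SELECT MAX(count_r)
        -- Step 1: ... computed from one count per customer.
        FROM (SELECT COUNT(*) AS count_r
              FROM orders
              GROUP BY customers_customer_id) t1
    )
    ORDER BY customers_customer_id
""").fetchall()

print(top_customers)
```

Customer 4 is filtered out because its count is below the maximum, while all three customers tied at the top are kept.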

How to combine aggregate functions in MySQL?

You have to use subqueries:

SELECT x.user,
       AVG(x.cnt)
FROM (SELECT user, COUNT(answer) AS cnt
      FROM surveyValues
      WHERE study='a1'
      GROUP BY user) x
GROUP BY x.user

You can't wrap an aggregate with another aggregate. MySQL versions before 8.0 have no analytic/ranking/windowing functions, so a subquery is the only option there; from MySQL 8.0 on, a window function can take an aggregate as its argument.

MySQL Nested aggregation query with selection of specific intermediate items

You can use the SUM() window function to assign a number to each session and then aggregate:

SELECT DISTINCT user_id,
SUM(TIMESTAMPDIFF(MINUTE, MIN(log_time), MAX(log_time))) OVER (PARTITION BY user_id) total_time
FROM (
SELECT *, SUM(activity = 'start') OVER (PARTITION BY user_id, DATE(log_time) ORDER BY log_time) grp
FROM activities
) t
WHERE grp > 0
GROUP BY user_id, DATE(log_time), grp;
