Nested aggregate functions, Max(Avg()), in SQL
If you have a table like this:
EmployeeId DepartmentId Salary
1 1 10
2 1 30
3 2 30
4 2 40
5 2 20
6 3 40
7 3 50
then after grouping by DepartmentId the averages are:
DepartmentId AVG(Salary)
1 (10+30)/2 = 20
2 (30+40+20)/3 = 30
3 (40+50)/2= 45
So the query below returns 45, the maximum average salary, which belongs to DepartmentId 3:
SELECT MAX(x.avg)
FROM (SELECT AVG(salary) AS avg FROM employees GROUP BY department_id) x;
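To check the arithmetic above, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's sqlite3 module. The table and column names (employees, employee_id, department_id, salary) and the alias avg_salary are assumptions made for illustration. The second query is an optional variant that also reports which department owns the maximum.

```python
import sqlite3

# Hypothetical in-memory table holding the sample rows from the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, department_id INTEGER, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, 1, 10), (2, 1, 30), (3, 2, 30), (4, 2, 40), (5, 2, 20), (6, 3, 40), (7, 3, 50)],
)

# Inner query: one average per department.
# Outer query: the maximum of those averages.
max_avg = conn.execute(
    """
    SELECT MAX(x.avg_salary)
    FROM (SELECT AVG(salary) AS avg_salary
          FROM employees
          GROUP BY department_id) x
    """
).fetchone()[0]

# Variant: keep department_id and sort, to see which department has that average.
top_department = conn.execute(
    """
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
    ORDER BY avg_salary DESC
    LIMIT 1
    """
).fetchone()

print(max_avg, top_department)
```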
When can aggregate functions be nested in standard SQL?
You already noticed the difference yourself: it's all about the window. COUNT(*) without an OVER clause, for instance, is an aggregation function. COUNT(*) with an OVER clause is a window function.
By using aggregation functions you condense the original rows you get after the FROM clause and WHERE clause are applied, either to one row per group specified in GROUP BY or to a single row in the absence of a GROUP BY clause.
Window functions, aka analytic functions, are applied afterwards. They don't change the number of result rows, but merely add information by looking at all or some rows (the window) of the selected data.
In
SELECT
options.id,
options.option_text,
COUNT(votes.option_id) as vote_count,
COUNT(votes.option_id) / SUM(COUNT(votes.option_id)) OVER() * 100.0 as vote_percentage
FROM options
LEFT JOIN votes on options.id = votes.option_id
GROUP BY options.id;
we first join votes to options and then count the votes per option by aggregating the joined rows down to one result row per option (GROUP BY options.id). We count on a non-nullable column in the votes table (COUNT(votes.option_id)), so we get a zero count in case there are no votes, because in an outer joined row this column is set to null.
After aggregating all rows, and thus getting one row per option, we apply a window function (SUM() OVER) on this result set. We apply the analytic SUM on the vote count (SUM(COUNT(votes.option_id))) by looking at the whole result set (empty OVER clause), thus getting the same total vote count in every row. We use this value for a calculation: the option's vote count divided by the total vote count times 100, which is the option's percentage of total votes.
The PostgreSQL query is very similar. We select the number of posts per date (COUNT(created_at) is nothing else than a mere COUNT(*)) along with a running total of these counts (by using a window that looks at all rows up to the current row).
So, while this looks like we are nesting two aggregate functions, this is not really the case, because SUM OVER is not considered an aggregation function but an analytic/window function.
Oracle does allow applying an aggregate function directly on another, thus invoking a final aggregation on a previously grouped aggregation. This allows us to get one result row of, say, the average of sums without having to write a subquery for this. This is not compliant with the SQL standard, however, and it is unpopular even among Oracle developers.
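The vote-percentage query can be tried out with a small sketch like the following, which runs it in SQLite through Python's sqlite3 module. It assumes an SQLite build with window function support (3.25 or later); the sample options and votes are made up for illustration. The multiplication by 100.0 is moved in front of the division so that the expression cannot fall into integer division on engines where that matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: three options, four votes, none for option 3.
conn.executescript(
    """
    CREATE TABLE options (id INTEGER PRIMARY KEY, option_text TEXT);
    CREATE TABLE votes (option_id INTEGER NOT NULL);
    INSERT INTO options VALUES (1, 'red'), (2, 'green'), (3, 'blue');
    INSERT INTO votes VALUES (1), (1), (1), (2);
    """
)

# COUNT(...) is the aggregate evaluated per GROUP BY group.
# SUM(COUNT(...)) OVER () is a window function applied to the grouped rows,
# so every row sees the same grand total.
rows = sorted(
    conn.execute(
        """
        SELECT options.id,
               COUNT(votes.option_id) AS vote_count,
               COUNT(votes.option_id) * 100.0
                   / SUM(COUNT(votes.option_id)) OVER () AS vote_percentage
        FROM options
        LEFT JOIN votes ON options.id = votes.option_id
        GROUP BY options.id
        """
    ).fetchall()
)

for option_id, vote_count, vote_percentage in rows:
    print(option_id, vote_count, vote_percentage)
```

Note that if there are no votes at all, the total is zero and the division needs guarding (for example with NULLIF on the divisor).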
How to resolve nested aggregate function error?
The asker is looking to have each bucket shown as a % of the total.
Combining COUNT(...) with windowed SUM() OVER() to get the total across all groups:
SELECT Case_Age_Category,
DIV0(COUNT(Case_Age_Category), SUM(COUNT(Case_Age_Category)) OVER()) as Volume
FROM
(
-- ...
) sub
WHERE CLOSED_AT >= '2021-01-01 00:00:00.000'
GROUP BY Case_Age_Category
ORDER BY Case_Age_Category ASC
Nested aggregate function in SELECT list
Since you have not defined any columns to group by, SUM() returns a single value, and the MAX() of that single value is equal to the SUM() itself.
UPDATE
An error was thrown because MAX(SUM(e.Empid)) requires the results of two grouped selects, not just one.
Why GROUP BY is needed for a nested aggregate function in Oracle
Well, the error message you get says it: ORA-00978: nested group function without GROUP BY
If you check the documentation, there is no explicit mention of this limitation, but reading the description of the functionality carefully you should realize that the GROUP BY clause is required for the usage of nested aggregate functions.
You can nest aggregate functions ...
This calculation evaluates the inner aggregate (MAX(salary)) for each group defined by the GROUP BY clause ...,
and aggregates the results again.
So you have two workarounds to simulate the nested aggregation without GROUP BY:
A) Add constant GROUP BY
select max(count(*)) from tab group by 42;
Note that you must use NVL if you require a zero result on an empty table (which is what case B returns):
select nvl(max(count(*)),0) max_cnt from tab group by 42
B) Split into two subqueries
with tab2 as (
select count(*) cnt from tab)
select max(cnt) max_cnt
from tab2
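Workaround B is portable, so it can be tried outside Oracle. The following is a rough sketch using SQLite through Python's sqlite3 module as a stand-in; SQLite is an assumption here and does not reproduce ORA-00978, it simply rejects the directly nested form with its own error. The table tab and its single column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (id INTEGER)")  # deliberately left empty

# The directly nested form is an Oracle extension; SQLite refuses it.
try:
    conn.execute("SELECT MAX(COUNT(*)) FROM tab")
    nested_rejected = False
except sqlite3.OperationalError:
    nested_rejected = True

# Workaround B: aggregate once in a CTE, then aggregate the result again.
two_step = """
    WITH tab2 AS (SELECT COUNT(*) AS cnt FROM tab)
    SELECT MAX(cnt) AS max_cnt FROM tab2
"""

# COUNT(*) without GROUP BY always yields one row, so an empty table gives 0.
max_cnt_empty = conn.execute(two_step).fetchone()[0]

conn.executemany("INSERT INTO tab VALUES (?)", [(1,), (2,), (3,)])
max_cnt_filled = conn.execute(two_step).fetchone()[0]

print(nested_rejected, max_cnt_empty, max_cnt_filled)
```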
Nested aggregate function in PostgreSQL
Many things in your question are unclear. Based on what I understood from your current query, try this:
with cte as (
SELECT ag.article_group_id,
ag.article_group,
au.author_id,
au.author_name,
avg(gr.total) as avg_total
FROM article_group ag
LEFT JOIN article ar on ar.article_group_id=ag.article_group_id
LEFT JOIN article_to_author ata ON ar.article_id = ata.article_id
LEFT JOIN author au ON ata.author_id = au.author_id
LEFT JOIN grade gr ON ar.article_id = gr.article_id
GROUP BY ag.article_group_id, ag.article_group, au.author_id, au.author_name
)
SELECT article_group_id,
article_group,
array_agg('[' || author_id || ',' || author_name || ',' || avg_total || ']')
FROM cte
GROUP BY article_group_id, article_group
You can change whatever you want in array_agg
Nested Aggregate function in having clause MySql
As you only wanted the ids, you don't need to join the customers table:
SELECT customers_customer_id, COUNT(*) count_r
FROM orders
GROUP BY customers_customer_id
HAVING count_r = (
SELECT MAX(count_r) FROM (SELECT COUNT(*) count_r FROM `orders` GROUP BY customers_customer_id) t1)
customers_customer_id | count_r
--------------------: | ------:
1 | 2
2 | 2
3 | 2
To select the wanted columns from orders and include only the ids that have the highest number of orders, I need two nested queries to get the highest number of orders.
- Select the number of orders for every customers_customer_id.
- As I need only the highest count, I select the MAX of all the counts gathered in the first select.
- I use this to keep only the customers_customer_ids that have the highest number of orders.
The HAVING is needed because I have to compare the maximum against the column count_r, which I cannot do in a WHERE clause.
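The three steps above can be sketched as follows, run against SQLite through Python's sqlite3 module rather than MySQL. The sample orders are made up, and the HAVING clause repeats COUNT(*) instead of using the count_r alias, which is a portability choice for this sketch and not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customers_customer_id INTEGER)")
# Hypothetical data: customers 1 to 3 have two orders each, customer 4 has one.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 1), (2, 1), (3, 2), (4, 2), (5, 3), (6, 3), (7, 4)],
)

top_customers = conn.execute(
    """
    SELECT customers_customer_id, COUNT(*) AS count_r
    FROM orders
    GROUP BY customers_customer_id
    -- Keep only the groups whose count equals the highest count of any group.
    HAVING COUNT(*) = (SELECT MAX(count_r)
                       FROM (SELECT COUNT(*) AS count_r
                             FROM orders
                             GROUP BY customers_customer_id) t1)
    ORDER BY customers_customer_id
    """
).fetchall()

print(top_customers)
```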
How to combine aggregate functions in MySQL?
You have to use subqueries:
SELECT x.user,
AVG(x.cnt)
FROM (SELECT user, COUNT(answer) AS cnt
FROM surveyValues
WHERE study='a1'
GROUP BY user) x
GROUP BY x.user
You can't wrap an aggregate with another aggregate. You can wrap an aggregate in an analytic (window) function, but only where analytic/ranking/windowing functions are supported; MySQL has supported them since version 8.0.
MySQL Nested aggregation query with selection of specific intermediate items
You can use the SUM() window function to assign a number to each session and then aggregate:
SELECT DISTINCT user_id,
SUM(TIMESTAMPDIFF(MINUTE, MIN(log_time), MAX(log_time))) OVER (PARTITION BY user_id) total_time
FROM (
SELECT *, SUM(activity = 'start') OVER (PARTITION BY user_id, DATE(log_time) ORDER BY log_time) grp
FROM activities
) t
WHERE grp > 0
GROUP BY user_id, DATE(log_time), grp;
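To see what the inner query produces, here is a small sketch of the session-numbering step alone, run in SQLite (3.25 or later, for window functions) through Python's sqlite3 module instead of MySQL. The activities rows are invented for illustration. Each 'start' row increments the running sum, so rows before the first start of the day get grp = 0, which is why the outer query filters on grp > 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities (user_id INTEGER, activity TEXT, log_time TEXT)")
# Hypothetical log: one stray event, then two sessions on the same day.
conn.executemany(
    "INSERT INTO activities VALUES (?, ?, ?)",
    [
        (1, "click", "2021-01-01 08:55:00"),
        (1, "start", "2021-01-01 09:00:00"),
        (1, "click", "2021-01-01 09:10:00"),
        (1, "start", "2021-01-01 10:00:00"),
        (1, "click", "2021-01-01 10:30:00"),
    ],
)

# activity = 'start' evaluates to 1 or 0, so the running SUM counts the
# starts seen so far within each user and day.
numbered = conn.execute(
    """
    SELECT log_time,
           activity,
           SUM(activity = 'start') OVER (
               PARTITION BY user_id, DATE(log_time)
               ORDER BY log_time
           ) AS grp
    FROM activities
    ORDER BY log_time
    """
).fetchall()

session_numbers = [grp for _, _, grp in numbered]
print(session_numbers)
```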