Select Multiple (Non-Aggregate Function) Columns with Group By

You have yourself a greatest-n-per-group problem. This is one of the possible solutions:

select c.mukey, c.comppct_r, c.name, c.type
from c
inner join (
select mukey, max(comppct_r) as comppct_r
from c
group by mukey
) ss on c.mukey = ss.mukey and c.comppct_r = ss.comppct_r;

Another possible approach, same output:

select c1.*
from c c1
left outer join c c2
on (c1.mukey = c2.mukey and c1.comppct_r < c2.comppct_r)
where c2.mukey is null;
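To check that the two approaches really return the same rows, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration, and the first query is written without a table alias so that the c.* references resolve; only the table and column names come from the queries above.

```python
import sqlite3

# In-memory database with hypothetical sample data for table c.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table c (mukey integer, comppct_r integer, name text, type text)"
)
conn.executemany(
    "insert into c values (?, ?, ?, ?)",
    [
        (1, 60, "Alpha", "series"),
        (1, 30, "Beta", "series"),
        (2, 45, "Gamma", "taxon"),
        (2, 55, "Delta", "series"),
    ],
)

# Approach 1: join each row to the per-group maximum.
join_sql = """
select c.mukey, c.comppct_r, c.name, c.type
from c
inner join (
    select mukey, max(comppct_r) as comppct_r
    from c
    group by mukey
) ss on c.mukey = ss.mukey and c.comppct_r = ss.comppct_r
order by c.mukey
"""

# Approach 2: keep rows for which no row in the same group has a larger value.
anti_join_sql = """
select c1.mukey, c1.comppct_r, c1.name, c1.type
from c c1
left outer join c c2
  on c1.mukey = c2.mukey and c1.comppct_r < c2.comppct_r
where c2.mukey is null
order by c1.mukey
"""

join_rows = conn.execute(join_sql).fetchall()
anti_join_rows = conn.execute(anti_join_sql).fetchall()
print(join_rows)
print(anti_join_rows)
```

Both queries return every row that ties for the maximum, so a group with two rows at the same top comppct_r would appear twice in either result.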

There's a comprehensive and explanatory answer on the topic under "SQL Select only rows with Max Value on a Column".

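How do we select non-aggregate columns in a query with a GROUP BY clause when they are not functionally dependent on the columns in the GROUP BY clause?

Try executing the query below. It removes ONLY_FULL_GROUP_BY from MySQL's sql_mode, which lifts this restriction; for such columns the server is then free to return a value from any row of each group.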

SET GLOBAL sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY',''));

Can I use non-aggregate columns with group by?

You can't get the Id of the row that MAX found, because more than one id may share the maximum age.
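A small sketch of that ambiguity, run through Python's sqlite3 module. The table name people and its rows are hypothetical; only the id and age columns are taken from the answer above.

```python
import sqlite3

# Hypothetical table: two different ids share the maximum age.
conn = sqlite3.connect(":memory:")
conn.execute("create table people (id integer primary key, age integer)")
conn.executemany(
    "insert into people values (?, ?)",
    [(1, 30), (2, 42), (3, 42)],
)

# MAX() yields a single value for the whole table...
max_age = conn.execute("select max(age) from people").fetchone()[0]

# ...but more than one row can carry that value.
ids_at_max = [
    row[0]
    for row in conn.execute(
        "select id from people "
        "where age = (select max(age) from people) "
        "order by id"
    )
]
print(max_age, ids_at_max)
```

Filtering on the maximum in a subquery, as done here, returns all tied ids instead of an arbitrary one.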

Why group by requires all non aggregate columns

If you are using SQL Server, you can use a window function:

select 
country,
city,
sum(income) over (partition by country)
from
table1

Or, if your database does not support window functions, you can use a correlated subquery:

select
t1.country,
t1.city,
(select sum(t2.income) from table1 t2 where t1.country = t2.country)
from
table1 t1
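The two forms can be compared side by side. The sketch below runs both in SQLite through Python's sqlite3 module rather than in SQL Server, and assumes SQLite 3.25 or later for window function support. The sample rows and the country_income alias are invented for illustration.

```python
import sqlite3

# Hypothetical sample data for table1.
conn = sqlite3.connect(":memory:")
conn.execute("create table table1 (country text, city text, income integer)")
conn.executemany(
    "insert into table1 values (?, ?, ?)",
    [("US", "NYC", 100), ("US", "LA", 50), ("FR", "Paris", 70)],
)

# Window function: every row keeps its city and gets the country total.
window_sql = """
select country,
       city,
       sum(income) over (partition by country) as country_income
from table1
order by country, city
"""

# Correlated subquery: the same total, recomputed for each outer row.
subquery_sql = """
select t1.country,
       t1.city,
       (select sum(t2.income)
        from table1 t2
        where t1.country = t2.country) as country_income
from table1 t1
order by t1.country, t1.city
"""

window_rows = conn.execute(window_sql).fetchall()
subquery_rows = conn.execute(subquery_sql).fetchall()
print(window_rows)
print(subquery_rows)
```

Neither query needs a GROUP BY, which is why city can be selected freely: the aggregate is computed per country but the result still has one row per input row.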

Are there any guarantees about the non-aggregated columns in a GROUP BY query?

This behavior is covered in the SQLite documentation under SELECT/Simple Select Processing/Side note: Bare columns in an aggregate queries.

In your query the columns id and letter, which are not aggregated and are not included in the GROUP BY clause, are called bare columns.

Because you use the MAX() aggregate function, the values of these 2 columns:

... take values from the input row which also contains the minimum or
maximum

But, since more than one row may have the maximum val for the same parent:

There is still an ambiguity if two or more of the input rows have the
same minimum or maximum value

This means that for your sample data there is no guarantee that for parent = 10 you will get the row with id = 1 in the results.

You may get the row with id = 2 which also contains the maximum val.

Assuming that, when more than one row has the maximum val for the same parent, you want the row with the minimum id, you can do it with window functions:

SELECT id, val, parent, letter
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY parent ORDER BY val DESC, id) rn
FROM tablename
)
WHERE rn = 1

or:

SELECT DISTINCT
FIRST_VALUE(id) OVER (PARTITION BY parent ORDER BY val DESC, id) id,
MAX(val) OVER (PARTITION BY parent) val,
parent,
FIRST_VALUE(letter) OVER (PARTITION BY parent ORDER BY val DESC, id) letter
FROM tablename
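As a rough check of both window-function queries, the sketch below runs them on invented rows in which ids 1 and 2 tie on the maximum val for parent = 10. It uses Python's sqlite3 module and assumes SQLite 3.25 or later; the data values and the added ORDER BY parent are assumptions made for a stable printout.

```python
import sqlite3

# Hypothetical data: ids 1 and 2 both hold the maximum val for parent 10.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tablename (id integer, val integer, parent integer, letter text)"
)
conn.executemany(
    "insert into tablename values (?, ?, ?, ?)",
    [(1, 5, 10, "a"), (2, 5, 10, "b"), (3, 2, 10, "c"), (4, 7, 20, "d")],
)

# ROW_NUMBER: rank rows per parent by val descending, ties broken by lowest id.
row_number_sql = """
SELECT id, val, parent, letter
FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY parent ORDER BY val DESC, id) rn
    FROM tablename
)
WHERE rn = 1
ORDER BY parent
"""

# FIRST_VALUE: compute the winning row's columns on every row, then deduplicate.
first_value_sql = """
SELECT DISTINCT
    FIRST_VALUE(id) OVER (PARTITION BY parent ORDER BY val DESC, id) id,
    MAX(val) OVER (PARTITION BY parent) val,
    parent,
    FIRST_VALUE(letter) OVER (PARTITION BY parent ORDER BY val DESC, id) letter
FROM tablename
ORDER BY parent
"""

row_number_rows = conn.execute(row_number_sql).fetchall()
first_value_rows = conn.execute(first_value_sql).fetchall()
print(row_number_rows)
print(first_value_rows)
```

Because id is part of the window ordering, the tie for parent = 10 is resolved the same way on every run, unlike the bare-column behavior described above.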

