How to Use Non-Aggregate Columns with Group By

Can I use non-aggregate columns with group by?

You can't get the Id of the row that MAX found, because there might not be only one id with the maximum age.

GROUP BY clause with non aggregate functions

Sometimes is quite acceptable. Your query, written in more standard SQL, would be something like:

SELECT col, CHAR_LENGTH(col)
FROM (SELECT col FROM table GROUP BY col) c

or as:

SELECT col, MAX(CHAR_LENGTH(col))
FROM table
GROUP BY col

using non aggregate functions you can simplify the query a little bit, but the query would be a little more difficult to read.

It could also be useful when you are sure that all non aggregated columns share the same value:

SELECT id, name, surname
FROM table
GROUP BY id
HAVING COUNT(*)=1

or when it doesn't matter which value you need to return:

SELECT id, name
FROM table
GROUP BY id

will return a single name associated to that id (probably the first name encountered, but we can't be sure which one is the first, order by doesn't help here...). Be warned that if you want to select multiple non aggregated columns:

SELECT id, name, surname
FROM table
GROUP BY id

we have no guarantees that the name and surname returned will belong to the same row.

I would prefer not to use this extension, unless you are 100% sure of why you are using it.

Using GROUP BY and JOIN-ing non-aggregated columns to query

Maybe by aggregating the extra columns?

SELECT foo, bar, baz, moo
, MAX(ins_date) AS mdate
, AVG(percentage) AS avg_perc
, MAX(yes) AS YEAHBABY
, MAX(maybe) AS MAYBEBABY
, MAX(no) AS NONONONONOOO
FROM dummy dum
WHERE ins_date BETWEEN '2018-07-01 00:00:00'
AND '2019-11-01 23:59:59'
AND moo LIKE '%moo%'
AND baz = 'baz'
GROUP BY foo, bar, baz, moo

foo | bar | baz | moo | mdate | avg_perc | YEAHBABY | MAYBEBABY | NONONONONOOO
:------- | :------ | :-- | :---------------- | :------------------ | -------: | -------: | --------: | -----------:
foothing | bar_one | baz | amoosing | 2018-11-29 05:31:00 | 82.8000 | 11236 | 625 | 841
foothing | bar_one | baz | mooman_being | 2019-04-21 10:31:00 | 70.0000 | 3969 | 16 | 121
foothing | bar_one | baz | strawberry_moosse | 2019-03-17 06:37:00 | 80.0000 | 23716 | 529 | 49

db<>fiddle here

Or join to the grouped fields and MAX date.

Or use an emulated row_number.

Or use an EXISTS.

SELECT foo, bar, baz, moo
, ins_date
, percentage
, yes, maybe, no
FROM dummy dum
WHERE EXISTS (
SELECT 1
FROM dummy dum2
WHERE dum2.ins_date BETWEEN '2018-07-01 00:00:00'
AND '2019-11-01 23:59:59'
AND dum2.moo LIKE '%moo%'
AND dum2.baz = 'baz'
AND dum2.foo = dum.foo
AND dum2.bar = dum.bar
AND dum2.baz = dum.baz
AND dum2.moo = dum.moo
GROUP BY foo, bar, baz, moo
HAVING MAX(dum2.ins_date) = dum.ins_date
);

foo | bar | baz | moo | ins_date | percentage | yes | maybe | no
:------- | :------ | :-- | :---------------- | :------------------ | ---------: | ----: | ----: | --:
foothing | bar_one | baz | strawberry_moosse | 2019-03-17 06:37:00 | 80 | 23716 | 529 | 49
foothing | bar_one | baz | mooman_being | 2019-04-21 10:31:00 | 70 | 3969 | 16 | 121
foothing | bar_one | baz | amoosing | 2018-11-29 05:31:00 | 97 | 9025 | 361 | 1

db<>fiddle here

Are there in guarantees about the non-aggregated columns in a GROUP BY query?

This behavior is covered in SELECT/Simple Select Processing/Side note: Bare columns in an aggregate queries.

In your query the columns id and letter, which are not aggregated and are not included in the GROUP BY clause, are called bare columns.

Because you use the MAX() aggregate function, the values of these 2 columns:

... take values from the input row which also contains the minimum or
maximum

But, since there may exist more than 1 rows with the maximum val for the same parent:

There is still an ambiguity if two or more of the input rows have the
same minimum or maximum value

This means that for your sample data there is no guarantee that for parent = 10 you will get the row with id = 1 in the results.

You may get the row with id = 2 which also contains the maximum val.

Assuming that in such a case, where for the same parent there may exist more than 1 rows with the maximum val, you want the row with the minimum id, you can do it with window functions:

SELECT id, val, parent, letter
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY parent ORDER BY val DESC, id) rn
FROM tablename
)
WHERE rn = 1

or:

SELECT DISTINCT
FIRST_VALUE(id) OVER (PARTITION BY parent ORDER BY val DESC, id) id,
MAX(val) OVER (PARTITION BY parent) val,
parent,
FIRST_VALUE(letter) OVER (PARTITION BY parent ORDER BY val DESC, id) letter
FROM tablename

See the demo.

How do we select non-aggregate columns in a query with a GROUP BY clause, which is not functionally dependent on columns in GROUP BY clause?

Try execute the below query. This will remove the restriction of such.

SET GLOBAL sql_mode=(SELECT REPLACE(@@sql_mode,'ONLY_FULL_GROUP_BY',''));

Select multiple (non-aggregate function) columns with GROUP BY

You have yourself a greatest-n-per-group problem. This is one of the possible solutions:

select c.mukey, c.comppct_r, c.name, c.type
from c yt
inner join(
select c.mukey, max(c.comppct_r) comppct_r
from c
group by c.mukey
) ss on c.mukey = ss.mukey and c.comppct_r= ss.comppct_r

Another possible approach, same output:

select c1.*
from c c1
left outer join c c2
on (c1.mukey = c2.mukey and c1.comppct_r < c2.comppct_r)
where c2.mukey is null;

There's a comprehensive and explanatory answer on the topic here: SQL Select only rows with Max Value on a Column

SQL Server - How to have non-aggregated columns in Select statement when using group by?

Figured this out literally as soon as I posted it. I was just confused on the proper way to use a window function.

SELECT [Id]
,LastModifiedDate
,Row_Number() over (Partition by Id order by LastModifiedDate DESC) as rn
,[FirstName]
,[LastName]
,[Company]
,[Title]
FROM [MyServer].[Database].[Table] as C
where rn = 1

What i was trying to do was:

,Row_Number() over (Partition by Id order by MAX(LastModifiedDate) DESC) as rn

but MAX makes no sense because it is DESC



Related Topics



Leave a reply



Submit