Must Appear in the Group by Clause or Be Used in an Aggregate Function

must appear in the GROUP BY clause or be used in an aggregate function

Yes, this is a common aggregation problem. Before SQL3 (1999), the selected fields must appear in the GROUP BY clause[*].

To workaround this issue, you must calculate the aggregate in a sub-query and then join it with itself to get the additional columns you'd need to show:

SELECT m.cname, m.wmname, t.mx
FROM (
SELECT cname, MAX(avg) AS mx
FROM makerar
GROUP BY cname
) t JOIN makerar m ON m.cname = t.cname AND t.mx = m.avg
;

cname | wmname | mx
--------+--------+------------------------
canada | zoro | 2.0000000000000000
spain | usopp | 5.0000000000000000

But you may also use window functions, which looks simpler:

SELECT cname, wmname, MAX(avg) OVER (PARTITION BY cname) AS mx
FROM makerar
;

The only thing with this method is that it will show all records (window functions do not group). But it will show the correct (i.e. maxed at cname level) MAX for the country in each row, so it's up to you:

 cname  | wmname |          mx           
--------+--------+------------------------
canada | zoro | 2.0000000000000000
spain | luffy | 5.0000000000000000
spain | usopp | 5.0000000000000000

The solution, arguably less elegant, to show the only (cname, wmname) tuples matching the max value, is:

SELECT DISTINCT /* distinct here matters, because maybe there are various tuples for the same max value */
m.cname, m.wmname, t.avg AS mx
FROM (
SELECT cname, wmname, avg, ROW_NUMBER() OVER (PARTITION BY avg DESC) AS rn
FROM makerar
) t JOIN makerar m ON m.cname = t.cname AND m.wmname = t.wmname AND t.rn = 1
;


cname | wmname | mx
--------+--------+------------------------
canada | zoro | 2.0000000000000000
spain | usopp | 5.0000000000000000

[*]: Interestingly enough, even though the spec sort of allows to select non-grouped fields, major engines seem to not really like it. Oracle and SQLServer just don't allow this at all. Mysql used to allow it by default, but now since 5.7 the administrator needs to enable this option (ONLY_FULL_GROUP_BY) manually in the server configuration for this feature to be supported...

ERROR: column must appear in the GROUP BY clause or be used in an aggregate function when using two joins

Here is a version with the GROUP BY problem corrected:

SELECT
A.name,
A.unit,
B.child,
REGEXP_MATCHES(A.b_number, '([^.]*--[0-9]*).*') AS number,
SUM(CAST(A.amount AS decimal)) AS sum_amount,
COUNT(A.amount) AS cnt_amount
INTO result
FROM B
INNER JOIN A ON B.name = A.name AND B.parent = A.id
INNER JOIN C ON A.name = C.name AND B.child = C.id
GROUP BY
A.name,
A.unit,
B.child,
number;

Note that every column/alias which appears in the SELECT clause also appears in GROUP BY. Exceptions to this are columns which appear inside aggregate functions. In that case, it is OK for them to not appear in GROUP BY.

Postgres SQL: column must appear in the GROUP BY clause or be used in an aggregate function

As a general rule, any column not listed in the GROUP BY clause should show up aggregated in the SELECT list.

For example s.name should show up as max(s.name) or min(s.name) since it's not present n the GROUP BY list. However, PostgreSQL implements functional dependency (a SQL Standard feature) for the GROUP BY clause, and detects that s.name is dependent in the s.id column (that is probably a PK); in short, there's a single possible value s.name for each s.id. Therefore, there's no need in PostgreSQL to aggregate this column (you can, but it's not needed).

On the flip side, for lookupStudyType.description PostgreSQL cannot determine if it's functionally dependent on s.id or not. You'll need to aggregate it as max(lookupStudyType.description) or min(lookupStudyType.description), or any other aggregation expression.

As a side note, I have rarely seen functional dependency implemented in other databases. Isn't PostgreSQL awesome? (I'm not affiliated with PostgreSQL in any way).

Column must appear in the GROUP BY clause or be used in an aggregate function?

You need to specify the fields that will be used to aggregate (group) the results, in this case the salesperson name.

SELECT sum(t2.amount), t1.name 
FROM Salesperson t1
INNER JOIN Orders t2 ON t1.ID = t2.salesperson_id
WHERE t2.amount >= 700
GROUP BY t1.name

Postgresql Column must appear in the GROUP BY clause or be used in an aggregate function when using CASE expression inside ORDER BY clause

Your problem has a couple of roots:

Most importantly, don't use the same name for an output column that is distinct from an input column (of the same name). That's a loaded foot-gun.

Secondly, make it a habit to table-qualify all columns used in a complex query involving multiple tables. Even if that seems to work, it might already be doing something else than you think. And even if it works correctly it may break later, if any column names are changed (added, removed renamed). With some bad luck it breaks silently, and your query happily ever after produces nonsense.

Thirdly, the SQL standard, which has somewhat confusing visibility rules. See:

  • GROUP BY + CASE statement

In your working alternative query, "value" resolves to the output column "value", which hides any input column of the same name in ORDER BY. That works as expected (that is, if you actually meant to target the output column).

In your failing query, "value" resolves to the input column "measurementResults.value". You cannot throw output columns into a new computation in ORDER BY, you can only use them "as is". So, with output columns out of the way, "value" resolves to the input column (now not hidden any more). And that leads to the reported error. Obviously, you cannot order by an input column after aggregating - except if you grouped by it, directly or indirectly.

You could repair your query with:

ORDER  BY (ranking = 'greater') IS TRUE, "value" DESC

The sorts all rows where ranking = 'greater' is not true to the top - like your CASE expression would. So treating null and false alike.

Subtle difference: Those leading rows are sorted by value, while your original would list them in arbitrary order. May or may not be welcome.

  • Sorting null values after all others, except special
  • Best way to check for "empty or null value"

I assume you are aware that null values sort on top in descending order? And that you can change that? See:

  • Sort by column ASC, but NULL values first?

If that's not good enough (or for more complex expressions), you must be more verbose and explicit: one way is to wrap the whole query into a subquery, and order (and limit!) in the outer SELECT:

SELECT avg_value, min_timestamp, min_ranking
FROM (
SELECT ir.ranking -- !
, avg(mr."value") AS avg_value -- !
, min(m."timestamp") AS min_timestamp -- !
, min(ir.ranking) AS min_ranking -- !
FROM measurement m
JOIN "measurementResults" mr ON mr.measurement = m.id
JOIN conditions c ON c.measurement = m.id
JOIN "testProtocolItemResults" ir ON ir.id = mr."testProtocolItemResults"
JOIN "testProtocolSessionItem" si ON si.id = m."testProtocolSessionItem"
WHERE m."athlete" = 334
AND mr."testProtocolItemResults" = 1
AND c."conditions" = '6'
GROUP BY si."testProtocolSession", ir.ranking
) sub
ORDER BY CASE WHEN ranking = 'greater' THEN "value" END DESC
LIMIT 3

Especially for queries with a small LIMIT, this may be more expensive if Postgres cannot optimize the query plan as well any more.

Aside:

Use legal, loser-case identifiers, so you don't have to double-quote.

And use table aliases to de-noise your big queries.



Related Topics



Leave a reply



Submit