GroupingError: ERROR: column " " must appear in the GROUP BY clause or be used in an aggregate function

GroupingError: ERROR: column must appear in the GROUP BY clause or be used in an aggregate function

You are not allowed to select reviews.id (selected implicitly through the wildcard *) without adding it to the GROUP BY clause or applying an aggregate function like avg(). The solution is to do one of the following:

  1. Remove the wildcard * from your select
  2. Add the field reviews.id to your GROUP BY clause
  3. Select reviews.id explicitly and apply an aggregate function to it (e.g. sum(reviews.id))
  4. Replace the wildcard * with the table-specific wildcard albums.*

The second and third options do not make much sense in your scenario, though.
Based on your comment, I added option four.
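To illustrate why the wildcard fails while albums.* (option 4) works, assuming the query groups by albums.id, here is a plain-Ruby sketch that mimics `SELECT albums.*, avg(reviews.rating) ... GROUP BY albums.id` over in-memory rows. The rating column, the album titles and all data are hypothetical, since the original query is not shown. Within one album group the album columns are identical, but reviews.id takes several values, so there is no single one to return.

```ruby
# Hypothetical joined rows: one per review, with the album columns repeated on each row.
AlbumReviewRow = Struct.new(:album_id, :album_title, :review_id, :rating)

SAMPLE_ALBUM_ROWS = [
  AlbumReviewRow.new(1, "Album A", 10, 5),
  AlbumReviewRow.new(1, "Album A", 11, 4),
  AlbumReviewRow.new(2, "Album B", 12, 3)
].freeze

# Rough analogue of: SELECT albums.*, avg(reviews.rating) ... GROUP BY albums.id
def albums_with_avg_rating(rows)
  rows.group_by(&:album_id).map do |album_id, group|
    {
      album_id: album_id,
      # Safe to take from any row: album columns depend on albums.id (the primary key)
      album_title: group.first.album_title,
      avg_rating: group.sum(&:rating).fdiv(group.size)
    }
  end
end

# reviews.id is not determined by albums.id: one group can hold several values
def review_ids_per_album(rows)
  rows.group_by(&:album_id).transform_values { |group| group.map(&:review_id) }
end

p albums_with_avg_rating(SAMPLE_ALBUM_ROWS)
p review_ids_per_album(SAMPLE_ALBUM_ROWS) # album 1 has two candidate review ids
```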

PG::GroupingError: ERROR: column events.id must appear in the GROUP BY clause or be used in an aggregate function

Event
.order(:popularity)
.joins(:keywords)
.group('events.id') # <======
.where(keywords: { category: 'taxonomy' })
.group('keywords.name')

Postgres error [column must appear in the GROUP BY clause or be used in an aggregate function]

This is just the first of four errors that you will get. PostgreSQL stops checking your SQL once it hits an error, so a single message does not mean there is only one problem. In fact you have the same problem with all of id, created_at, updated_at and version. As the error says, if you use GROUP BY, every column in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function. Assuming you do not want to add these to the GROUP BY (on the grounds that nothing would then presumably have a COUNT > 10), you either have to drop them from the SELECT or apply some aggregate function. In your case MAX might be suitable, but without knowing more I cannot really tell.
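As a plain-Ruby sketch of the MAX approach, the code below mimics a `GROUP BY ... HAVING COUNT(*) > 10` query that aggregates the four problem columns with MAX. The grouping column `name` and all data are hypothetical stand-ins, since the original query is not shown. One caveat: each MAX is computed independently, so the reported values need not come from the same row.

```ruby
# Hypothetical rows; `name` stands in for whatever column the original query grouped by.
GroupedRecord = Struct.new(:name, :id, :created_at, :updated_at, :version)

SAMPLE_RECORDS = [
  GroupedRecord.new("a", 1, Time.utc(2020, 1, 1), Time.utc(2020, 1, 2), 1),
  GroupedRecord.new("a", 2, Time.utc(2020, 2, 1), Time.utc(2020, 2, 2), 3),
  GroupedRecord.new("b", 3, Time.utc(2020, 3, 1), Time.utc(2020, 3, 2), 1)
].freeze

# Rough analogue of:
#   SELECT name, COUNT(*), MAX(id), MAX(created_at), MAX(updated_at), MAX(version)
#   FROM ... GROUP BY name HAVING COUNT(*) > min_count
def grouped_with_max(records, min_count: 10)
  groups = records.group_by(&:name)
  kept = groups.select { |_name, group| group.size > min_count } # the HAVING filter
  kept.map do |name, group|
    {
      name: name,
      count: group.size,
      max_id: group.map(&:id).max,
      max_created_at: group.map(&:created_at).max,
      max_updated_at: group.map(&:updated_at).max,
      max_version: group.map(&:version).max
    }
  end
end

p grouped_with_max(SAMPLE_RECORDS, min_count: 1) # only group "a" has more than one row
```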

ERROR: column must appear in the GROUP BY clause or be used in an aggregate function when using two joins

Here is a version with the GROUP BY problem corrected:

SELECT
A.name,
A.unit,
B.child,
REGEXP_MATCHES(A.b_number, '([^.]*--[0-9]*).*') AS number,
SUM(CAST(A.amount AS decimal)) AS sum_amount,
COUNT(A.amount) AS cnt_amount
INTO result
FROM B
INNER JOIN A ON B.name = A.name AND B.parent = A.id
INNER JOIN C ON A.name = C.name AND B.child = C.id
GROUP BY
A.name,
A.unit,
B.child,
number;

Note that every column or alias that appears in the SELECT clause also appears in GROUP BY. The only exceptions are columns used inside aggregate functions (here A.amount), which do not need to appear in GROUP BY.
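The same rule can be sketched in plain Ruby: the grouping key is the tuple of all non-aggregated output columns, and amount only appears inside SUM and COUNT. The rows are hypothetical, and `number_of` is only a rough analogue of REGEXP_MATCHES: the PostgreSQL function returns a text array and yields no row at all when nothing matches, whereas this helper returns a string or nil. Amounts are parsed with Rational as a stand-in for the exact decimal cast.

```ruby
# Hypothetical joined rows; only the column names come from the query above.
JoinedRow = Struct.new(:name, :unit, :child, :b_number, :amount)

SAMPLE_JOINED_ROWS = [
  JoinedRow.new("n1", "kg", "c1", "ab--12.x", "1.5"),
  JoinedRow.new("n1", "kg", "c1", "ab--12.y", "2.5"),
  JoinedRow.new("n1", "kg", "c2", "cd--7", nil)
].freeze

# Rough analogue of REGEXP_MATCHES(b_number, '([^.]*--[0-9]*).*'): first capture group or nil
def number_of(b_number)
  b_number[/([^.]*--[0-9]*)/, 1]
end

# Grouping key = every non-aggregated output column; amount is only used inside the aggregates.
def summarize(rows)
  rows.group_by { |r| [r.name, r.unit, r.child, number_of(r.b_number)] }
      .map do |(name, unit, child, number), group|
        amounts = group.map(&:amount).compact # SUM and COUNT(col) ignore NULLs
        {
          name: name, unit: unit, child: child, number: number,
          # SUM over no non-NULL values is NULL in SQL, so mirror that with nil
          sum_amount: amounts.empty? ? nil : amounts.sum { |a| Rational(a) },
          cnt_amount: amounts.size
        }
      end
end

summarize(SAMPLE_JOINED_ROWS).each { |row| p row }
```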

Postgres SQL: column must appear in the GROUP BY clause or be used in an aggregate function

As a general rule, any column not listed in the GROUP BY clause should show up aggregated in the SELECT list.

For example, s.name should show up as max(s.name) or min(s.name), since it's not present in the GROUP BY list. However, PostgreSQL implements functional dependency (a SQL standard feature) for the GROUP BY clause, and detects that s.name is dependent on the s.id column (which is probably the primary key; PostgreSQL only recognizes this dependency when the grouped columns include the table's primary key). In short, there's a single possible value of s.name for each s.id. Therefore, there's no need in PostgreSQL to aggregate this column (you can, but it's not needed).

On the flip side, for lookupStudyType.description PostgreSQL cannot determine if it's functionally dependent on s.id or not. You'll need to aggregate it as max(lookupStudyType.description) or min(lookupStudyType.description), or any other aggregation expression.
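To make "functionally dependent" concrete, here is a plain-Ruby sketch that checks whether, in a given set of rows, each key value maps to exactly one value of another column. Note the difference from PostgreSQL: this inspects data, whereas PostgreSQL infers the dependency only from the schema (a primary key), so even when such a check returns true for today's data, the query still needs the aggregate. The column names and rows are hypothetical; study_type stands in for lookupStudyType.description, in a sample where one study joins two different types.

```ruby
# Returns true when, in these rows, each key value maps to exactly one value of dep_col.
# This inspects data; PostgreSQL itself only infers the dependency from a primary key.
def functionally_dependent?(rows, key_col, dep_col)
  rows.group_by { |r| r[key_col] }
      .all? { |_key, group| group.map { |r| r[dep_col] }.uniq.size == 1 }
end

# Hypothetical joined rows (column names invented for illustration)
SAMPLE_STUDY_ROWS = [
  { s_id: 1, s_name: "Study A", study_type: "cohort" },
  { s_id: 1, s_name: "Study A", study_type: "trial" },
  { s_id: 2, s_name: "Study B", study_type: "cohort" }
].freeze

p functionally_dependent?(SAMPLE_STUDY_ROWS, :s_id, :s_name)     # => true
p functionally_dependent?(SAMPLE_STUDY_ROWS, :s_id, :study_type) # => false
```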

As a side note, I have rarely seen functional dependency implemented in other databases. Isn't PostgreSQL awesome? (I'm not affiliated with PostgreSQL in any way).

must appear in the GROUP BY clause or be used in an aggregate function

Yes, this is a common aggregation problem. Before SQL3 (SQL:1999), the selected fields had to appear in the GROUP BY clause[*].

To work around this issue, calculate the aggregate in a subquery and then join it back to the table to get the additional columns you need to show:

SELECT m.cname, m.wmname, t.mx
FROM (
SELECT cname, MAX(avg) AS mx
FROM makerar
GROUP BY cname
) t JOIN makerar m ON m.cname = t.cname AND t.mx = m.avg
;

 cname  | wmname |         mx
--------+--------+--------------------
 canada | zoro   | 2.0000000000000000
 spain  | usopp  | 5.0000000000000000
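The two steps of that query, the grouped subquery and the join back, can be sketched in plain Ruby over in-memory rows. The makerar rows are hypothetical: the avg for spain/luffy is invented (only its existence is implied by the window-function output), while the other two values match the output above.

```ruby
# Hypothetical makerar rows; the avg for spain/luffy is made up.
def makerar_rows
  [
    { cname: "canada", wmname: "zoro",  avg: 2.0 },
    { cname: "spain",  wmname: "luffy", avg: 3.0 },
    { cname: "spain",  wmname: "usopp", avg: 5.0 }
  ]
end

# The subquery: SELECT cname, MAX(avg) AS mx FROM makerar GROUP BY cname
def max_avg_per_cname(rows)
  rows.group_by { |r| r[:cname] }.transform_values { |group| group.map { |r| r[:avg] }.max }
end

# The join back: keep rows whose avg equals their cname's max (tied rows all survive)
def rows_at_max(rows)
  mx = max_avg_per_cname(rows)
  rows.select { |r| r[:avg] == mx[r[:cname]] }
      .map { |r| [r[:cname], r[:wmname], mx[r[:cname]]] }
end

p rows_at_max(makerar_rows)
# => [["canada", "zoro", 2.0], ["spain", "usopp", 5.0]]
```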

But you may also use window functions, which look simpler:

SELECT cname, wmname, MAX(avg) OVER (PARTITION BY cname) AS mx
FROM makerar
;

The only catch with this method is that it shows every record (window functions do not collapse rows). But each row does carry the correct MAX for its country (i.e. the max at cname level), so it's up to you:

 cname  | wmname |         mx
--------+--------+--------------------
 canada | zoro   | 2.0000000000000000
 spain  | luffy  | 5.0000000000000000
 spain  | usopp  | 5.0000000000000000
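The difference from GROUP BY is the number of output rows. In this plain-Ruby sketch of `MAX(avg) OVER (PARTITION BY cname)`, every input row is kept and annotated with the maximum of its own partition. The rows are the same hypothetical sample as before (the spain/luffy avg is invented).

```ruby
# Hypothetical makerar rows; the avg for spain/luffy is made up.
def makerar_rows
  [
    { cname: "canada", wmname: "zoro",  avg: 2.0 },
    { cname: "spain",  wmname: "luffy", avg: 3.0 },
    { cname: "spain",  wmname: "usopp", avg: 5.0 }
  ]
end

# Rough analogue of: SELECT cname, wmname, MAX(avg) OVER (PARTITION BY cname) AS mx FROM makerar
# No rows are removed; each one gets the max of its own partition.
def with_partition_max(rows)
  mx = rows.group_by { |r| r[:cname] }.transform_values { |group| group.map { |r| r[:avg] }.max }
  rows.map { |r| [r[:cname], r[:wmname], mx[r[:cname]]] }
end

p with_partition_max(makerar_rows)
# => [["canada", "zoro", 2.0], ["spain", "luffy", 5.0], ["spain", "usopp", 5.0]]
```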

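An arguably less elegant solution that shows only the (cname, wmname) tuples matching the max value is the following. Note that ROW_NUMBER() keeps exactly one row per cname and breaks ties arbitrarily; use RANK() instead if every tied row should be shown: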

SELECT DISTINCT /* DISTINCT matters here: the join back to makerar can produce duplicates if a (cname, wmname) pair occurs more than once */
m.cname, m.wmname, t.avg AS mx
FROM (
SELECT cname, wmname, avg, ROW_NUMBER() OVER (PARTITION BY cname ORDER BY avg DESC) AS rn
FROM makerar
) t JOIN makerar m ON m.cname = t.cname AND m.wmname = t.wmname AND t.rn = 1
;


 cname  | wmname |         mx
--------+--------+--------------------
 canada | zoro   | 2.0000000000000000
 spain  | usopp  | 5.0000000000000000
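Keeping only `rn = 1` from `ROW_NUMBER() OVER (PARTITION BY cname ORDER BY avg DESC)` amounts to picking the single top row of each partition. A plain-Ruby sketch over the same hypothetical rows (the spain/luffy avg is invented):

```ruby
# Hypothetical makerar rows; the avg for spain/luffy is made up.
def makerar_rows
  [
    { cname: "canada", wmname: "zoro",  avg: 2.0 },
    { cname: "spain",  wmname: "luffy", avg: 3.0 },
    { cname: "spain",  wmname: "usopp", avg: 5.0 }
  ]
end

# Rough analogue of keeping rn = 1 from
#   ROW_NUMBER() OVER (PARTITION BY cname ORDER BY avg DESC)
# i.e. exactly one row per cname: the one with the highest avg.
def top_row_per_cname(rows)
  rows.group_by { |r| r[:cname] }
      .map { |_cname, group| group.max_by { |r| r[:avg] } }
      .map { |r| [r[:cname], r[:wmname], r[:avg]] }
end

p top_row_per_cname(makerar_rows)
# => [["canada", "zoro", 2.0], ["spain", "usopp", 5.0]]
```

If two rows tied on the top avg, max_by would return just one of them, much as ROW_NUMBER() breaks ties arbitrarily; the subquery-and-join version keeps both.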

[*]: Interestingly enough, even though the spec allows selecting non-grouped fields when they are functionally dependent on the grouped ones, major engines have not all embraced it. Oracle and SQL Server don't allow it at all. MySQL used to accept any non-grouped column by default, but since 5.7 the ONLY_FULL_GROUP_BY SQL mode is enabled by default, so such queries are rejected unless the column is functionally dependent on the GROUP BY columns; an administrator has to remove ONLY_FULL_GROUP_BY from sql_mode to get the old permissive behavior.

Rails PG GroupingError column must appear in the GROUP BY clause

It's not clear why you want to select likes.id in the first place. You basically want the like_count for each Idea, and likes.id adds nothing to that. Likewise, since you already select ideas.id, there is no need to fetch likes.likeable_id as well: the two values are equal.

Anyway, the problem is that since you're grouping by likeable_id (effectively ideas.id), each group contains many likes.id values, so there is no single value PostgreSQL could return for likes.id.

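SQLite, by contrast, is lax about this: it accepts such "bare" columns in an aggregate query and returns a value taken from an arbitrary row of each group, so the query runs but the result is not well defined.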

Anyway, let me propose a cleaner solution.

# model
class Idea < ActiveRecord::Base
  # to save you the effort of specifying the join conditions
  has_many :likes, foreign_key: :likeable_id
end

# in your code elsewhere
ideas = \
  Idea.
    joins(:likes).
    group("ideas.id").
    select("COUNT(likes.id) AS like_count, ideas.id, ideas.title, ideas.intro").
    order("like_count DESC")

If you still want to get the IDs of likes for each item, then after the above, here's what you could do:

grouped_like_ids = \
  Like.
    select(:id, :likeable_id).
    each_with_object({}) do |like, hash|
      (hash[like.likeable_id] ||= []) << like.id
    end

ideas.each do |idea|
  # selected previously:
  idea.like_count
  idea.id
  idea.title
  idea.intro

  # from the hash
  like_ids = grouped_like_ids[idea.id] || []
end
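The hash-building step is plain Ruby and can be tried without ActiveRecord. In the sketch below, a hypothetical LikeRecord struct stands in for the rows returned by Like.select(:id, :likeable_id); it also shows that Enumerable#group_by with transform_values builds the same hash as the each_with_object loop.

```ruby
# Hypothetical stand-in for the rows returned by Like.select(:id, :likeable_id)
LikeRecord = Struct.new(:id, :likeable_id)

SAMPLE_LIKES = [
  LikeRecord.new(1, 10),
  LikeRecord.new(2, 10),
  LikeRecord.new(3, 20)
].freeze

# The answer's approach: push each like id into the bucket for its likeable_id
def like_ids_by_idea(likes)
  likes.each_with_object({}) do |like, hash|
    (hash[like.likeable_id] ||= []) << like.id
  end
end

# Equivalent formulation with Enumerable#group_by
def like_ids_by_idea_grouped(likes)
  likes.group_by(&:likeable_id).transform_values { |group| group.map(&:id) }
end

grouped = like_ids_by_idea(SAMPLE_LIKES)
p grouped
p grouped[30] || [] # an idea without likes gets an empty list
```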

Other readers: I'd be very interested in a "clean" one-query non-sub-query solution. Let me know in the comments if you leave a response. Thanks.

GroupingError: ERROR: column must appear in the GROUP BY clause or be used in an aggregate function

You cannot combine SELECT * with GROUP BY some_column in Postgres because that's a contradiction (unless it selects from a single table and some_column is its PK). All non-aggregated columns (used in the SELECT, HAVING or ORDER BY clause outside an aggregate function) must be in the GROUP BY list, where the primary key column(s) of a table cover all of that table's columns. Otherwise it would be undefined which value to pick from the aggregated set.

The manual:

When GROUP BY is present, or any aggregate functions are present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or when the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column.

A certain other RDBMS is known to play dirty tricks here and allow this and pick arbitrary values...

You seem to want a list of unique patients that have commented, with the latest comment each. The simplest way in Postgres is with DISTINCT ON:

SELECT DISTINCT ON (patient_id) *
FROM comments
WHERE clinician_id = $1
ORDER BY patient_id, created_at DESC NULLS LAST;

But this won't work in SQLite, which should not be in the loop to begin with. See:

  • Generic Ruby solution for SQLite3 "LIKE" or PostgreSQL "ILIKE"?

NULLS LAST is only relevant if created_at can be NULL:

  • Sort by column ASC, but NULL values first?

Details for DISTINCT ON:

  • Select first row in each GROUP BY group?
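DISTINCT ON has no direct SQLite equivalent. To make the semantics of the query concrete (including NULLS LAST), here is a plain-Ruby sketch of the same selection over in-memory rows. CommentRow and the sample data are hypothetical, and this only illustrates what the query returns; it is not a substitute for running it in the database.

```ruby
# Hypothetical comment rows; created_at may be nil (SQL NULL).
CommentRow = Struct.new(:id, :patient_id, :clinician_id, :created_at)

SAMPLE_COMMENTS = [
  CommentRow.new(1, 100, 7, Time.utc(2021, 1, 1)),
  CommentRow.new(2, 100, 7, Time.utc(2021, 3, 1)),
  CommentRow.new(3, 100, 7, nil),
  CommentRow.new(4, 200, 7, nil),
  CommentRow.new(5, 200, 8, Time.utc(2021, 5, 1))
].freeze

# Rough analogue of:
#   SELECT DISTINCT ON (patient_id) * FROM comments
#   WHERE clinician_id = $1
#   ORDER BY patient_id, created_at DESC NULLS LAST;
def latest_comment_per_patient(comments, clinician_id)
  mine = comments.select { |c| c.clinician_id == clinician_id } # WHERE clinician_id = $1
  # ORDER BY patient_id, created_at DESC NULLS LAST
  ordered = mine.sort_by do |c|
    [c.patient_id, c.created_at.nil? ? 1 : 0, c.created_at.nil? ? 0 : -c.created_at.to_i]
  end
  # DISTINCT ON (patient_id): keep the first row per patient_id in that order
  # (Array#uniq with a block keeps the first occurrence of each key).
  ordered.uniq(&:patient_id)
end

p latest_comment_per_patient(SAMPLE_COMMENTS, 7).map(&:id) # => [2, 4]
```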

