Rails/Postgres: "Must Appear in the Group by Clause or Be Used in an Aggregate Function"

GroupingError: ERROR: column must appear in the GROUP BY clause or be used in an aggregate function

You are not allowed to select reviews.id (selected implicitly through the wildcard *) without adding it to the GROUP BY clause or applying an aggregate function like avg(). The solution is to do one of the following:

  1. Remove the wildcard * from your select
  2. Add the field reviews.id to your group clause
  3. Select reviews.id explicitly and apply an aggregate function to it (e.g. sum(reviews.id))
  4. Replace the wildcard * with the table-specific wildcard albums.*

The second and third options do not make much sense in your scenario, though. Based on your comment, I added option four.
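
For illustration, here's a sketch of options one and four in ActiveRecord. The Album model, the has_many :reviews association, and the reviews.rating column are my assumptions, since the original query isn't shown:

# option 1: select only the grouped column and the aggregate
Album.joins(:reviews).group("albums.id").select("albums.id, AVG(reviews.rating) AS avg_rating")

# option 4: albums.* is fine because the grouping key albums.id is the primary key
Album.joins(:reviews).group("albums.id").select("albums.*, AVG(reviews.rating) AS avg_rating")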

Rails PG GroupingError column must appear in the GROUP BY clause

I don't know why you want to select likes.id in the first place. You basically want the like_count for each Idea, so I don't see the point in selecting likes.id. Also, since you already have ideas.id, I don't see why you would want the value of likes.likeable_id; they'll both be equal. :/

Anyway, the problem is that since you're grouping by likeable_id (basically ideas.id), you can't "select" likes.id: the individual like rows are collapsed ("lost") by the grouping.

SQLite is lax about this: it lets you select ungrouped columns and simply returns a value from an arbitrary row in each group.

Anyway, let me propose a cleaner solution.

# model
class Idea < ActiveRecord::Base
  # to save you the effort of specifying the join-conditions
  has_many :likes, foreign_key: :likeable_id
end

# in your code elsewhere
ideas = \
  Idea.
    joins(:likes).
    group("ideas.id").
    select("COUNT(likes.id) AS like_count, ideas.id, ideas.title, ideas.intro").
    order("like_count DESC")

If you still want to get the IDs of likes for each item, then after the above, here's what you could do:

grouped_like_ids = \
  Like.
    select(:id, :likeable_id).
    each_with_object({}) do |like, hash|
      (hash[like.likeable_id] ||= []) << like.id
    end

ideas.each do |idea|
  # selected previously:
  idea.like_count
  idea.id
  idea.title
  idea.intro

  # from the hash
  like_ids = grouped_like_ids[idea.id] || []
end

Other readers: I'd be very interested in a "clean" one-query, non-sub-query solution. Let me know in the comments. Thanks.
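
The closest I've found is Postgres-specific: ARRAY_AGG collects the like IDs per group inside the same query. A sketch (the like_ids alias is my own naming):

ideas = \
  Idea.
    joins(:likes).
    group("ideas.id").
    select("ideas.id, ideas.title, ideas.intro, " \
           "COUNT(likes.id) AS like_count, " \
           "ARRAY_AGG(likes.id) AS like_ids").
    order("like_count DESC")

ideas.first.like_ids # e.g. [3, 17, 42]

Recent Rails versions typecast the Postgres array into a Ruby array; on older ones you may get a string like "{3,17,42}" back instead.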

PostgreSQL - must appear in the GROUP BY clause or be used in an aggregate function

@myestate1 = Estate.where(:Mgmt => current_user.Company)
@myestate = @myestate1.select("DISTINCT(user_id)")

This is what I did. Selecting only DISTINCT(user_id) means no other column appears in the SELECT, so nothing needs to be grouped or aggregated.
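
For the record, a sketch of the same idea in the more idiomatic spelling that recent Rails versions allow, using distinct and pluck (note that pluck returns a plain array of IDs rather than a relation):

Estate.where(Mgmt: current_user.Company).distinct.pluck(:user_id)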

column likes.id must appear in the GROUP BY clause or be used in an aggregate function

What the error is telling you is that you're selecting likes.id, but there is no single likes.id the database can give you for each group.

This is because, when you don't tell Rails what you need, it selects everything (SELECT *) by default.

Think: do you actually need likes.id? It looks to me like you're grouping by exercise_id, so what you're trying to get is exercises and their like counts (correct me if I'm wrong). You don't actually need specific like IDs.

If that's so, we need to tell Rails about our intention.

Like.group(:exercise_id).order('COUNT(exercise_id) DESC').select(:exercise_id)

If you also need the count itself, just add it to the select.
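
For example, a sketch (like_count is just an alias I picked):

Like.group(:exercise_id).order('COUNT(exercise_id) DESC').select('exercise_id, COUNT(exercise_id) AS like_count')

Each returned record then responds to like_count.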

Something else you might want to try is just .count. It's pretty smart and will respect your grouping. See if this helps.

Like.group(:exercise_id).count
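
If I remember correctly, a grouped .count returns a hash keyed by the grouped column, along these lines (the numbers are made up):

Like.group(:exercise_id).count
# => { 1 => 12, 2 => 5, 3 => 9 }  # exercise_id => number of likes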

Rails: must appear in the GROUP BY clause or be used in an aggregate function

You are grouping on date(created_at) but not selecting that column. Change this line:

orders = orders.select("created_at, sum(amount) as total_amount")

to this:

orders = orders.select("date(created_at), sum(amount) as total_amount")

That will also require a change to the next line's group_by.

From one of my projects, using slightly different attributes, but doing the same thing as you:

1.9.3p327 > User.group('date(created_at)').select('date(created_at), sum(id) as total_amount').first.attributes
User Load (1.2ms) SELECT date(created_at), sum(id) as total_amount FROM "users" GROUP BY date(created_at) LIMIT 1
=> {"date"=>"2011-09-27", "total_amount"=>"657"}

Postgres: column must appear in the GROUP BY clause or be used in an aggregate function

As the error implies, you need to add a GROUP BY clause after the FROM clause, so the query should look like:

select
  '2019-09-11' as snapshot_date,
  SUM(case when snapshot_date = '2019-09-11' then balance end) as opening_balance,
  SUM(case when snapshot_date = '2019-09-09' then balance end) as closing_balance,
  year
from snapshot
group by year

See: https://www.javatpoint.com/postgresql-group-by-clause

Postgresql Column must appear in the GROUP BY clause or be used in an aggregate function when using CASE expression inside ORDER BY clause

Your problem has a couple of roots:

Most importantly, don't use the same name for an output column as that of an input column it is distinct from. That's a loaded foot-gun.

Secondly, make it a habit to table-qualify all columns used in a complex query involving multiple tables. Even if that seems to work, it might already be doing something other than you think. And even if it works correctly, it may break later if any column names change (are added, removed, or renamed). With some bad luck it breaks silently, and your query happily ever after produces nonsense.

Thirdly, you are up against the SQL standard, which has somewhat confusing visibility rules. See:

  • GROUP BY + CASE statement

In your working alternative query, "value" resolves to the output column "value", which hides any input column of the same name in ORDER BY. That works as expected (that is, if you actually meant to target the output column).

In your failing query, "value" resolves to the input column "measurementResults.value". You cannot throw output columns into a new computation in ORDER BY; you can only use them "as is". So, with output columns out of the way, "value" resolves to the input column (now not hidden any more), and that leads to the reported error. Obviously, you cannot order by an input column after aggregating, except if you grouped by it, directly or indirectly.

You could repair your query with:

ORDER  BY (ranking = 'greater') IS TRUE, "value" DESC

This sorts all rows where ranking = 'greater' is not true to the top, like your CASE expression would, treating null and false alike.

Subtle difference: Those leading rows are sorted by value, while your original would list them in arbitrary order. May or may not be welcome.

  • Sorting null values after all others, except special
  • Best way to check for "empty or null value"

I assume you are aware that null values sort on top in descending order? And that you can change that? See:

  • Sort by column ASC, but NULL values first?

If that's not good enough (or for more complex expressions), you must be more verbose and explicit: one way is to wrap the whole query into a subquery, and order (and limit!) in the outer SELECT:

SELECT avg_value, min_timestamp, min_ranking
FROM  (
  SELECT ir.ranking                           -- !
       , avg(mr."value")    AS avg_value      -- !
       , min(m."timestamp") AS min_timestamp  -- !
       , min(ir.ranking)    AS min_ranking    -- !
  FROM   measurement m
  JOIN   "measurementResults"      mr ON mr.measurement = m.id
  JOIN   conditions                c  ON c.measurement = m.id
  JOIN   "testProtocolItemResults" ir ON ir.id = mr."testProtocolItemResults"
  JOIN   "testProtocolSessionItem" si ON si.id = m."testProtocolSessionItem"
  WHERE  m."athlete" = 334
  AND    mr."testProtocolItemResults" = 1
  AND    c."conditions" = '6'
  GROUP  BY si."testProtocolSession", ir.ranking
  ) sub
ORDER  BY CASE WHEN ranking = 'greater' THEN avg_value END DESC
LIMIT  3

Especially for queries with a small LIMIT, this may be more expensive, because Postgres may no longer be able to optimize the query plan as well.

Aside:

Use legal, lower-case identifiers, so you don't have to double-quote.

And use table aliases to de-noise your big queries.

Rails / Postgres: “must appear in the GROUP BY clause or be used in an aggregate function”

Your mistake was using filled_at in an ORDER BY, probably coming from a default scope.

You can fix it using unscoped to eliminate default scopes:

Income.unscoped
      .group('date(filled_at)')
      .having("date(filled_at) > ?", Date.today - n)
      .sum(:lines_price)

or

Income.unscoped
      .group('date(filled_at)')
      .having("date(filled_at) > ?", Date.today - n)
      .order('date(filled_at) ASC')
      .sum(:lines_price)

but I think it would be better to use where instead of having:

Income.unscoped
      .where("date(filled_at) > TIMESTAMP ?", Date.today - n)
      .group('date(filled_at)')
      .order('date(filled_at) ASC')
      .sum(:lines_price)

SQLFiddle
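
One thing worth knowing, and the reason .order has to come before .sum above: a grouped .sum fires the query and returns a plain Ruby Hash, not a relation. Roughly (made-up values; depending on the adapter the keys may be Date objects or strings):

Income.unscoped
      .where("date(filled_at) > TIMESTAMP ?", Date.today - n)
      .group('date(filled_at)')
      .sum(:lines_price)
# => { "2012-12-05" => 42.0, "2012-12-06" => 13.5 }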

You have to be careful when using TIMESTAMP, because 2012-12-04 will become 2012-12-04 00:00:00, so if you don't want that day in the result, use Date.today - (n - 1).

If you create an index on the filled_at column:

 create index incomes_filled_at on incomes(filled_at);

migration:

 add_index :incomes, :filled_at

and you have a lot of data in this table, the index will be used for the filtering, so the query should be much faster.

So just write both and test which is faster (you have to create the index on filled_at if you don't have one).

PG::GroupingError: ERROR: column events.id must appear in the GROUP BY clause or be used in an aggregate function

Event
  .order(:popularity)
  .joins(:keywords)
  .group('events.id') # <======
  .where(keywords: { category: 'taxonomy' })
  .group('keywords.name')
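
For context: grouping by events.id works because it is the table's primary key. Since PostgreSQL 9.1, all other columns of events are treated as functionally dependent on the primary key, so the implicitly selected events.* columns don't need their own GROUP BY entries; keywords.name still needs the second group call because it comes from the joined table.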

