Return a Grouped List with Occurrences Using Rails and PostgreSQL

Your problem:

Unfortunately the strictness of Postgres breaks that query because it requires all fields to be specified in the group by clause.

Now, that has changed somewhat with PostgreSQL 9.1 (quoting release notes of 9.1):

Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause (Peter Eisentraut)

What's more, the basic query you describe would not even run into this:

Show a list of the 5 most commonly used tags, together
with the times they have been tagged.

SELECT tag_id, count(*) AS times
FROM taggings
GROUP BY tag_id
ORDER BY times DESC
LIMIT 5;

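This works in any PostgreSQL version: the only non-aggregated column in the SELECT list, tag_id, is also in the GROUP BY clause.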
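
As a quick way to try the query above, here is a minimal sketch in Python. It assumes an in-memory SQLite database as a stand-in for PostgreSQL (chosen only so the example is self-contained), and the rows in taggings are invented for illustration.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here; the query itself is plain SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taggings (id INTEGER PRIMARY KEY, tag_id INTEGER NOT NULL)")

# Hypothetical sample data: tag 1 is used three times, tag 2 twice, tag 3 once.
conn.executemany(
    "INSERT INTO taggings (tag_id) VALUES (?)",
    [(1,), (2,), (1,), (3,), (1,), (2,)],
)

# Same query as above: group by the tag, count rows, keep the five largest counts.
top_tags = conn.execute(
    """
    SELECT tag_id, count(*) AS times
    FROM taggings
    GROUP BY tag_id
    ORDER BY times DESC
    LIMIT 5
    """
).fetchall()

print(top_tags)
```

Each result row is a (tag_id, times) pair, most used tag first.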

Summary Count by Group with ActiveRecord

I gave up on trying to do this the ActiveRecord way. Instead, I built the query as a string and passed it to

ActiveRecord::Base.connection.execute(sql_string)

This had the side effect that my result set came out as an array of hash-like rows instead of a set of model objects. So getting at the values went from a syntax (where user_data is the name assigned to a single record from the result set) like

user_data.total_count

to

user_data['total_count']

But that's a minor issue, and forcing the query through ActiveRecord was not worth the hassle.

Limit in Group - ActiveRecord Postgres

For anyone experiencing a similar issue, I would recommend checking out window functions and this blog post covering different ways to solve a similar question. The three approaches covered in the post include using 1) group_by, 2) SQL subselects, 3) window functions.

My solution, using window functions:

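# Bind the date range as named placeholders instead of interpolating it into the SQL.
@events.where(<<~SQL, start_at: startt, end_at: endt)
  events.id IN (
    SELECT id
    FROM (
      -- number the events within each day, lowest id first
      SELECT id, start,
             row_number() OVER (PARTITION BY date_trunc('day', start) ORDER BY id) AS rank
      FROM events
    ) AS result
    WHERE start >= :start_at
      AND start <= :end_at
      AND rank <= 3
  )
SQL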

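To see what the window-function approach returns, here is a minimal sketch in Python. It assumes SQLite (3.25 or newer, which supports window functions) as a stand-in for PostgreSQL, uses date(start) where PostgreSQL would use date_trunc('day', start), and runs on invented events rows.

```python
import sqlite3

# SQLite stands in for PostgreSQL; window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, start TEXT NOT NULL)")

# Hypothetical data: five events on 2024-01-01 and two on 2024-01-02.
rows = [(i, f"2024-01-01 0{i}:00:00") for i in range(1, 6)]
rows += [(6, "2024-01-02 09:00:00"), (7, "2024-01-02 10:00:00")]
conn.executemany("INSERT INTO events (id, start) VALUES (?, ?)", rows)

query = """
    SELECT id
    FROM (
        -- number the events within each calendar day, lowest id first
        SELECT id, start,
               row_number() OVER (PARTITION BY date(start) ORDER BY id) AS rn
        FROM events
    ) AS ranked
    WHERE start >= :start_at AND start <= :end_at AND rn <= 3
    ORDER BY id
"""
# The date range is passed as bound parameters, not interpolated into the string.
params = {"start_at": "2024-01-01 00:00:00", "end_at": "2024-01-02 23:59:59"}
first_three_per_day = [row[0] for row in conn.execute(query, params)]

print(first_three_per_day)
```

With this data the first day contributes its three lowest ids and the second day contributes both of its events.
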
Find single occurrence of matched and non-matched records from has many through association

There is something odd in your model. The relationship between groups.name and the user_id that shows up in both groups and favourites is unclear. The unique constraint on favourites_groups should make the user_id in favourites unnecessary, so I added a commented-out join condition.

Please try this query to see if it returns what you need:

select g.id as group_id, g.name as group_name
from groups g
left join favourites_groups fg
on fg.group_id = g.id
left join favourites f
on f.id = fg.favorite_id
-- and f.user_id = g.user_id
where g.user_id = 100
and f.product_id = 1002
;

Update
Sorry about that. This should return what you want:

select g.id as group_id, g.name as group_name,
max(f.id) as favorite_id,
max(f.product_id) as product_id
from groups g
left join favourites_groups fg
on fg.group_id = g.id
left join favourites f
on f.id = fg.favorite_id
and f.product_id = 1000
where g.user_id = 100
group by g.id, g.name
order by g.id;
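
The difference between the two queries is where the product filter sits: in the WHERE clause it discards groups without a match, while in the ON clause of the LEFT JOIN it keeps them with NULLs. A minimal sketch in Python, assuming SQLite as a stand-in for PostgreSQL and an invented schema and data set that only mirror the column names used above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE "groups" (id INTEGER PRIMARY KEY, name TEXT, user_id INTEGER);
    CREATE TABLE favourites (id INTEGER PRIMARY KEY, product_id INTEGER, user_id INTEGER);
    CREATE TABLE favourites_groups (favorite_id INTEGER, group_id INTEGER,
                                    UNIQUE (favorite_id, group_id));

    -- Hypothetical data: user 100 owns two groups; only group 1 holds product 1000.
    INSERT INTO "groups" VALUES (1, 'wishlist', 100), (2, 'gifts', 100);
    INSERT INTO favourites VALUES (10, 1000, 100), (11, 2000, 100);
    INSERT INTO favourites_groups VALUES (10, 1), (11, 1), (11, 2);
    """
)

base = """
    SELECT g.id, g.name, max(f.id) AS favorite_id, max(f.product_id) AS product_id
    FROM "groups" g
    LEFT JOIN favourites_groups fg ON fg.group_id = g.id
    LEFT JOIN favourites f ON f.id = fg.favorite_id {on_filter}
    WHERE g.user_id = 100 {where_filter}
    GROUP BY g.id, g.name
    ORDER BY g.id
"""

# Filter inside ON: groups without the product survive, with NULL favourite columns.
in_on = conn.execute(
    base.format(on_filter="AND f.product_id = 1000", where_filter="")
).fetchall()

# Filter inside WHERE: rows whose product_id is NULL are discarded after the join.
in_where = conn.execute(
    base.format(on_filter="", where_filter="AND f.product_id = 1000")
).fetchall()

print(in_on)
print(in_where)
```

Here in_on lists both groups (the second with None values), while in_where lists only the group that actually contains the product.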

rails group order by count

You need to explicitly list in the SELECT clause the column(s) you GROUP BY.

All other parts of the SELECT clause must be aggregates like count(), sum(), etc.

Note that we use count(distinct ..) here because each animal ID might appear multiple times due to the chain of JOINs:

SELECT
interests.id,
COUNT(DISTINCT animals.id) as animals_count
FROM animals
JOIN interests_animals ON animals.id = interests_animals.animal_id
JOIN interests ON interests_animals.interest_id = interests.id
JOIN interests_users ON interests.id = interests_users.interest_id
WHERE interests_users.user_id = XXX
GROUP BY 1
ORDER BY 2 desc;

-- in GROUP BY and ORDER BY, it is usually convenient to use just numbers -- "1" means "the 1st column of SELECT clause", etc.

Also, "INNER" is an optional keyword (simply "JOIN" and "INNER JOIN" are the same thing).

Also, as a side note, you might find it useful to add this to your SELECT clause:

, array_agg(animals.id order by animals.id) as animal_ids

-- this will give you integer array of all animal IDs that relate to a particular interest, ordered.
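
To illustrate why count(distinct ..) matters once joins multiply rows, here is a minimal sketch in Python. It assumes SQLite as a stand-in for PostgreSQL and invented data in which one animal is linked to the same interest twice. The extra joined_rows column is added only for comparison and is not part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE animals (id INTEGER PRIMARY KEY);
    CREATE TABLE interests (id INTEGER PRIMARY KEY);
    CREATE TABLE interests_animals (interest_id INTEGER, animal_id INTEGER);
    CREATE TABLE interests_users (interest_id INTEGER, user_id INTEGER);

    INSERT INTO animals VALUES (1), (2), (3);
    INSERT INTO interests VALUES (10), (20);
    -- Hypothetical data: animal 1 is linked to interest 10 twice.
    INSERT INTO interests_animals VALUES (10, 1), (10, 1), (10, 2), (20, 3);
    INSERT INTO interests_users VALUES (10, 7), (20, 7);
    """
)

counts = conn.execute(
    """
    SELECT interests.id,
           COUNT(DISTINCT animals.id) AS animals_count,  -- each animal once
           COUNT(*) AS joined_rows                       -- every joined row
    FROM animals
    JOIN interests_animals ON animals.id = interests_animals.animal_id
    JOIN interests ON interests_animals.interest_id = interests.id
    JOIN interests_users ON interests.id = interests_users.interest_id
    WHERE interests_users.user_id = ?
    GROUP BY 1
    ORDER BY 2 DESC
    """,
    (7,),  # hypothetical user id standing in for XXX
).fetchall()

print(counts)
```

For interest 10 the plain row count is 3, but the distinct count is 2, which is the number of animals actually related to it.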

PostgreSQL - GROUP BY clause

Postgres 9.1 or later, quoting the release notes of 9.1 ...

Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause (Peter Eisentraut)

The SQL standard allows this behavior, and because of the primary key,
the result is unambiguous.

Related:

  • Return a grouped list with occurrences using Rails and PostgreSQL

The queries in the question and in @Michael's answer have the logic backwards. We want to count how many tags match per article, not how many articles have a certain tag. So we need to GROUP BY w_article.id, not by a_tags.id.

list all articles with that tag, and also how many of given tags they match

To fix this:

SELECT count(t.tag) AS ct, a.*  -- any column from table a allowed ...
FROM a_tags t
JOIN w_articles2tag a2t ON a2t.tag = t.id
JOIN w_article a ON a.id = a2t.article
WHERE t.tag IN ('css', 'php')
GROUP BY a.id -- ... since PK is in GROUP BY
LIMIT 9;

Assuming id is the primary key of w_article.

However, this form will be faster while doing the same:

SELECT a.*, ct
FROM (
SELECT a2t.article AS id, count(*) AS ct
FROM a_tags t
JOIN w_articles2tag a2t ON a2t.tag = t.id
WHERE t.tag IN ('css', 'php') -- same tag filter as in the first query
GROUP BY 1
LIMIT 9 -- LIMIT early - cheaper
) sub
JOIN w_article a USING (id); -- attached alias to article in the sub

Closely related answer from just yesterday:

  • Why does the following join increase the query time significantly?

SQL query to return a grouped result as a single row

The following should work in any RDBMS:

SELECT created_at, count(status) AS total,
sum(case when status = 'error' then 1 end) as errors,
sum(case when status = 'complete' then 1 end) as completed,
sum(case when status = 'on hold' then 1 end) as on_hold
FROM jobs
GROUP BY created_at;

The query uses conditional aggregation to pivot the grouped data into one row per created_at value. It assumes that the status values are known beforehand. If you have additional status values, just add a corresponding sum(case ...) expression for each.
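
A minimal sketch of the same conditional aggregation, run from Python against an in-memory SQLite database (an assumption made so the example is self-contained; the jobs rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, created_at TEXT, status TEXT)")

# Hypothetical data: three jobs on the first day, one on the second.
conn.executemany(
    "INSERT INTO jobs (created_at, status) VALUES (?, ?)",
    [
        ("2024-01-01", "error"),
        ("2024-01-01", "complete"),
        ("2024-01-01", "complete"),
        ("2024-01-02", "on hold"),
    ],
)

# One output row per created_at value, one column per known status.
summary = conn.execute(
    """
    SELECT created_at, count(status) AS total,
           sum(CASE WHEN status = 'error' THEN 1 END) AS errors,
           sum(CASE WHEN status = 'complete' THEN 1 END) AS completed,
           sum(CASE WHEN status = 'on hold' THEN 1 END) AS on_hold
    FROM jobs
    GROUP BY created_at
    ORDER BY created_at
    """
).fetchall()

print(summary)
```

Note that a status with no matching rows in a group comes back as NULL (None in Python) rather than 0, because sum() over only NULLs is NULL. If zeros are preferred, adding ELSE 0 to each CASE or wrapping each sum in COALESCE(..., 0) should do it.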

How to GROUP BY several days in PostgreSQL?

SELECT ts, COUNT(DISTINCT user_id)
FROM (
-- one row per day for the last 21 days, ending today
SELECT current_date + s.ts FROM generate_series(-20, 0, 1) AS s(ts)
) AS series(ts)
LEFT JOIN messages
ON messages.created_at::date BETWEEN ts - 1 AND ts -- JOIN on a range
GROUP BY ts
ORDER BY ts;
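
The idea in the query above is to generate one row per day and join each day to every message in a two-day range ending on that day. A minimal sketch in Python, assuming SQLite as a stand-in for PostgreSQL: a recursive CTE replaces generate_series, fixed dates replace current_date, and the messages rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (user_id INTEGER, created_at TEXT)")

# Hypothetical data: three users posting over four days.
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [(1, "2024-01-01"), (2, "2024-01-02"), (1, "2024-01-02"), (3, "2024-01-04")],
)

users_per_window = conn.execute(
    """
    WITH RECURSIVE series(ts) AS (
        -- day series 2024-01-01 .. 2024-01-04 (stand-in for generate_series)
        SELECT date('2024-01-01')
        UNION ALL
        SELECT date(ts, '+1 day') FROM series WHERE ts < '2024-01-04'
    )
    SELECT ts, COUNT(DISTINCT user_id)
    FROM series
    LEFT JOIN messages
           ON date(messages.created_at) BETWEEN date(ts, '-1 day') AND ts
    GROUP BY ts
    ORDER BY ts
    """
).fetchall()

print(users_per_window)
```

Each row gives a day and the number of distinct users who posted on that day or the day before. Because of the LEFT JOIN, days without any messages in their range would still appear, with a count of 0.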

