Rails - Distinct on After a Join

Use a subquery and add a different ORDER BY clause in the outer SELECT:

SELECT *
FROM  (
   SELECT DISTINCT ON (pr.id)
          pu.updated_at, pr.*
   FROM   Product pr
   JOIN   Purchases pu ON pu.product_id = pr.id  -- guessing
   ORDER  BY pr.id, pu.updated_at DESC NULLS LAST
   ) sub
ORDER  BY updated_at DESC NULLS LAST;
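
To make the two-step logic concrete, here is a minimal in-memory Ruby sketch of what the query computes. It does not talk to a database: the Purchase struct, the integer timestamps, and the helper names desc_nulls_last and latest_purchase_per_product are all made up for illustration. Step 1 mirrors the inner DISTINCT ON (pr.id) with its ORDER BY, step 2 mirrors the outer ORDER BY.

```ruby
# Hypothetical stand-in for one row of the joined Product/Purchases result.
Purchase = Struct.new(:product_id, :updated_at)

# Sort key emulating "DESC NULLS LAST": non-nil values first (largest first),
# nil values at the very end.
def desc_nulls_last(value)
  value.nil? ? [1, 0] : [0, -value]
end

def latest_purchase_per_product(purchases)
  # Step 1 (inner query): keep one row per product, the one with the newest updated_at.
  per_product = purchases
    .group_by(&:product_id)
    .map { |_id, rows| rows.min_by { |r| desc_nulls_last(r.updated_at) } }

  # Step 2 (outer query): reorder the de-duplicated rows by updated_at.
  per_product.sort_by { |r| desc_nulls_last(r.updated_at) }
end

rows = [
  Purchase.new(1, 10),
  Purchase.new(1, 30),
  Purchase.new(2, 20),
  Purchase.new(3, nil),
]

p latest_purchase_per_product(rows).map(&:to_a)
# => [[1, 30], [2, 20], [3, nil]]
```

The point of the sketch is only that de-duplication and final ordering are separate steps, which is why the SQL needs the subquery.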

Details for DISTINCT ON:

Select first row in each GROUP BY group?

Or some other query technique:

Optimize GROUP BY query to retrieve latest record per user

But if all you need from Purchases is updated_at, you can get this cheaper with a simple aggregate in a subquery before you join:

SELECT *
FROM   Product pr
JOIN  (
   SELECT product_id, max(updated_at) AS updated_at
   FROM   Purchases 
   GROUP  BY 1
   ) pu ON pu.product_id = pr.id  -- guessing
ORDER  BY pu.updated_at DESC NULLS LAST;

About NULLS LAST:

PostgreSQL sort by datetime asc, null first?

Or even simpler, but not as fast while retrieving all rows:

SELECT pr.*, max(pu.updated_at) AS updated_at
FROM   Product pr
JOIN   Purchases pu ON pu.product_id = pr.id
GROUP  BY pr.id  -- must be primary key
ORDER  BY max(pu.updated_at) DESC NULLS LAST;

Product.id needs to be defined as primary key for this to work. Details:

PostgreSQL - GROUP BY clause
Return a grouped list with occurrences using Rails and PostgreSQL

If you fetch only a small selection (with a WHERE clause restricting to just one or a few pr.id, for instance), this last form will be faster.

Get distinct rows using inner join

Event
  .joins(:picks)
  .where(picks: {result: nil, created_at: 5.days.ago..Time.now, league_id: 1})
  .group("events.id")

Distinct Records with joins and order

Try this:

User.select('users.*,MAX(donations.created_at) as most_recent_donation').
  joins(:donations).order('most_recent_donation desc').group('users.id')

I assume a user has many donations. This selects each user's most recent donation time and returns distinct users by grouping on their id.

I have not tested this though.

Rails: remove duplicates after ordering a join table

Book.joins(:reviews).order('reviews.created_at DESC').distinct

You're trying to select distinct books from the join of books and reviews, and then order this list of distinct books by reviews.created_at. The SQL would look like this:

SELECT DISTINCT "books"."id" FROM "books" INNER JOIN "reviews" ON "reviews"."book_id" = "books"."id" ORDER BY reviews.created_at

There is a good reason why this is not allowed: the results are indeterminate. Imagine you have 100 reviews for one book. In the joined result, you'll have 100 rows for this book, each with a different review. When you select a distinct list, you end up with one row for this book, and it could be any one of the 100. Then you order by the created_at of that review. As the review could be any one of the 100, the order could be different every time.
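
The indeterminacy can be reproduced in plain Ruby. This is only a sketch with made-up data: JOINED stands in for the rows of the join, integers stand in for timestamps, and distinct_then_order is a hypothetical helper that de-duplicates by book first and orders afterwards. Feeding it the same rows in a different order changes the result.

```ruby
# Hypothetical joined rows: book 1 has two reviews, book 2 has one.
JOINED = [
  { book_id: 1, review_created_at: 5 },
  { book_id: 2, review_created_at: 3 },
  { book_id: 1, review_created_at: 1 },
].freeze

# De-duplicate by book, keeping whichever row happens to come first, then
# order by the review time of the surviving row (newest first).
def distinct_then_order(rows)
  rows.uniq { |r| r[:book_id] }
      .sort_by { |r| -r[:review_created_at] }
      .map { |r| r[:book_id] }
end

p distinct_then_order(JOINED)          # => [1, 2]
p distinct_then_order(JOINED.reverse)  # => [2, 1]
```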

This would be perfectly fine:

Book.joins(:reviews).order('books.id DESC').distinct

Because it doesn't matter which of the 100 rows it picks for that book, the books.id is the same.

Back to your problem. Seems you're trying to get the 5 books with the most recent reviews. I don't see a simple way to do it but here's my solution:

res = Review.group("book_id").maximum("created_at") # {book_id => created_at}: each book with its most recent review time
arr = res.to_a.sort { |a, b| b[1] <=> a[1] }        # array sorted by created_at in descending order
arr.map { |n| n[0] }.take(5)                        # ids of the 5 books with the most recent reviews
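
The sort-and-take step can also be written with Enumerable#max_by, which accepts a count. The sketch below uses a plain Hash as a stand-in for what Review.group("book_id").maximum("created_at") returns, so the sample data and the method name most_recently_reviewed are hypothetical.

```ruby
# latest_by_book: { book_id => time of its most recent review }
# Returns up to `limit` book ids, most recently reviewed first.
def most_recently_reviewed(latest_by_book, limit = 5)
  latest_by_book
    .max_by(limit) { |_book_id, created_at| created_at }
    .map(&:first)
end

sample = {
  10 => Time.utc(2024, 1, 5),
  11 => Time.utc(2024, 3, 1),
  12 => Time.utc(2024, 2, 10),
}

p most_recently_reviewed(sample, 2)  # => [11, 12]
```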

Distinct Join Rails

Try using :include instead of :joins and you should not see any more duplicate video results.

Also, you should be using a lambda in your named_scope, otherwise Time.now will be cached the first time you use it and you'll start getting incorrect results.

named_scope :scheduled_in_future, lambda {
  { 
    :include => :events,
    :conditions => ["event.scheduled_start > ? AND event.status = ?", Time.now.to_i, 'PENDING']
  }
}
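
Why the lambda matters can be shown without Rails at all. In this sketch, EAGER_CONDITIONS is built once when the code is loaded (like a named_scope defined without a lambda), while LAZY_CONDITIONS is rebuilt on every call. The FakeClock class is a hypothetical substitute for Time.now so the output is deterministic.

```ruby
# Hypothetical clock that advances by one tick every time it is read.
class FakeClock
  def initialize
    @tick = 0
  end

  def now
    @tick += 1
  end
end

CLOCK = FakeClock.new

# Evaluated once, at definition time: the timestamp is baked into the hash.
EAGER_CONDITIONS = { scheduled_after: CLOCK.now }

# Evaluated on every call: each use reads the clock again.
LAZY_CONDITIONS = lambda { { scheduled_after: CLOCK.now } }

p EAGER_CONDITIONS[:scheduled_after]      # => 1
p EAGER_CONDITIONS[:scheduled_after]      # => 1 (stale)
p LAZY_CONDITIONS.call[:scheduled_after]  # => 2
p LAZY_CONDITIONS.call[:scheduled_after]  # => 3
```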

ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list using order by with uniq

You need to select the columns first so that you can order by them later in .order.
The result will still contain duplicate records despite .uniq or .distinct, because the generated query (SELECT DISTINCT products.*, prices.finish_date, ...) returns every distinct combination of products.*, prices.finish_date and the special column, whereas here you only want products.id to be unique.
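
The difference between de-duplicating whole rows and de-duplicating on one column can be sketched in plain Ruby. The ROWS data and both helper names are hypothetical; Array#uniq without a block plays the role of SELECT DISTINCT over all selected columns, and uniq with a block plays the role of de-duplicating on products.id only.

```ruby
# Hypothetical rows from products JOIN prices: product 1 has two prices.
ROWS = [
  { id: 1, name: "lamp", finish_date: "2024-01-01" },
  { id: 1, name: "lamp", finish_date: "2024-06-01" },
  { id: 2, name: "desk", finish_date: nil },
].freeze

# Whole rows are compared, so the two "lamp" rows count as different.
def distinct_rows(rows)
  rows.uniq
end

# Only the id is compared; the first row seen for each id is kept.
def distinct_on_id(rows)
  rows.uniq { |r| r[:id] }
end

p distinct_rows(ROWS).map { |r| r[:id] }   # => [1, 1, 2]
p distinct_on_id(ROWS).map { |r| r[:id] }  # => [1, 2]
```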

DISTINCT ON is the solution, but using it is a little tricky in PostgreSQL because of the rule that SELECT DISTINCT ON expressions must match the initial ORDER BY expressions.

Please try:

sub_query = Product.joins(:prices)
  .select("DISTINCT ON (products.id) products.*, CASE WHEN (products.quantity >= products.min_quantity) AND (prices.finish_date IS NULL OR prices.finish_date >= now()) THEN 0 ELSE 1 END AS t, prices.finish_date AS date")

query = Product.from("(#{sub_query.to_sql}) as tmp").select("tmp.*").order("tmp.t, tmp.date ASC")

Selecting unique rows when doing a join

includes performs a LEFT OUTER JOIN, which indeed creates duplicates. If you don't need access to each @car.seller after your query (the n+1 query issue), just use joins instead:

Car.joins(:sellers).
    where(:cars => {:brand => 'Fiat', 
                    :model_name => 'Panda'}, 
          :sellers => {:kind => 'Dealer'})

joins performs an INNER JOIN. If a car can match more than one seller you will still get one row per match, so append .distinct in that case.

Ruby on Rails 4 count distinct with inner join

After a long chat, we found that the query below works:

self.member
    .engines(:reload)
    .count("DISTINCT engine_code")
