ActiveRecord find_each combined with limit and order
The documentation says that find_each and find_in_batches don't retain sort order and limit because:
- Sorting ASC on the PK is used to make the batch ordering work.
- Limit is used to control the batch sizes.
You could write your own version of this function like @rorra did, but you can get into trouble when mutating the objects. If, for example, you sort by a column that changes on save (such as updated_at) and then save the object, it might come up again in one of the next batches. Similarly, you might skip objects because the order of results has changed by the time the query for the next batch runs. Only use that solution with read-only objects.
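The hazard described above can be reproduced without a database. The following is a minimal sketch in plain Ruby, not ActiveRecord code: the `Row` struct and the `offset_batches_with_touch` helper are hypothetical names, and sorting an in-memory array stands in for re-running an ordered query with LIMIT and OFFSET for each batch.

```ruby
# Hypothetical stand-in for a table row with a mutable sort column.
Row = Struct.new(:id, :updated_at)

# Pages through rows ordered by updated_at using an offset, and "saves"
# each visited row by bumping its updated_at, as a real save would.
def offset_batches_with_touch(rows, batch_size)
  clock = rows.map(&:updated_at).max
  visited = []
  offset = 0
  while offset < rows.size
    # Re-sort on every pass, like re-executing the ordered query.
    batch = rows.sort_by(&:updated_at).drop(offset).first(batch_size)
    batch.each do |row|
      visited << row.id
      clock += 1
      row.updated_at = clock # mutation moves the row to the end of the order
    end
    offset += batch_size
  end
  visited
end

rows = (1..6).map { |i| Row.new(i, i) }
visited = offset_batches_with_touch(rows, 2)
puts visited.inspect
```

With six rows and a batch size of two, rows 5 and 6 are visited twice and rows 3 and 4 are never visited, which is exactly the repeat-and-skip behaviour that the primary-key ordering of find_each avoids.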
My primary concern was that I didn't want to load 30,000+ objects into memory at once, not the execution time of the query itself. I therefore used a solution that executes the original query but only caches the IDs. It then divides the array of IDs into chunks and loads the objects one chunk at a time. This way you can safely mutate the objects, because the sort order is kept in memory.
Here is a minimal example similar to what I did:
batch_size = 512
# Replace the order clause with your own scope
ids = Thing.order('created_at DESC').pluck(:id)
ids.each_slice(batch_size) do |chunk|
  # FIELD() returns the rows in the same order as the IDs in chunk
  Thing.where(id: chunk).order(Arel.sql("FIELD(id, #{chunk.join(',')})")).each do |thing|
    # Do things with thing
  end
end
The trade-offs to this solution are:
- The complete query is executed to get the IDs
- An array of all the IDs is kept in memory
- It uses the MySQL-specific FIELD() function
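If the dependency on FIELD() is a problem, one possible variation is to restore the order in Ruby after loading each chunk. This is a sketch under assumptions rather than part of the original answer: `each_in_id_order`, `Record`, `TABLE` and `fake_loader` are hypothetical names, and the loader lambda stands in for something like `->(chunk) { Thing.where(id: chunk) }` so that the example runs without a database.

```ruby
# Hypothetical stand-in for a model instance.
Record = Struct.new(:id, :name)

# Yields records in the order given by ids, loading one chunk at a time.
# loader is any callable that takes an array of IDs and returns the
# matching records in arbitrary order.
def each_in_id_order(ids, batch_size:, loader:)
  ids.each_slice(batch_size) do |chunk|
    # Index the loaded records by ID, then walk the chunk in its own order.
    by_id = loader.call(chunk).to_h { |record| [record.id, record] }
    chunk.each do |id|
      record = by_id[id]
      yield record if record # skip IDs deleted since the IDs were plucked
    end
  end
end

# In-memory "table"; the loader returns rows in table order, not chunk order.
TABLE = [1, 2, 5, 7, 9].map { |id| Record.new(id, "thing-#{id}") }
fake_loader = ->(chunk) { TABLE.select { |r| chunk.include?(r.id) } }

sorted_ids = [9, 2, 7, 1, 5] # e.g. the result of pluck(:id) on an ordered scope
names = []
each_in_id_order(sorted_ids, batch_size: 2, loader: fake_loader) { |r| names << r.name }
puts names.inspect
```

The first two trade-offs still apply: the full query runs once to collect the IDs, and the whole ID array stays in memory.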
Hope this helps!
Using limit and offset in rails together with updated_at and find_each - will that cause a problem?
As many have noted in the comments, it seems like using find_each will ignore the order and limit. I found this answer (ActiveRecord find_each combined with limit and order) that seems to be working for me. It's not working 100% but it is a definite improvement. The rest seems to be a memory issue, i.e. I cannot have too many processes running at the same time on Heroku.
Is there any way to order by specific column when using find_each?
I suppose you could add a pagination gem (you may already have one in your Gemfile: will_paginate or kaminari).
That would let you do...
# paginate(page:, per_page:) is the will_paginate API
total_batches = (Book.count / 50.0).ceil
(1..total_batches).each do |batch|
  Book.order(:name).paginate(page: batch, per_page: 50).each do |book|
    # do stuff
  end
end
Ordered batches clear solution
Straight from the docs:
NOTE: It’s not possible to set the order. That is automatically set to
ascending on the primary key (“id ASC”) to make the batch ordering
work. This also means that this method only works when the primary key
is orderable (e.g. an integer or string).
It is deliberately limited to primary-key order because those values don't change, so if you mutate the data while traversing it you don't get repeated records back.
In the case of id: :desc, you will not get new records that were inserted after the transaction fetching the initial batch was started.
Refs
https://rails.lighthouseapp.com/projects/8994/tickets/2502-patch-arbase-reverse-find_in_batches
https://ww.telent.net/2012/5/4/changing_sort_order_with_activerecord_find_in_batches
ActiveRecord find_each combined with limit and order
ActiveRecord limit method does not seem to respect order of relation
As we found out in the comments, the problem is that when order meets two objects with identical engagement values, it "sorts" them in an arbitrary way that is not guaranteed to be the same on every query.
What could help is passing an additional parameter to the ORDER clause (for example id):
Company.last.contacts.order(engagement: :desc, id: :asc)
Using scopes and order with limit and offset
mu is too short's comment explained the behaviour. The cmp_id had duplicate values, and evidently the database is not required to sort equal values the same way each time. One way to fix it is to add a secondary key to break ties in a consistent fashion.
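The effect of a tie-breaking secondary key can be illustrated in plain Ruby. This is only a sketch: the `Contact` struct and the `ordered` helper are hypothetical, and sorting an in-memory array stands in for the database's ORDER BY. In ActiveRecord the equivalent is an order clause such as order(engagement: :desc, id: :asc).

```ruby
# Hypothetical stand-in for a row with a non-unique sort column.
Contact = Struct.new(:id, :engagement)

# Sort by engagement descending, then by id ascending to break ties,
# so the result does not depend on the input order.
def ordered(contacts)
  contacts.sort_by { |c| [-c.engagement, c.id] }
end

contacts = [
  Contact.new(3, 10),
  Contact.new(1, 10),
  Contact.new(2, 25),
  Contact.new(4, 10)
]

# Each "page" re-sorts a differently arranged copy, the way separate
# queries may see rows in a different physical order.
page_one = ordered(contacts).first(2).map(&:id)
page_two = ordered(contacts.shuffle).drop(2).first(2).map(&:id)
puts (page_one + page_two).inspect
```

Because id is unique, every row has a distinct sort key, so consecutive pages cannot overlap or leave gaps regardless of how the rows were arranged before sorting.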