Using ActiveRecord find_in_batches method for deleting large data
I don't think anyone has answered your question.
To answer 'what you are doing wrong' and can you use 'find_in_batches' in that way:
The reason why 'delete_all' does not work, is because 'delete_all' only works on activerecord relations. When you use 'find_in_batches' the variable 'batch' is now just a normal array, which may have it's own 'delete_all' method that is different,
You may need 'find_in_batches' incase if you have thousands of records to be deleted. So the previous answer is incorrect. (It may lead to memory exceeded exceptions and timeouts)
Note that is not related to the original error you displayed, but you cannot use 'batch' with 'delete_all' because 'batch' is an array and 'delete_all' is for activerecords
How to delete using find_in_batches
I was having a similar problem
user.posts.destroy_all
was overloading the server because of thousands of posts (this is an example my actual model was not 'posts')
You can use
user.posts.select(:id).find_in_batches(batch_size: 100) do |ids|
Post.where(id: ids).delete_all
end
If it was one sql call, it will try to store all the delete items in memory at once that can break the server,
This will have a manageable size of sql calls.
Bulk deleting entries from a model efficiently
destroy_all
is the same as:
books.each(&:destroy)
As you can see in source code
So you can just:
Books.select(:id).where(isbn: nil).find_in_batches(batch_size: 1000) do |books|
# since destroy_all is only a ActiveRecord::Relation method
books.each(&:destroy)
end
This is the minimal query setup to do this the most efficiently way and yet getting your callbacks fired.
Note: if your callbacks need any other attribute than id
loaded, you should add it to your select
query.
ActiveRecord: Alternative to find_in_batches?
Hm I've been thinking about a solution for this (I'm the person who asked the question). It makes sense that find_in_batches
doesn't allow you to have a custom order because lets say you sort by created_at DESC
and specify a batch_size of 500. The first loop goes from 1-500, the second loop goes from 501-1000, etc. What if before the 2nd loop occurs, someone inserts a new record into the table? That would be put onto the top of the query results and your results would be shifted 1 to the left and your 2nd loop would have a repeat.
You could argue though that created_at ASC
would be safe then, but it's not guaranteed if your app specifies a created_at value.
UPDATE:
I wrote a gem for this problem: https://github.com/EdmundMai/batched_query
Since using it, the average memory of my application has HALVED. I highly suggest anyone having similar issues to check it out! And contribute if you want!
Deleting millions of rows in MySQL
DELETE FROM `table`
WHERE (whatever criteria)
ORDER BY `id`
LIMIT 1000
Wash, rinse, repeat until zero rows affected. Maybe in a script that sleeps for a second or three between iterations.
batch destroy all in rails active record
If "variant" records don’t have any dependencies that would have to be deleted from the database use dependent: :delete_all
instead of dependent: :destroy
products_count = Product.count
# Determine how many batches need to be run
number_of_iterations = (products_count.to_f / 1000).ceil
(1..number_of_iterations).each do |i|
Product.limit(1000).delete_all
end
When dealing with MASSIVE amounts of data it’s good to batch the deletion. If you delete more than 5,000 rows in a single transaction, your database will lock. This means the entire table is inaccessible by any other running process for the duration of the transaction. This can mean some serious issues for the users of your site while a DELETE is happening.
Related Topics
Ruby -V Dyld: Library Not Loaded: /Usr/Local/Lib/Libgmp.10.Dylib
How to Get a List of Gems That Are Installed That Have Native Extensions
Rails -V Cannot Load Such File -- Rails/Cli (Loaderror)
Could Not Find Rake with Bundle Exec
Clicking Link with JavaScript in Mechanize
Xlsx Compressed by Rubyzip Not Readable by Excel
In Ruby How to Automatically Populate Instance Variables Somehow in the Initialize Method
Rails Generate Error: No Such File or Directory - Getcwd
Heroku: How to Push to Specific App If You Have Multiple Apps in Heroku
How to Scrape Pages Which Have Lazy Loading
Remove Double Quotes from String
Stack Level Too Deep When Using Carrierwave Versions
Database Cleaner Not Working in Minitest Rails
How to Share State Between Scenarios Using Cucumber
Parsing Date from Text Using Ruby
Is There an Equivalent of Array#Find_Index for the Last Index in Ruby
Your Ruby Version Is 2.2.4, But Your Gemfile Specified 2.3.0