Using ActiveRecord find_in_batches Method for Deleting Large Data

Using ActiveRecord find_in_batches method for deleting large data

I don't think anyone has answered your question.

To answer 'what you are doing wrong' and whether you can use 'find_in_batches' that way:

The reason 'delete_all' does not work is that it is only defined on ActiveRecord relations. Inside 'find_in_batches', the variable 'batch' is just a plain Ruby array, which does not have the relation's 'delete_all' method.

You may still need 'find_in_batches' when you have thousands of records to delete, so the previous answer is incorrect: deleting everything in one pass may lead to memory-exceeded exceptions and timeouts.

Note this is not related to the original error you posted, but you cannot call 'delete_all' on 'batch', because 'batch' is an array and 'delete_all' belongs to ActiveRecord relations.
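
A minimal sketch of the difference, using a hypothetical Post model with a hypothetical spam flag:

# Fails: find_in_batches yields plain Ruby arrays, and delete_all
# is defined on ActiveRecord relations, not on Array
Post.where(spam: true).find_in_batches(batch_size: 500) do |batch|
  batch.delete_all # NoMethodError on a plain Array
end

# Works: turn each batch back into a relation via its ids
Post.where(spam: true).find_in_batches(batch_size: 500) do |batch|
  Post.where(id: batch.map(&:id)).delete_all
end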

How to delete using find_in_batches

I was having a similar problem:

user.posts.destroy_all

was overloading the server because of the thousands of posts involved (this is an example; my actual model was not 'posts').

You can use

user.posts.select(:id).find_in_batches(batch_size: 100) do |batch|
  Post.where(id: batch.map(&:id)).delete_all
end

With a single SQL call, the database has to process every matching row in one statement, which can overwhelm the server. Batching like this keeps each SQL call at a manageable size.
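
As an aside, on Rails 5 and newer you can use in_batches, which yields relations instead of arrays, so delete_all works directly; a sketch assuming the same user and posts setup:

user.posts.in_batches(of: 100) do |relation|
  relation.delete_all
end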

Bulk deleting entries from a model efficiently

destroy_all is the same as:

books.each(&:destroy)

As you can see in the source code.

So you can just:

Book.select(:id).where(isbn: nil).find_in_batches(batch_size: 1000) do |books|
  # destroy_all is only an ActiveRecord::Relation method, so destroy each record
  books.each(&:destroy)
end

This is the minimal query setup to do this as efficiently as possible while still firing your callbacks.

Note: if your callbacks need any attribute other than id loaded, you should add it to your select query.
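
For example, if a before_destroy callback on Book reads a hypothetical title attribute, select it alongside id:

Book.select(:id, :title).where(isbn: nil).find_in_batches(batch_size: 1000) do |books|
  books.each(&:destroy)
end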

ActiveRecord: Alternative to find_in_batches?

Hm, I've been thinking about a solution for this (I'm the person who asked the question). It makes sense that find_in_batches doesn't allow a custom order. Let's say you sort by created_at DESC and specify a batch_size of 500: the first loop covers records 1-500, the second loop covers 501-1000, and so on. What if, before the second loop runs, someone inserts a new record into the table? It would be put at the top of the query results, shifting everything one position, and the second loop would repeat a record it had already seen.

You could argue that created_at ASC would then be safe, but even that is not guaranteed if your app sets created_at values explicitly.
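
For reference, find_in_batches pages by the primary key, roughly like this simplified sketch (hypothetical Post model); a forward-moving cursor on id stays stable even while rows are being inserted, which an arbitrary ORDER BY cannot guarantee:

last_id = 0
loop do
  batch = Post.where("id > ?", last_id).order(:id).limit(500).to_a
  break if batch.empty?
  last_id = batch.last.id
  # process the batch; rows inserted meanwhile cannot shift
  # earlier pages because the cursor only moves forward
end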

UPDATE:

I wrote a gem for this problem: https://github.com/EdmundMai/batched_query

Since using it, the average memory usage of my application has halved. I highly suggest anyone having similar issues check it out! And contribute if you want!

Deleting millions of rows in MySQL

DELETE FROM `table`
WHERE (whatever criteria)
ORDER BY `id`
LIMIT 1000

Wash, rinse, repeat until zero rows affected. Maybe in a script that sleeps for a second or three between iterations.
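
One way to script that loop from Ruby, assuming an ActiveRecord connection and a hypothetical posts table with an archived flag:

loop do
  affected = ActiveRecord::Base.connection.exec_delete(
    "DELETE FROM posts WHERE archived = 1 ORDER BY id LIMIT 1000",
    "batch delete",
    []
  )
  break if affected.zero? # no matching rows left
  sleep 2 # let replication and other queries catch up between batches
end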

batch destroy all in rails active record

If "variant" records don’t have any dependencies that would have to be deleted from the database use dependent: :delete_all instead of dependent: :destroy

products_count = Product.count

# Determine how many batches need to be run
number_of_iterations = (products_count.to_f / 1000).ceil

number_of_iterations.times do
  Product.limit(1000).delete_all
end

When dealing with MASSIVE amounts of data it's good to batch the deletion. Deleting a huge number of rows in a single transaction (on some databases, lock escalation kicks in at around 5,000 row locks) can lock the entire table, making it inaccessible to any other running process for the duration of the transaction. That can mean serious issues for the users of your site while a DELETE is happening.


