Memory Leak with Large Core Data Batch Insert in Swift

There are a few things you should change:

  • Create a separate NSPrivateQueueConcurrencyType managed object context and do your inserts asynchronously in it.
  • Don't save after inserting every single entity object. Insert your objects in batches and then save each batch. A batch size might be something like 1000 objects.
  • Use autoreleasepool and reset to empty the objects in memory after each batch insert and save.

Here is how this might work:

let managedObjectContext = NSManagedObjectContext(concurrencyType: NSManagedObjectContextConcurrencyType.PrivateQueueConcurrencyType)
managedObjectContext.persistentStoreCoordinator = (UIApplication.sharedApplication().delegate as! AppDelegate).persistentStoreCoordinator // or wherever your coordinator is

managedObjectContext.performBlock { // runs asynchronously on the context's private queue

    while true { // loop through each batch of inserts

        // note: a break inside the autoreleasepool closure can't exit the while
        // loop, so fetch the batch out here and stop when the input runs out
        guard let array = getNextBatchOfObjects() else { break }

        autoreleasepool {
            for item in array {
                let newObject = NSEntityDescription.insertNewObjectForEntityForName("MyEntity", inManagedObjectContext: managedObjectContext) as! MyManagedObject
                newObject.attribute1 = item.whatever
                newObject.attribute2 = item.whoever
                newObject.attribute3 = item.whenever
            }
        }

        // only save once per batch insert
        do {
            try managedObjectContext.save()
        } catch {
            print(error)
        }

        // empty the context so already-saved objects don't accumulate in memory
        managedObjectContext.reset()
    }
}

Applying these principles kept my memory usage low and also made the mass insert faster.


Further reading

  • Efficiently Importing Data (old Apple docs link is broken. If you can find it, please help me add it.)
  • Core Data Performance
  • Core Data (General Assembly post)

Update

The above answer is completely rewritten. Thanks to @Mundi and @MartinR in the comments for pointing out a mistake in my original answer. And thanks to @JodyHagins in this answer for helping me understand and solve the problem.

iOS CoreData batch insert?

Check out the Efficiently Importing Data chapter from the Core Data Programming Guide.

I'm currently having the same problems as you, only I'm inserting 10000 objects and it takes around 30s, which is still slow for me. I'm doing a [managedObjectContext save] on every 1000 managed objects inserted into the context (in other words, my batch size is 1000). I've experimented with 30 different batch sizes (from 1 to 10000), and 1000 seems to be the optimum value in my case.
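
If you want to find the sweet spot for your own data, a rough timing harness along these lines can help (a sketch, not a rigorous benchmark: "MyEntity", the counts, and the moc are placeholders, and each pass adds more rows to the store):

do {
    for batchSize in [100, 500, 1000, 5000] {
        let start = NSDate()
        for i in 1...10000 {
            NSEntityDescription.insertNewObjectForEntityForName("MyEntity",
                inManagedObjectContext: moc)
            if i % batchSize == 0 {
                try moc.save() // save once per batch
            }
        }
        try moc.save() // pick up any remainder
        moc.reset()    // empty the context between runs
        print("batch size \(batchSize): \(NSDate().timeIntervalSinceDate(start))s")
    }
} catch {
    print(error)
}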

Core data executeFetchRequest consumes huge amounts of memory

without saving the context

This is the part of the question where experienced Core Data developers say "oh holy crap". That's your biggest problem right there. Save changes at regular intervals: every 50 entries, or every 100, but whatever you do, don't wait until you're finished. You're forcing Core Data to keep all of those objects in memory as unsaved changes. This is the biggest reason you're having memory problems.

Some other things you should consider:

  • Don't fetch your objects one at a time. Fetches are relatively expensive. If you run through 100k instances and fetch each of them one at a time, your code will be spending almost all of its time executing fetches. Fetch in batches of 50-100 (you can tune the number to get the best balance of speed vs. memory use). Process one batch, then save changes at the end of the batch.

  • When you're done with a fetched object, tell the managed object context that you're done. Do this by calling refreshObject:mergeChanges: with NO as the second argument. That tells the context it can free up any internal memory it's using for the object. This loses any unsaved changes on the object, but if you haven't made any changes then there's nothing to lose. (There's a sketch of this fetch-process-refresh loop after the list.)

  • Consider getting rid of the PlayerClub entity completely. Core Data supports many-to-many relationships. This kind of entity is almost never useful. You're using Core Data, not SQL, so don't design your entity types as if you were.
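
A minimal sketch of the loop from the first two points might look like this (the "Player" entity, the batch size of 50, and the error handling are illustrative assumptions, not the original code):

// Sketch: process a large table in batches, refreshing objects as we go.
let request = NSFetchRequest(entityName: "Player")
request.fetchBatchSize = 50 // Core Data faults in rows 50 at a time

do {
    let players = try managedObjectContext.executeFetchRequest(request) as! [NSManagedObject]
    for (index, player) in players.enumerate() {
        // ... process this object ...

        // done with it: let the context free its internal storage
        managedObjectContext.refreshObject(player, mergeChanges: false)

        // save at the end of each batch rather than once at the very end
        if (index + 1) % 50 == 0 {
            try managedObjectContext.save()
        }
    }
    try managedObjectContext.save() // pick up the final partial batch
} catch {
    print(error)
}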

Core Data: How to insert huge arrays to core data in swift?

The code is currently saving the context for each and every new BarCodeDimension. You should batch up the updates. To do so, remove this code from the insertBarcodeDimension function:

var error: NSError?
if !moc.save(&error) {
    println(error?.localizedDescription)
}

and put it instead in the for loop of getProducts, after the insertBarcodeDimension... call, in an if statement to ensure it happens only every X times through the loop.

if i % 100 == 0 {
    var error: NSError?
    if !moc.save(&error) {
        println(error?.localizedDescription)
    }
}

Here I've used a value for X of 100, but adjust this figure through trial and error to trade off speed against memory footprint. That will probably speed things up significantly.

The other suggestion in the document referred to in comments is to set the undoManager to nil. If you are confident that your moc does not need an undo manager, then amend the code which initialises your moc to include:

moc.undoManager = nil

Or if it's more expedient, put it in at the start of the getProducts function.
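
For example, if it's easier than changing the context setup, the start of getProducts could look something like this (a sketch; assumes getProducts can see the moc):

func getProducts() {
    moc.undoManager = nil // no undo registration during the bulk import
    // ... existing fetch-and-insert loop, saving every 100 iterations ...
}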

Where should NSManagedObjectContext be created?

Now I don't understand. Are they saying I am using the same managed object context or I should use the same managed object context? If I am using the same one, how is it that I create a new one on each while loop? Or if I should be using just one global context, how do I do it without causing memory leaks?

Let's look at the first part of your code...

while (thereAreStillMoreObjectsToAdd) {
    let managedObjectContext = (UIApplication.sharedApplication().delegate as! AppDelegate).managedObjectContext
    managedObjectContext.undoManager = nil

Now, since it appears you are keeping your MOC in the App Delegate, it's likely that you are using the template-generated Core Data access code. Even if you are not, it is highly unlikely that your managedObjectContext access method is returning a new MOC each time it is called.

Your managedObjectContext variable is merely a reference to the MOC that is living in the App Delegate. Thus, each time through the loop, you are merely making a copy of the reference. The object being referenced is the exact same object each time through the loop.

Thus, I think they are saying that you are not using separate contexts, and I think they are right. Instead, you are using a new reference to the same context each time through the loop.
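
You can confirm that with a quick identity check (an illustrative snippet; === compares object identity):

let first = (UIApplication.sharedApplication().delegate as! AppDelegate).managedObjectContext
let second = (UIApplication.sharedApplication().delegate as! AppDelegate).managedObjectContext
print(first === second) // prints "true": both names refer to the same context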


Now, your next set of questions has to do with performance. Your other post references some good content. Go back and look at it again.

What they are saying is that if you want to do a big import, you should create a separate context, specifically for the import (Objective-C, since I have not yet made time to learn Swift).

NSManagedObjectContext *moc = [[NSManagedObjectContext alloc]
    initWithConcurrencyType:NSPrivateQueueConcurrencyType];

You would then attach that MOC to the Persistent Store Coordinator. Using performBlock you would then, in a separate thread, import your objects.

The batching concept is correct. You should keep that. However, you should wrap each batch in an autorelease pool. I know you can do it in Swift... I'm just not sure if this is the exact syntax, but I think it's close...

autoreleasepool {
    for item in array {
        let newObject = NSEntityDescription.insertNewObjectForEntityForName ...
        newObject.attribute1 = item.whatever
        newObject.attribute2 = item.whoever
        newObject.attribute3 = item.whenever
    }
}

In pseudo-code, it would all look something like this...

moc = createNewMOCWithPrivateQueueConcurrencyAndAttachDirectlyToPSC()
moc.performBlock {
    while(true) {
        autoreleasepool {
            objects = getNextBatchOfObjects()
            if (!objects) { break }
            foreach (obj : objects) {
                insertObjectIntoMoc(obj, moc)
            }
        }
        moc.save()
        moc.reset()
    }
}

If someone wants to turn that pseudo-code into Swift, it's fine by me.
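
One possible Swift 2 rendering (a sketch, keeping the hypothetical getNextBatchOfObjects() and insertObjectIntoMoc() helpers from the pseudo-code; note that break can't escape the autoreleasepool closure, so the end-of-input check moves into the while condition):

moc.performBlock {
    while let objects = getNextBatchOfObjects() { // nil ends the import
        autoreleasepool {
            for obj in objects {
                insertObjectIntoMoc(obj, moc)
            }
        }
        do {
            try moc.save()
        } catch {
            print(error)
        }
        moc.reset()
    }
}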

The autorelease pool ensures that any objects autoreleased as a result of creating your new objects are released at the end of each batch. Once the objects are released, the MOC should hold the only reference to them, and once the save happens, the MOC should be empty.

The trick is to make sure that all objects created as part of the batch (including those representing the imported data and the managed objects themselves) are created inside the autorelease pool.

If you do other stuff, like fetching to check for duplicates, or have complex relationships, then it is possible that the MOC may not be entirely empty.
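
For example, a find-or-create style duplicate check during the import might use a helper like this (a sketch; "MyEntity" and "uniqueID" are hypothetical names), and anything it fetches stays registered with the context:

// Sketch: look for an existing object before inserting a duplicate.
func existingObject(uniqueID: String, moc: NSManagedObjectContext) throws -> NSManagedObject? {
    let request = NSFetchRequest(entityName: "MyEntity")
    request.predicate = NSPredicate(format: "uniqueID == %@", uniqueID)
    request.fetchLimit = 1
    return try moc.executeFetchRequest(request).first as? NSManagedObject
}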

Thus, you may want to add the Swift equivalent of [moc reset] after the save to ensure that the MOC is indeed empty.

Core Data Import - Not releasing memory

Ok, this is quite embarrassing... Zombies were enabled in the scheme: they were turned off under Arguments, but "Enable Zombie Objects" was checked under Diagnostics...

Turning this off keeps memory usage stable.

Thanks to everyone who read through the question and tried to solve it!


