iCloud + Core Data - How to Avoid Pre-Filled Data Duplication

iCloud + CoreData - how to avoid pre-filled data duplication?

Strategy 1, with some modifications, turned out to be a working solution (with some flaws, though).

Legend:

  • 1st device - started online without any content in iCloud
  • 2nd device - started later than the first and OFFLINE. It goes online after some items have been added

So here's the updated strategy:

  • All my categories have creation time-stamps

  • The categories cannot be renamed (only added or deleted - this is crucial)

  • All my items have a string categoryName field, which gets its value on item creation and is updated whenever the item is moved to a different category. This redundant information is what makes the approach work.

On insertion of new Categories:

  • On insert from iCloud, I look for pairs of categories with the same name, if any

  • Select the newer duplicate categories (they will most probably have fewer items than the old ones, so there will be less of a dance in iCloud)

  • Move their items if any to older duplicate categories

  • Delete newer duplicate categories
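
The merge steps above can be sketched in plain Swift. This is only an illustration of the logic, not the author's Core Data code: CategoryRecord, ItemRecord, and mergeDuplicateCategories are hypothetical in-memory stand-ins for the real entities.

```swift
import Foundation

// Hypothetical in-memory stand-ins for the Category and Item entities.
struct CategoryRecord {
    let id: Int
    let name: String
    let createdAt: Date
}

struct ItemRecord {
    let title: String
    var categoryID: Int?
    let categoryName: String
}

/// Keeps the oldest category for each name, moves the items of newer
/// duplicates into it, and returns the surviving categories (oldest first).
func mergeDuplicateCategories(_ categories: [CategoryRecord],
                              items: inout [ItemRecord]) -> [CategoryRecord] {
    var survivors: [CategoryRecord] = []
    for (_, group) in Dictionary(grouping: categories, by: { $0.name }) {
        // Every group has at least one element, so sorted[0] is safe.
        let sorted = group.sorted { $0.createdAt < $1.createdAt }
        let oldest = sorted[0]
        let duplicateIDs = Set(sorted.dropFirst().map { $0.id })

        // Re-point items from the newer duplicates to the oldest category.
        for index in items.indices {
            if let id = items[index].categoryID, duplicateIDs.contains(id) {
                items[index].categoryID = oldest.id
            }
        }
        survivors.append(oldest)
    }
    return survivors.sorted { $0.createdAt < $1.createdAt }
}
```

In a real store the same steps would run on managed objects in the context that receives the iCloud import, with the newer duplicates deleted after their items have been moved.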

On insertion of new Items - if the item belongs to deleted category:

  • Core Data tries to merge it and fails because there's no parent category any more (lots of errors in the console). It promises to insert it later.

  • After a short time it does merge and insert the item into the store, but with a NIL category

  • Here we pick our item up, find its parent category from categoryName, and put it into the correct category
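
The re-attachment step can be sketched the same way. SyncedItem and reattachOrphans are hypothetical names used only for this illustration; the real code would work on managed objects.

```swift
import Foundation

// Hypothetical stand-in for an Item after Core Data has merged it.
struct SyncedItem {
    let title: String
    let categoryName: String   // redundant copy of the parent category's name
    var categoryID: Int?       // nil when the parent was a deleted duplicate
}

/// Re-attaches items whose category is nil by looking the parent up by name.
/// `categoryIDsByName` maps each surviving category name to its identifier.
/// Returns the number of items that still have no parent afterwards.
func reattachOrphans(in items: inout [SyncedItem],
                     categoryIDsByName: [String: Int]) -> Int {
    var unresolved = 0
    for index in items.indices where items[index].categoryID == nil {
        if let id = categoryIDsByName[items[index].categoryName] {
            items[index].categoryID = id
        } else {
            unresolved += 1
        }
    }
    return unresolved
}
```

This only works because categories cannot be renamed, so the stored name always identifies the surviving category.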

VOILA! - no duplicates & everybody happy

A couple of notes:

  1. I get a dance of the items belonging to the 2nd device (those that arrive on the 1st device with a nil category) on both devices. After a couple of minutes everything stabilizes
  2. No items are lost, though
  3. The dance happens only on the first iCloud sync of the 2nd (or any subsequent) device
  4. If the 2nd device is online when it starts for the first time, duplicate categories appear in only about 25% of cases (tested on a 3G connection), so the dance should not affect the majority of users

What's the best approach to prefill Core Data store when using NSPersistentCloudKitContainer?

Maybe it's too late to answer, but I have been working on the same issue recently. After weeks of research, I would like to leave here what I've learned, in the hope that it helps someone with the same problem.

Is there an easy way to check that an entity already exists remotely?

Is there any other way to avoid objects being saved twice in CloudKit?

Yes, we can check whether the entity already exists in iCloud, but that's not the best way to decide whether to parse the JSON file and save it to the Core Data persistent store. The app may not be signed in to an Apple ID / iCloud, or may have a network issue, either of which makes a remote existence check unreliable.

The current solution is to deduplicate the data ourselves: add a UUID field to every data object imported from the JSON file, and remove any extra objects that share the same UUID.
Most of the time I also add a lastUpdate field, so we can keep the most recent data object.
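
For this to work, every device has to import the same UUID for the same seed object, which suggests storing the UUID in the JSON file itself rather than generating it at import time. A minimal sketch under that assumption follows; SeedTag, its field names, and the ISO 8601 date format are all hypothetical.

```swift
import Foundation

// Hypothetical seed record; the field names are assumptions.
struct SeedTag: Codable, Equatable {
    let uuid: UUID          // fixed in the JSON, so identical on every device
    let name: String
    let lastUpdate: Date
}

/// Decodes the seed file. Dates are assumed to be ISO 8601 strings.
func decodeSeedTags(from json: Data) throws -> [SeedTag] {
    let decoder = JSONDecoder()
    decoder.dateDecodingStrategy = .iso8601
    return try decoder.decode([SeedTag].self, from: json)
}

/// Keeps only the most recently updated record for each UUID.
func latestPerUUID(_ tags: [SeedTag]) -> [UUID: SeedTag] {
    var winners: [UUID: SeedTag] = [:]
    for tag in tags {
        if let current = winners[tag.uuid], current.lastUpdate >= tag.lastUpdate {
            continue
        }
        winners[tag.uuid] = tag
    }
    return winners
}
```

The same "group by UUID, keep the latest" rule is what the Core Data deduplication has to apply to managed objects.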

How do I get notified when fetching data from remote has finished?

We can add an observer of NSPersistentStoreRemoteChange, and get notifications whenever the remote store changes.

Apple provides a demo project on using Core Data with CloudKit, which explains the deduplication quite well.

Synchronizing a Local Store to the Cloud
https://developer.apple.com/documentation/coredata/synchronizing_a_local_store_to_the_cloud

WWDC2019 session 202: Using CoreData with CloudKit
https://developer.apple.com/videos/play/wwdc2019/202

The whole idea is to listen to changes in the remote store, keep track of the change history, and deduplicate our data whenever new data comes in. (Of course, we need some field to determine whether an object is a duplicate.) The persistent store provides a history tracking feature, so we can fetch the transactions as they are merged into the local store and run our deduplication process on them. Let's say we parse JSON and import Tags when the app launches:

// Use a serial queue so that only one history-processing pass runs at a time
private lazy var historyQueue: OperationQueue = {
    let queue = OperationQueue()
    queue.maxConcurrentOperationCount = 1
    return queue
}()

lazy var persistentContainer: NSPersistentContainer = {
    let container = NSPersistentCloudKitContainer(name: "CoreDataCloudKitDemo")

    ...
    // Set the persistentStoreDescription to track history and generate notifications
    // (NSPersistentHistoryTrackingKey, NSPersistentStoreRemoteChangeNotificationPostOptionKey)
    // Load the persistent stores
    // Set the mergePolicy of the viewContext
    ...

    // Observe Core Data remote change notifications.
    NotificationCenter.default.addObserver(
        self, selector: #selector(type(of: self).storeRemoteChange(_:)),
        name: .NSPersistentStoreRemoteChange, object: container.persistentStoreCoordinator)

    return container
}()

@objc func storeRemoteChange(_ notification: Notification) {
    // Process persistent history to merge changes from other coordinators.
    historyQueue.addOperation {
        self.processPersistentHistory()
    }
}

// Fetch changes since the last update, deduplicate any newly inserted data, and save the updated token
private func processPersistentHistory() {
    // Run in a background context so the view context is not blocked.
    // When the background context is saved, it merges into the view context based on the merge policy.
    let taskContext = persistentContainer.newBackgroundContext()
    taskContext.performAndWait {
        // Fetch history received from outside the app since the last token
        let historyFetchRequest = NSPersistentHistoryTransaction.fetchRequest!
        let request = NSPersistentHistoryChangeRequest.fetchHistory(after: lastHistoryToken)
        request.fetchRequest = historyFetchRequest

        let result = (try? taskContext.execute(request)) as? NSPersistentHistoryResult
        guard let transactions = result?.result as? [NSPersistentHistoryTransaction],
              !transactions.isEmpty
        else { return }

        // Tags from the remote store
        var newTagObjectIDs = [NSManagedObjectID]()
        let tagEntityName = Tag.entity().name

        // Collect the .insert changes in the transactions that we want to deduplicate
        for transaction in transactions where transaction.changes != nil {
            for change in transaction.changes!
            where change.changedObjectID.entity.name == tagEntityName && change.changeType == .insert {
                newTagObjectIDs.append(change.changedObjectID)
            }
        }

        if !newTagObjectIDs.isEmpty {
            deduplicateAndWait(tagObjectIDs: newTagObjectIDs)
        }

        // Update the history token using the last transaction.
        lastHistoryToken = transactions.last!.token
    }
}
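
The setup steps that are only described in comments in persistentContainer above could look roughly like this. It is a sketch, not the author's code: makeContainer is a hypothetical name, and the merge policy shown is one reasonable choice rather than the one the original necessarily used.

```swift
import CoreData

// Sketch of the store setup; the function name and merge policy are assumptions.
func makeContainer() -> NSPersistentCloudKitContainer {
    let container = NSPersistentCloudKitContainer(name: "CoreDataCloudKitDemo")

    guard let description = container.persistentStoreDescriptions.first else {
        fatalError("No persistent store description found")
    }
    // Record every transaction so it can be fetched later as persistent history.
    description.setOption(true as NSNumber,
                          forKey: NSPersistentHistoryTrackingKey)
    // Post .NSPersistentStoreRemoteChange when another coordinator writes to the store.
    description.setOption(true as NSNumber,
                          forKey: NSPersistentStoreRemoteChangeNotificationPostOptionKey)

    container.loadPersistentStores { _, error in
        if let error = error {
            fatalError("Failed to load store: \(error)")
        }
    }

    // Assumed policy: on conflict, in-memory property values win over the store's.
    container.viewContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy
    // Let saves from background contexts flow into the view context.
    container.viewContext.automaticallyMergesChangesFromParent = true
    return container
}
```

Both options have to be set before loadPersistentStores is called, because they are read when the store is added to the coordinator.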

Here we collect the object IDs of the added Tags, so we can deduplicate them in any other managed object context:

private func deduplicateAndWait(tagObjectIDs: [NSManagedObjectID]) {
    // Note: backgroundContext() and save(with:) are not Core Data API; they appear to be
    // helper extensions carried over from Apple's sample project.
    let taskContext = persistentContainer.backgroundContext()

    // Use performAndWait because each step relies on the sequence. Since historyQueue runs in the background, waiting won't block the main queue.
    taskContext.performAndWait {
        tagObjectIDs.forEach { tagObjectID in
            self.deduplicate(tagObjectID: tagObjectID, performingContext: taskContext)
        }
        // Save the background context to trigger a notification and merge the result into the viewContext.
        taskContext.save(with: .deduplicate)
    }
}

private func deduplicate(tagObjectID: NSManagedObjectID, performingContext: NSManagedObjectContext) {
    // Get the tag by its objectID
    guard let tag = performingContext.object(with: tagObjectID) as? Tag,
          let tagUUID = tag.uuid else {
        fatalError("###\(#function): Failed to retrieve a valid tag with ID: \(tagObjectID)")
    }

    // Fetch all tags with the same uuid
    let fetchRequest: NSFetchRequest<Tag> = Tag.fetchRequest()
    // Sort by lastUpdate, keep the latest Tag
    fetchRequest.sortDescriptors = [NSSortDescriptor(key: "lastUpdate", ascending: false)]

    fetchRequest.predicate = NSPredicate(format: "uuid == %@", tagUUID)

    // Return if there are no duplicates.
    guard var duplicatedTags = try? performingContext.fetch(fetchRequest), duplicatedTags.count > 1 else {
        return
    }
    // Pick the first tag as the winner.
    guard let winner = duplicatedTags.first else {
        fatalError("###\(#function): Failed to retrieve the first duplicated tag")
    }
    duplicatedTags.removeFirst()
    remove(duplicatedTags: duplicatedTags, winner: winner, performingContext: performingContext)
}

The most difficult part (in my opinion) is handling the relationships of the duplicated objects that get deleted. Let's say our Tag object has a one-to-many relationship with a Category object (each Tag may have multiple Categories):

private func remove(duplicatedTags: [Tag], winner: Tag, performingContext: NSManagedObjectContext) {
    duplicatedTags.forEach { tag in
        // Delete the tag AFTER we handle the relationship,
        // and be careful: the relationship's delete rule will also apply.
        defer { performingContext.delete(tag) }

        if let categorys = tag.categorys as? Set<Category> {
            for category in categorys {
                // Re-map each category to the winner Tag; otherwise ofTag becomes nil when the duplicated Tag is deleted.
                category.ofTag = winner
            }
        }
    }
}

One interesting thing is, if the Category objects are also added from the remote store, they may not yet exist when we handle the relationship, but that's another story.

iCloud and Core Data pre-filled database

OK, here is the solution I managed to get approved (finally!)

This is the code for setting the Skip Backup attribute - note that it is different for 5.0.1 and below and 5.1 and above.

#include <sys/xattr.h>

- (BOOL)addSkipBackupAttributeToItemAtURL:(NSURL *)URL
{
    if (&NSURLIsExcludedFromBackupKey == nil) { // iOS <= 5.0.1
        const char *filePath = [[URL path] fileSystemRepresentation];

        const char *attrName = "com.apple.MobileBackup";
        u_int8_t attrValue = 1;

        int result = setxattr(filePath, attrName, &attrValue, sizeof(attrValue), 0, 0);
        return result == 0;
    } else { // iOS >= 5.1
        NSError *error = nil;
        // Rely on the method's return value rather than on the error pointer.
        BOOL success = [URL setResourceValue:[NSNumber numberWithBool:YES]
                                      forKey:NSURLIsExcludedFromBackupKey
                                       error:&error];
        return success;
    }
}

And here is my persistentStoreCoordinator

- (NSPersistentStoreCoordinator *)persistentStoreCoordinator {
    if (__persistentStoreCoordinator != nil)
    {
        return __persistentStoreCoordinator;
    }

    NSURL *storeURL = [[self applicationDocumentsDirectory] URLByAppendingPathComponent:@"store.sqlite"];

    NSError *error = nil;

    NSFileManager *fileManager = [NSFileManager defaultManager];
    NSString *storePath = [[[self applicationDocumentsDirectory] path] stringByAppendingPathComponent:@"store.sqlite"];

    // For iOS 5.0 - store in Caches and just put up with purging
    // Users should be on at least 5.0.1 anyway
    if ([[[UIDevice currentDevice] systemVersion] isEqualToString:@"5.0"]) {
        NSArray *paths = NSSearchPathForDirectoriesInDomains(NSCachesDirectory, NSUserDomainMask, YES);
        NSString *cacheDirectory = [paths objectAtIndex:0];
        NSString *oldStorePath = [storePath copy];
        storePath = [cacheDirectory stringByAppendingPathComponent:@"store.sqlite"];
        // A file-system path needs fileURLWithPath:, not URLWithString:
        storeURL = [NSURL fileURLWithPath:storePath];

        // Move any existing file to the new location
        if ([fileManager fileExistsAtPath:oldStorePath]) {
            [fileManager copyItemAtPath:oldStorePath toPath:storePath error:NULL];
            [fileManager removeItemAtPath:oldStorePath error:NULL];
        }
    }
    // END iOS 5.0

    if (![fileManager fileExistsAtPath:storePath]) {
        // File doesn't exist - copy the pre-filled store over from the bundle
        NSString *defaultStorePath = [[NSBundle mainBundle] pathForResource:@"store" ofType:@"sqlite"];
        if (defaultStorePath) {
            [fileManager copyItemAtPath:defaultStorePath toPath:storePath error:NULL];
        }
    }

    NSDictionary *options = [NSDictionary dictionaryWithObjectsAndKeys:
                             [NSNumber numberWithBool:YES], NSMigratePersistentStoresAutomaticallyOption,
                             [NSNumber numberWithBool:YES], NSInferMappingModelAutomaticallyOption, nil];

    __persistentStoreCoordinator = [[NSPersistentStoreCoordinator alloc] initWithManagedObjectModel:[self managedObjectModel]];
    if (![__persistentStoreCoordinator addPersistentStoreWithType:NSSQLiteStoreType configuration:nil URL:storeURL options:options error:&error])
    {
        NSLog(@"Unresolved error %@, %@", error, [error userInfo]);
        abort();
    }

    [self addSkipBackupAttributeToItemAtURL:storeURL];

    return __persistentStoreCoordinator;
}

Note that I made the decision to just store in Caches and put up with purging for iOS 5.0 users.

This was approved by Apple this month.

Please don't copy and paste this code without reading and understanding it first - it may not be totally accurate or optimised, but I hope it can guide someone to a solution that helps them.

iCloud Core Data Load of initial data for 2nd device

You can't just set a boolean value to check for initial data, because the boolean value also needs to sync and you might not get it in time. You also can't check for the initial data, because it also needs to sync and might not arrive in time.

But there are a couple of possibilities.

  • Put the initial data in a separate persistent store file which is not synced. You can have multiple store files, and you can add them all to your persistent store coordinator. Put the initial data in one store file that is not synced and put all the other data in a separate file that is synced. Since the initial data won't sync, there won't be any duplicates. You can't have relationships from one store file to another one, but you can use fetched properties to get something similar.

  • Keep putting all the data in the same file, but filter out duplicates. With everything in one place, duplicates are inevitable, but you can deal with the problem. You end up waiting until the duplicates appear, finding them, and removing them. It's kind of annoying to need to do this, but it's really the only way if you put everything in the same persistent store. I described the process a while ago in a blog post at my site which goes into detail on how to do this efficiently.
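
With the newer NSPersistentCloudKitContainer API discussed earlier on this page, the first option (a separate, unsynced store for the initial data) might be set up roughly as follows. This is a sketch under several assumptions: the model name "Model", the configurations "PrebuiltData" and "UserData" (which would have to exist in the model), the file names, and the CloudKit container identifier are all hypothetical.

```swift
import CoreData

// Sketch only: model name, configuration names, file names, and the
// CloudKit container identifier are hypothetical.
func makeSplitContainer() -> NSPersistentCloudKitContainer {
    let container = NSPersistentCloudKitContainer(name: "Model")
    let directory = NSPersistentContainer.defaultDirectoryURL()

    // Store for the pre-filled data: no CloudKit options, so it is never synced.
    let seedStore = NSPersistentStoreDescription(
        url: directory.appendingPathComponent("Prebuilt.sqlite"))
    seedStore.configuration = "PrebuiltData"
    seedStore.cloudKitContainerOptions = nil

    // Store for user data: mirrored to CloudKit.
    let userStore = NSPersistentStoreDescription(
        url: directory.appendingPathComponent("User.sqlite"))
    userStore.configuration = "UserData"
    userStore.cloudKitContainerOptions = NSPersistentCloudKitContainerOptions(
        containerIdentifier: "iCloud.com.example.app")

    container.persistentStoreDescriptions = [seedStore, userStore]
    container.loadPersistentStores { _, error in
        if let error = error {
            fatalError("Failed to load store: \(error)")
        }
    }
    return container
}
```

Each entity must belong to exactly one of the two configurations, and, as noted above, relationships cannot cross from one store file to the other.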

Core Data, iCloud and Pre-Built Data with iOS 7

In your Category entity, in the PrebuiltData model, create an "id" property (and make sure that it stays consistent from version to version). This model should not be managed by iCloud.

In your Task entity, in the UserData model, create a "categoryId" property. This model should be managed by iCloud (or this discussion is meaningless).

Now you can create a method on your Category entity to fetch all the Tasks in the category using:

-(NSArray*)tasks
{
    NSFetchRequest* request = [NSFetchRequest fetchRequestWithEntityName:@"Task"];
    // Match tasks whose categoryId points at this category's id
    request.predicate = [NSPredicate predicateWithFormat:@"categoryId = %@", self.id];
    return [gTaskManagedObjectContext executeFetchRequest:request error:NULL];
}

Likewise, you can create a method on your Task entity to fetch the category:

-(Category*)category
{
    NSFetchRequest* request = [NSFetchRequest fetchRequestWithEntityName:@"Category"];
    request.predicate = [NSPredicate predicateWithFormat:@"id = %@", self.categoryId];
    NSArray* results = [gTaskManagedObjectContext executeFetchRequest:request error:NULL];

    return results.count > 0 ? results[0] : nil;
}

Any way to pre populate core data?

Here's the best way (and doesn't require SQL knowledge):

Create a quick Core Data iPhone app (Or even Mac app) using the same object model as your List app. Write a few lines of code to save the default managed objects you want to the store. Then, run that app in the simulator. Now, go to ~/Library/Application Support/iPhone Simulator/User/Applications. Find your application among the GUIDs, then just copy the sqlite store out into your List app's project folder.

Then, load that store like they do in the CoreDataBooks example.
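
The "copy the pre-built store on first launch" step can be sketched in Swift as follows. installSeedStoreIfNeeded is a hypothetical helper written for this illustration, not code from the CoreDataBooks example.

```swift
import Foundation

/// Copies the pre-built store to `storeURL` only when no store exists there yet.
/// Returns true when a copy was made, false when an existing store was left alone.
@discardableResult
func installSeedStoreIfNeeded(from seedURL: URL,
                              to storeURL: URL,
                              fileManager: FileManager = .default) throws -> Bool {
    if fileManager.fileExists(atPath: storeURL.path) {
        return false   // never overwrite the user's data
    }
    // Make sure the destination directory exists before copying.
    try fileManager.createDirectory(at: storeURL.deletingLastPathComponent(),
                                    withIntermediateDirectories: true)
    try fileManager.copyItem(at: seedURL, to: storeURL)
    return true
}
```

In an app, seedURL would typically come from Bundle.main.url(forResource:withExtension:), and the call would run before the persistent store is added to the coordinator.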


