What's the Best Approach to Prefill Core Data Store When Using NSPersistentCloudKitContainer

What's the best approach to prefill Core Data store when using NSPersistentCloudKitContainer?

Maybe it's too late to answer, but I have been working on the same issue recently. After weeks of research, I would like to leave here what I've learned, in the hope that it helps someone with the same problem.

Is there an easy way to check whether an entity already exists remotely?

Any other way to avoid objects being saved twice in CloudKit?

Yes, we can check whether the entity already exists on iCloud, but that's not the best way to decide whether to parse the JSON file and save it to the Core Data persistent store. The app may not be signed in to an Apple ID / iCloud, or it may have network issues, so a remote existence check is not reliable.

The current solution is to deduplicate the data ourselves: add a UUID field to every data object created from the JSON file, and remove any extra objects that share the same UUID. For this to work, the UUID has to be the same on every device (for example, stored in the JSON file itself).
Most of the time I also add a lastUpdate field, so we can keep the most recent object.
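
To make the rule concrete, here is a minimal sketch of "same UUID, newest lastUpdate wins" on plain values. TagRecord and deduplicate(_:) are hypothetical names used only for illustration; in the real app the records would be managed objects and the losers would be deleted from the context.

```swift
import Foundation

// Hypothetical stand-in for a managed object imported from the JSON file.
struct TagRecord: Equatable {
    let uuid: UUID
    let name: String
    let lastUpdate: Date
}

/// Splits records into the ones to keep (newest per UUID) and the ones to delete.
func deduplicate(_ records: [TagRecord]) -> (winners: [TagRecord], losers: [TagRecord]) {
    var winners: [TagRecord] = []
    var losers: [TagRecord] = []
    // Group by UUID; within each group the most recent lastUpdate wins.
    for (_, group) in Dictionary(grouping: records, by: { $0.uuid }) {
        let sorted = group.sorted { $0.lastUpdate > $1.lastUpdate }
        winners.append(sorted[0])          // groups are never empty
        losers.append(contentsOf: sorted.dropFirst())
    }
    return (winners, losers)
}
```

The order of winners is not defined here because dictionary iteration order is arbitrary; sort the result if the order matters.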

Getting notified when fetching data from remote has finished?

We can add an observer of NSPersistentStoreRemoteChange, and get notifications whenever the remote store changes.

Apple provides a demo project on using Core Data with CloudKit that explains the deduplication quite well.

Synchronizing a Local Store to the Cloud
https://developer.apple.com/documentation/coredata/synchronizing_a_local_store_to_the_cloud

WWDC2019 session 202: Using CoreData with CloudKit
https://developer.apple.com/videos/play/wwdc2019/202

The whole idea is to listen for changes in the remote store, keep track of the change history, and deduplicate our data whenever new data comes in (which of course requires some field that tells us whether two objects are duplicates). The persistent store provides a history tracking feature, so we can fetch those transactions as they are merged into the local store and run our deduplication process. Let's say we parse the JSON and import Tags when the app launches:

// Use a custom queue to ensure only one process of history handling at the same time
private lazy var historyQueue: OperationQueue = {
let queue = OperationQueue()
queue.maxConcurrentOperationCount = 1
return queue
}()

lazy var persistentContainer: NSPersistentContainer = {
let container = NSPersistentCloudKitContainer(name: "CoreDataCloudKitDemo")

...
// set the persistentStoreDescription to track history and generate notifications (NSPersistentHistoryTrackingKey, NSPersistentStoreRemoteChangeNotificationPostOptionKey)
// load the persistentStores
// set the mergePolicy of the viewContext
...

// Observe Core Data remote change notifications.
NotificationCenter.default.addObserver(
self, selector: #selector(type(of: self).storeRemoteChange(_:)),
name: .NSPersistentStoreRemoteChange, object: container.persistentStoreCoordinator)

return container
}()

@objc func storeRemoteChange(_ notification: Notification) {
// Process persistent history to merge changes from other coordinators.
historyQueue.addOperation {
self.processPersistentHistory()
}
}

// To fetch change since last update, deduplicate if any new insert data, and save the updated token
private func processPersistentHistory() {
// run in a background context and not blocking the view context.
// when background context is saved, it will merge to the view context based on the merge policy
let taskContext = persistentContainer.newBackgroundContext()
taskContext.performAndWait {
// Fetch history received from outside the app since the last token
let historyFetchRequest = NSPersistentHistoryTransaction.fetchRequest!
let request = NSPersistentHistoryChangeRequest.fetchHistory(after: lastHistoryToken)
request.fetchRequest = historyFetchRequest

let result = (try? taskContext.execute(request)) as? NSPersistentHistoryResult
guard let transactions = result?.result as? [NSPersistentHistoryTransaction],
!transactions.isEmpty
else { return }

// Tags from remote store
var newTagObjectIDs = [NSManagedObjectID]()
let tagEntityName = Tag.entity().name

// Collect the .insert changes in the transactions that we want to deduplicate
for transaction in transactions where transaction.changes != nil {
for change in transaction.changes!
where change.changedObjectID.entity.name == tagEntityName && change.changeType == .insert {
newTagObjectIDs.append(change.changedObjectID)
}
}

if !newTagObjectIDs.isEmpty {
deduplicateAndWait(tagObjectIDs: newTagObjectIDs)
}

// Update the history token using the last transaction.
lastHistoryToken = transactions.last!.token
}
}

Here we save the objectIDs of the added Tags so we can deduplicate them in any other managed object context:

private func deduplicateAndWait(tagObjectIDs: [NSManagedObjectID]) {
let taskContext = persistentContainer.newBackgroundContext()

// Use performAndWait because each step relies on the sequence. Since historyQueue runs in the background, waiting won’t block the main queue.
taskContext.performAndWait {
tagObjectIDs.forEach { tagObjectID in
self.deduplicate(tagObjectID: tagObjectID, performingContext: taskContext)
}
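// Save the background context to trigger a notification and merge the result into the viewContext.
// save(with:) appears to be the helper extension from Apple's sample project, not a built-in API;
// without it, a plain `try taskContext.save()` inside do/catch serves the same purpose.
taskContext.save(with: .deduplicate)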
}
}

private func deduplicate(tagObjectID: NSManagedObjectID, performingContext: NSManagedObjectContext) {
// Get tag by the objectID
guard let tag = performingContext.object(with: tagObjectID) as? Tag,
let tagUUID = tag.uuid else {
fatalError("###\(#function): Failed to retrieve a valid tag with ID: \(tagObjectID)")
}

// Fetch all tags with the same uuid
let fetchRequest: NSFetchRequest<Tag> = Tag.fetchRequest()
// Sort by lastUpdate, keep the latest Tag
fetchRequest.sortDescriptors = [NSSortDescriptor(key: "lastUpdate", ascending: false)]

fetchRequest.predicate = NSPredicate(format: "uuid == %@", tagUUID)

// Return if there are no duplicates.
guard var duplicatedTags = try? performingContext.fetch(fetchRequest), duplicatedTags.count > 1 else {
return
}
// Pick the first tag as the winner.
guard let winner = duplicatedTags.first else {
fatalError("###\(#function): Failed to retrieve the first duplicated tag")
}
duplicatedTags.removeFirst()
remove(duplicatedTags: duplicatedTags, winner: winner, performingContext: performingContext)
}

The most difficult part (in my opinion) is handling the relationships of the duplicated objects that get deleted. Let's say our Tag object has a one-to-many relationship with a Category object (each Tag may have multiple Categories):

private func remove(duplicatedTags: [Tag], winner: Tag, performingContext: NSManagedObjectContext) {
duplicatedTags.forEach { tag in
// delete the tag AFTER we handle the relationship
// and be careful that the delete rule will also activate
defer { performingContext.delete(tag) }

if let categorys = tag.categorys as? Set<Category> {
for category in categorys {
// re-map the category to the winner Tag, or its ofTag will become nil when the duplicated Tag is deleted
category.ofTag = winner
}
}
}
}

One interesting thing is, if the Category objects are also added from the remote store, they may not yet exist when we handle the relationship, but that's another story.

What are some reliable mechanisms to prevent data duplication in Core Data with CloudKit?

There is no unique constraint feature once we have integrated with CloudKit.

The workaround for this limitation is:

Once duplication is detected after an insertion by CloudKit, we delete the duplicated data.

The challenging part of this workaround is: how can we be notified when CloudKit performs an insertion?

Here is a step-by-step guide to being notified when CloudKit performs an insertion.

  1. Turn on the NSPersistentHistoryTrackingKey option on the persistent store description.
  2. Turn on the NSPersistentStoreRemoteChangeNotificationPostOptionKey option on the persistent store description.
  3. Set viewContext.transactionAuthor = "app". This is an important step: when we query the transaction history, it lets us tell which transactions were initiated by our app and which were initiated by CloudKit.
  4. Whenever we receive the remote change notification enabled by NSPersistentStoreRemoteChangeNotificationPostOptionKey, we query the transaction history. The query filters on the transaction author and the last history token. Please refer to the code example for more detail.
  5. Once we detect a transaction that is an insert and operates on the entity we care about, we delete the duplicated data for that entity.


Code example

import CoreData

class CoreDataStack: CoreDataStackable {
let appTransactionAuthorName = "app"

/**
The file URL for persisting the persistent history token.
*/
private lazy var tokenFile: URL = {
return UserDataDirectory.token.url.appendingPathComponent("token.data", isDirectory: false)
}()

/**
Track the last history token processed for a store, and write its value to file.

The historyQueue reads the token when executing operations, and updates it after processing is complete.
*/
private var lastHistoryToken: NSPersistentHistoryToken? = nil {
didSet {
guard let token = lastHistoryToken,
let data = try? NSKeyedArchiver.archivedData( withRootObject: token, requiringSecureCoding: true) else { return }

if !UserDataDirectory.token.url.createCompleteDirectoryHierarchyIfDoesNotExist() {
return
}

do {
try data.write(to: tokenFile)
} catch {
error_log(error)
}
}
}

/**
An operation queue for handling history processing tasks: watching changes, deduplicating tags, and triggering UI updates if needed.
*/
private lazy var historyQueue: OperationQueue = {
let queue = OperationQueue()
queue.maxConcurrentOperationCount = 1
return queue
}()

var viewContext: NSManagedObjectContext {
persistentContainer.viewContext
}

static let INSTANCE = CoreDataStack()

private init() {
// Load the last token from the token file.
if let tokenData = try? Data(contentsOf: tokenFile) {
do {
lastHistoryToken = try NSKeyedUnarchiver.unarchivedObject(ofClass: NSPersistentHistoryToken.self, from: tokenData)
} catch {
error_log(error)
}
}
}

deinit {
deinitStoreRemoteChangeNotification()
}

private(set) lazy var persistentContainer: NSPersistentContainer = {
precondition(Thread.isMainThread)

let container = NSPersistentCloudKitContainer(name: "xxx", managedObjectModel: NSManagedObjectModel.xxx)

// turn on persistent history tracking
let description = container.persistentStoreDescriptions.first
description?.setOption(true as NSNumber, forKey: NSPersistentHistoryTrackingKey)
description?.setOption(true as NSNumber, forKey: NSPersistentStoreRemoteChangeNotificationPostOptionKey)

container.loadPersistentStores(completionHandler: { (storeDescription, error) in
if let error = error as NSError? {
// This is a serious fatal error. We will just simply terminate the app, rather than using error_log.
fatalError("Unresolved error \(error), \(error.userInfo)")
}
})

// Provide transaction author name, so that we can know whether this DB transaction is performed by our app
// locally, or performed by CloudKit during background sync.
container.viewContext.transactionAuthor = appTransactionAuthorName

// So that when backgroundContext write to persistent store, container.viewContext will retrieve update from
// persistent store.
container.viewContext.automaticallyMergesChangesFromParent = true

// TODO: Not sure these are required...
//
//container.viewContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy
//container.viewContext.undoManager = nil
//container.viewContext.shouldDeleteInaccessibleFaults = true

// Observe Core Data remote change notifications.
initStoreRemoteChangeNotification(container)

return container
}()

private(set) lazy var backgroundContext: NSManagedObjectContext = {
precondition(Thread.isMainThread)

let backgroundContext = persistentContainer.newBackgroundContext()

// Provide transaction author name, so that we can know whether this DB transaction is performed by our app
// locally, or performed by CloudKit during background sync.
backgroundContext.transactionAuthor = appTransactionAuthorName

// Similar behavior as Android's Room OnConflictStrategy.REPLACE
// Old data will be overwritten by new data if index conflicts happen.
backgroundContext.mergePolicy = NSMergeByPropertyObjectTrumpMergePolicy

// TODO: Not sure these are required...
//backgroundContext.undoManager = nil

return backgroundContext
}()

private func initStoreRemoteChangeNotification(_ container: NSPersistentContainer) {
// Observe Core Data remote change notifications.
NotificationCenter.default.addObserver(
self,
selector: #selector(storeRemoteChange(_:)),
name: .NSPersistentStoreRemoteChange,
object: container.persistentStoreCoordinator
)
}

private func deinitStoreRemoteChangeNotification() {
NotificationCenter.default.removeObserver(self)
}

@objc func storeRemoteChange(_ notification: Notification) {
// Process persistent history to merge changes from other coordinators.
historyQueue.addOperation {
self.processPersistentHistory()
}
}

/**
Process persistent history, posting any relevant transactions to the current view.
*/
private func processPersistentHistory() {
backgroundContext.performAndWait {

// Fetch history received from outside the app since the last token
let historyFetchRequest = NSPersistentHistoryTransaction.fetchRequest!
historyFetchRequest.predicate = NSPredicate(format: "author != %@", appTransactionAuthorName)
let request = NSPersistentHistoryChangeRequest.fetchHistory(after: lastHistoryToken)
request.fetchRequest = historyFetchRequest

let result = (try? backgroundContext.execute(request)) as? NSPersistentHistoryResult
guard let transactions = result?.result as? [NSPersistentHistoryTransaction] else { return }

if transactions.isEmpty {
return
}

for transaction in transactions {
if let changes = transaction.changes {
for change in changes {
let entity = change.changedObjectID.entity.name
let changeType = change.changeType
let objectID = change.changedObjectID

if entity == "NSTabInfo" && changeType == .insert {
deduplicateNSTabInfo(objectID)
}
}
}
}

// Update the history token using the last transaction.
lastHistoryToken = transactions.last!.token
}
}

private func deduplicateNSTabInfo(_ objectID: NSManagedObjectID) {
do {
guard let nsTabInfo = try backgroundContext.existingObject(with: objectID) as? NSTabInfo else { return }

let uuid = nsTabInfo.uuid

guard let nsTabInfos = NSTabInfoRepository.INSTANCE.getNSTabInfosInBackground(uuid) else { return }

if nsTabInfos.isEmpty {
return
}

var bestNSTabInfo: NSTabInfo? = nil

for nsTabInfo in nsTabInfos {
if let _bestNSTabInfo = bestNSTabInfo {
if nsTabInfo.syncedTimestamp > _bestNSTabInfo.syncedTimestamp {
bestNSTabInfo = nsTabInfo
}
} else {
bestNSTabInfo = nsTabInfo
}
}

for nsTabInfo in nsTabInfos {
if nsTabInfo === bestNSTabInfo {
continue
}

// Remove old duplicated data!
backgroundContext.delete(nsTabInfo)
}

RepositoryUtils.saveContextIfPossible(backgroundContext)
} catch {
error_log(error)
}
}
}
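
The listing above calls a few helpers that are not shown (NSTabInfoRepository.INSTANCE.getNSTabInfosInBackground, RepositoryUtils.saveContextIfPossible). Their real implementations are unknown; the following is only a sketch of what such helpers might do, written as free functions with hypothetical names and assuming the entity stores its uuid as a String attribute.

```swift
import CoreData

/// Fetches every object of `entityName` whose `uuid` attribute equals `uuid`.
/// Assumption: `uuid` is a String attribute. Call this from inside the
/// context's perform/performAndWait block.
func fetchObjects(entityName: String,
                  uuid: String,
                  in context: NSManagedObjectContext) throws -> [NSManagedObject] {
    let request = NSFetchRequest<NSManagedObject>(entityName: entityName)
    request.predicate = NSPredicate(format: "uuid == %@", uuid)
    return try context.fetch(request)
}

/// Saves the context only when it has pending changes.
func saveIfNeeded(_ context: NSManagedObjectContext) throws {
    guard context.hasChanges else { return }
    try context.save()
}
```

Skipping the save when nothing changed avoids writing empty transactions into the persistent history that the same code later has to process.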


Reference

  1. https://developer.apple.com/documentation/coredata/synchronizing_a_local_store_to_the_cloud - In the sample code, the file CoreDataStack.swift illustrates a similar approach to removing duplicated data after cloud sync.
  2. https://developer.apple.com/documentation/coredata/consuming_relevant_store_changes - Information on transaction histories.
  3. What's the best approach to prefill Core Data store when using NSPersistentCloudKitContainer? - A similar question

How to pre-load Core Data with a SQLite file that has references to images that were saved using external storage?

Step 1: Create a "MyAppSeedData" directory and put MyApp.sqlite, MyApp.sqlite-shm, MyApp.sqlite-wal and the MyApp_SUPPORT folder inside. (On disk the support folder is hidden and named .MyApp_SUPPORT; it is where Core Data keeps the externally stored data.)

Step 2: Drag MyAppSeedData into the Xcode project, next to AppDelegate, and tick the box that adds it to the app target.

Step 3: These functions must be in AppDelegate file:

func application(_ application: UIApplication, didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool
{
//If first launch condition == true {
seedData()
//}
return true
}


func seedData() {
let fm = FileManager.default

//Destination URL: Application Folder
let libURL = fm.urls(for: .libraryDirectory, in: .userDomainMask).first!
let destFolder = libURL.appendingPathComponent("Application Support").path
//Or
//let l1 = NSSearchPathForDirectoriesInDomains(.applicationSupportDirectory, .userDomainMask, true).last!
//

//Starting URL: MyAppSeedData dir
let folderPath = Bundle.main.resourceURL!.appendingPathComponent("MyAppSeedData").path

let fileManager = FileManager.default
let urls = fileManager.urls(for: .applicationSupportDirectory, in: .userDomainMask)
if let applicationSupportURL = urls.last {
do{
try fileManager.createDirectory(at: applicationSupportURL, withIntermediateDirectories: true, attributes: nil)
}
catch{
print(error)
}
}
copyFiles(pathFromBundle: folderPath, pathDestDocs: destFolder)
}

func copyFiles(pathFromBundle : String, pathDestDocs: String) {
let fm = FileManager.default
do {
let filelist = try fm.contentsOfDirectory(atPath: pathFromBundle)
let fileDestList = try fm.contentsOfDirectory(atPath: pathDestDocs)

for filename in fileDestList {
try FileManager.default.removeItem(atPath: "\(pathDestDocs)/\(filename)")
}

for filename in filelist {
try? fm.copyItem(atPath: "\(pathFromBundle)/\(filename)", toPath: "\(pathDestDocs)/\(filename)")
}
} catch {
print("Error info: \(error)")
}
}

// MARK: - Core Data stack

lazy var persistentContainer: NSPersistentContainer = {
let modelName = "MyApp"

var container: NSPersistentContainer!

container = NSPersistentContainer(name: modelName)

container.loadPersistentStores(completionHandler: { (storeDescription, error) in
if let error = error as NSError? {
fatalError("Unresolved error \(error), \(error.userInfo)")
}
})
return container
}()
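
The "first launch condition" in didFinishLaunchingWithOptions is left as a comment above. One common way to fill it in is a flag in UserDefaults; the sketch below assumes that approach, and the function name seedIfFirstLaunch and the key "didSeedInitialData" are made up for illustration.

```swift
import Foundation

/// Runs `seed` only the first time it is called for a given defaults store.
/// The key name "didSeedInitialData" is an assumption for this sketch.
@discardableResult
func seedIfFirstLaunch(defaults: UserDefaults = .standard,
                       key: String = "didSeedInitialData",
                       seed: () -> Void) -> Bool {
    // bool(forKey:) returns false when the key has never been set.
    guard !defaults.bool(forKey: key) else { return false }
    seed()
    defaults.set(true, forKey: key)
    return true
}
```

In the app delegate this would be called as seedIfFirstLaunch { seedData() }, before persistentContainer is first accessed, so the seed files are in place when the store loads. Note that this sketch sets the flag even if the copy fails; a stricter version would have seedData() report success.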

What's the best practice to store complex data structure in core data on iOS?

Short answer: no, that would not be the best solution. If you don't want to store the objects using a data model you create, Core Data will give you very little benefit. You'll basically have a bunch of opaque objects that you can't tell apart until you deserialize their JSON strings. If that's sufficient for your needs, then I'd recommend just archiving the objects to disk and skipping the overhead of Core Data.
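
As a rough illustration of "just archiving the objects to disk", here is a minimal sketch using Codable and JSON files. Playlist is a hypothetical model type, and JSON is only one possible encoding; a property list encoder would work the same way.

```swift
import Foundation

// Hypothetical model type; any Codable type works the same way.
struct Playlist: Codable, Equatable {
    var title: String
    var tracks: [String]
}

/// Encodes a value as JSON and writes it atomically to `url`.
func archive<T: Encodable>(_ value: T, to url: URL) throws {
    let data = try JSONEncoder().encode(value)
    try data.write(to: url, options: .atomic)
}

/// Reads the JSON at `url` back into a value of type `T`.
func unarchive<T: Decodable>(_ type: T.Type, from url: URL) throws -> T {
    let data = try Data(contentsOf: url)
    return try JSONDecoder().decode(type, from: data)
}
```

The trade-off is the one described above: the whole object is loaded and saved as a unit, with no querying or partial fetching.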

What's the best way to store data for different lists?

Core Data is fine. Another option is to use a local SQLite database in your app's Documents folder, either directly or through one of the open-source wrapper libraries on GitHub.

You can also save the data on a server so that the user won't lose it. You can either create your own RESTful server or use a popular hosted service with a free tier, such as Firebase.

What's the Best Way to Simulate an Array Type Attribute in Core Data?

Always think of Core Data as an object graph and model your data accordingly.

You should have a Contact entity and an Email entity. The email should be on the other end of a one-to-many bi-directional relationship with Contact. If you care about a specific order then you should also have some orderable value in the Email entity for later sorting.
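
The shape of that model can be sketched with plain Swift classes standing in for the managed object subclasses. The property names sortIndex, emails and orderedEmails are assumptions for illustration; in Core Data the to-many side would be an unordered set in the same way, which is why the explicit index is needed.

```swift
import Foundation

// Plain Swift stand-in for the Email entity (hypothetical fields).
final class Email: Hashable {
    let address: String
    let sortIndex: Int           // the "orderable value" used to restore the order
    weak var contact: Contact?   // inverse side of the relationship

    init(address: String, sortIndex: Int) {
        self.address = address
        self.sortIndex = sortIndex
    }

    // Identity-based equality, similar to how managed objects are compared.
    static func == (lhs: Email, rhs: Email) -> Bool { lhs === rhs }
    func hash(into hasher: inout Hasher) { hasher.combine(ObjectIdentifier(self)) }
}

// Plain Swift stand-in for the Contact entity.
final class Contact {
    let name: String
    private(set) var emails: Set<Email> = []   // unordered to-many side

    init(name: String) { self.name = name }

    /// Adds the email and keeps both ends of the relationship in sync.
    func add(_ email: Email) {
        email.contact = self
        emails.insert(email)
    }

    /// The "array" view: the emails sorted by their stored index.
    var orderedEmails: [Email] {
        emails.sorted { $0.sortIndex < $1.sortIndex }
    }
}
```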


