How to Guarantee Fetch Results from a Different Thread in Nested Contexts Are Up to Date When Saves Are Done Asynchronously in the Background

How should I guarantee that fetch results from a different thread in nested contexts are up to date when saves are done asynchronously in the background?

The question isn't specific to Core Data.

It's the classic read-write question.

The common approach to protecting a data source is to access it through a serial queue. Without the serial queue, you will have a read-write problem.

In the following example:

let coredataManager = CoreDataStack() // 1
coredataManager.saveMainContext() // 2 save is done asynchronously in background queue
coredataManager.mainManagedObjectedContext.fetch(fetchrequest) // 3

coredataManager is to be accessed from a serial queue. So even if the write on line 2 is done asynchronously, the read on line 3 has to wait until the serial queue is unblocked.
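Since the question isn't specific to Core Data, the ordering guarantee can be sketched in plain Python. This is only an illustration under assumptions: `SerialStore` and its methods are hypothetical names, and a single-worker executor stands in for the serial queue.

```python
from concurrent.futures import ThreadPoolExecutor
import time


class SerialStore:
    """Hypothetical data source; every access goes through one serial queue."""

    def __init__(self):
        # A single worker runs submitted jobs one at a time, in FIFO order.
        self._queue = ThreadPoolExecutor(max_workers=1)
        self._data = {}

    def save_async(self, key, value):
        """Enqueue a write and return immediately with a future."""
        def write():
            time.sleep(0.05)  # simulate a slow save
            self._data[key] = value
        return self._queue.submit(write)

    def fetch(self, key):
        """Enqueue a read and wait for it; it runs after earlier writes."""
        return self._queue.submit(self._data.get, key).result()

    def close(self):
        self._queue.shutdown()


store = SerialStore()
store.save_async("name", "updated")  # returns before the write has happened
value = store.fetch("name")          # queued behind the write
store.close()
print(value)
```

The read is up to date only because it is queued behind the write; reading `_data` directly from the calling thread would bring the read-write problem back.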

How to fetch data with CoreData in the background?

I think there is no difference for the example you gave (it is fetch-only).

A) always creates a new context, which means it is not safe when multiple jobs run concurrently to create an entity or to fetch-or-create an entity.

For creation, you'd be better off with B), but you need to hold on to the shared background context. When you call 'perform', all the jobs run one after another on a single queue.

For C), NSAsynchronousFetchRequest shines when it works with viewContext: you don't have to create a child/background context, although it's not wrong to create one for it.

Coredata CRUD operations in background thread?

From Apple's documentation on concurrency:

performBlock: and performBlockAndWait: ensure the block operations are executed on the queue specified for the context. The performBlock: method returns immediately and the context executes the block methods on its own thread. With the performBlockAndWait: method, the context still executes the block methods on its own thread, but the method doesn’t return until the block is executed.

When you use NSPrivateQueueConcurrencyType, the context creates and manages a private queue.

So you do not need to create another dispatch queue if you use performBlock:, because it executes the operations in the block asynchronously. If you use performBlockAndWait:, which does not return until the block has finished executing, you should call it from another dispatch queue so that the caller is not blocked.

Therefore, the best practice is to use performBlock: and avoid another dispatch queue, e.g.

NSManagedObjectContext *private = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
[private setParentContext:mainMoc];

[private performBlock:^{
    for (NSDictionary *jsonObject in jsonArray) {
        NSManagedObject *mo = …; //Managed object that matches the incoming JSON structure
        //update MO with data from the dictionary
    }
    NSError *error = nil;
    if (![private save:&error]) {
        NSLog(@"Error saving context: %@\n%@", [error localizedDescription], [error userInfo]);
        abort();
    }
    [mainMoc performBlockAndWait:^{
        NSError *error = nil;
        if (![mainMoc save:&error]) {
            NSLog(@"Error saving context: %@\n%@", [error localizedDescription], [error userInfo]);
            abort();
        }
    }];
}];
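The difference between the two calls can be modelled outside Core Data. The following Python sketch is an analogy only: `PrivateQueueContext`, `perform`, and `perform_and_wait` are hypothetical names, and a single-worker executor plays the role of the context's private queue.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


class PrivateQueueContext:
    """Hypothetical model of a context that owns a private serial queue."""

    def __init__(self):
        self._queue = ThreadPoolExecutor(max_workers=1)

    def perform(self, block):
        """Like performBlock: enqueue the block and return immediately."""
        return self._queue.submit(block)

    def perform_and_wait(self, block):
        """Like performBlockAndWait: enqueue the block and wait for it."""
        return self._queue.submit(block).result()

    def close(self):
        self._queue.shutdown()


log = []
gate = threading.Event()
ctx = PrivateQueueContext()


def slow_block():
    gate.wait()  # held back until the caller has moved on
    log.append("perform block ran")


ctx.perform(slow_block)
log.append("perform returned")  # the caller continues before the block runs
gate.set()
ctx.perform_and_wait(lambda: log.append("performAndWait block ran"))
log.append("performAndWait returned")
ctx.close()
print(log)
```

Unlike this toy model, the real performBlockAndWait: can be called re-entrantly from the context's own queue; calling `perform_and_wait` from inside a block here would deadlock.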

What does awaiting an asynchronous method do in background?

I've read various articles about async await and I'm trying to understand the await async in depth.

A noble pursuit.

My problem is that I found out that awaiting an asynchronous method doesn't create a new thread; it rather just makes the UI responsive.

Correct. It is very important to realize that await means asynchronous wait. It does not mean "make this operation asynchronous". It means:

  • This operation is already asynchronous.
  • If the operation is complete, fetch its result.
  • If the operation is not complete, return to the caller and assign the remainder of this workflow as the continuation of the incomplete operation.
  • When the incomplete operation becomes complete, it will schedule the continuation to execute.
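Python's asyncio follows a comparable await model, so the "no new thread" claim can be checked there. This is a small sketch for illustration, not C# code; `fetch_value` and `main` are made-up names, and `asyncio.sleep` stands in for an I/O operation.

```python
import asyncio
import threading


async def fetch_value():
    """Already asynchronous: simulates I/O latency without using a thread."""
    await asyncio.sleep(0.05)
    return 42


async def main():
    before = (threading.get_ident(), threading.active_count())
    value = await fetch_value()  # suspends main(); the event loop stays free
    after = (threading.get_ident(), threading.active_count())
    return value, before, after


value, before, after = asyncio.run(main())
print(value, before == after)
```

The thread identity and the thread count are the same before and after the await: the workflow was suspended and resumed, and nobody was hired to wait.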

If it's like that there's no time gain when using await async since no extra thread is used.

This is incorrect. You're not thinking about the time win correctly.

Imagine this scenario.

  • Imagine a world with no ATMs. I grew up in that world. It was a strange time. So there is usually a line of people at the bank waiting to deposit or withdraw money.
  • Imagine there is only one teller at this bank.
  • Now imagine that the bank only takes and gives out single dollar bills.

Suppose there are three people in line and they each want ten dollars. You join the end of the line, and you only want one dollar. Here are two algorithms:

  • Give the first person in the line one dollar.
  • [ do that ten times ]
  • Give the second person in the line one dollar.
  • [ do that ten times ]
  • Give the third person in the line one dollar.
  • [ do that ten times ]
  • Give you your dollar.

How long does everyone have to wait to get all their money?

  • Person one waits 10 time units
  • Person two waits 20
  • Person three waits 30
  • You wait 31.

That's a synchronous algorithm. An asynchronous algorithm is:

  • Give the first person in the line one dollar.
  • Give the second person in the line one dollar.
  • Give the third person in the line one dollar.
  • Give you your dollar.
  • Give the first person in the line one dollar.
  • ...

That's an asynchronous solution. Now how long does everyone wait?

  • Everyone getting ten dollars waits about 30.
  • You wait 4 units.

The average throughput for large jobs is lower, but the average throughput for small jobs is much higher. That's the win. Also, the time-to-first-dollar for everyone is lower in the asynchronous workflow, even if the time to last dollar is higher for big jobs. Also, the asynchronous system is fair; every job waits approximately (size of job)x(number of jobs). In the synchronous system, some jobs wait almost no time and some wait a really long time.
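The waiting times in the two teller algorithms can be reproduced with a few lines of Python. This is just a sketch of the thought experiment; the function names are made up, and one time unit means handing over one dollar.

```python
def synchronous(jobs):
    """Serve each customer completely before moving to the next."""
    clock, finished = 0, []
    for dollars in jobs:
        clock += dollars
        finished.append(clock)
    return finished


def round_robin(jobs):
    """Hand out one dollar per customer per pass until everyone is done."""
    remaining = list(jobs)
    finished = [0] * len(jobs)
    clock = 0
    while any(remaining):
        for i, left in enumerate(remaining):
            if left:
                clock += 1
                remaining[i] -= 1
                if remaining[i] == 0:
                    finished[i] = clock
    return finished


jobs = [10, 10, 10, 1]  # three people want ten dollars, you want one
sync_times = synchronous(jobs)
async_times = round_robin(jobs)
print(sync_times)   # [10, 20, 30, 31]
print(async_times)  # [29, 30, 31, 4]
```

The total work is the same in both cases (31 units); only the order changes, which is why the small job finishes at 4 instead of 31.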

The other win is: tellers are expensive; this system hires a single teller and gets good throughput for small jobs. To get good throughput in the synchronous system, as you note, you need to hire more tellers which is expensive.

Is this also true for Task.WhenAll() or Task.WhenAny() ?

They do not create threads. They just take a bunch of tasks and complete when all/any of the tasks are done.
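For comparison, Python's asyncio has rough analogues of these combinators: `asyncio.gather` behaves like WhenAll and `asyncio.wait` with `FIRST_COMPLETED` behaves like WhenAny. The sketch below is an analogy, not the .NET API; `delayed` is a hypothetical helper.

```python
import asyncio
import threading


async def delayed(value, seconds):
    """Hypothetical asynchronous operation that completes after a delay."""
    await asyncio.sleep(seconds)
    return value


async def main():
    threads_before = threading.active_count()

    # Analogue of WhenAll: completes when every awaitable has completed.
    all_results = await asyncio.gather(delayed("a", 0.03), delayed("b", 0.01))

    # Analogue of WhenAny: completes as soon as the first task completes.
    tasks = [
        asyncio.create_task(delayed("slow", 0.2)),
        asyncio.create_task(delayed("fast", 0.01)),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    first = done.pop().result()
    for task in pending:
        task.cancel()  # the slow task is no longer needed

    same_thread_count = threading.active_count() == threads_before
    return all_results, first, same_thread_count


all_results, first, same_thread_count = asyncio.run(main())
print(all_results, first, same_thread_count)
```

Both combinators only observe tasks that already exist; the thread count does not change.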

When creating the getStringTask Task, another thread will copy the current context and start executing the GetStringAsync method.

Absolutely not. The task is already asynchronous and since it is an IO task it doesn't need a thread. The IO hardware is already asynchronous. There is no new worker hired.

When awaiting getStringTask, we will see if the other thread has completed his task

No, there is no other thread. We see if the IO hardware has completed its task. There is no thread.

When you put a piece of bread in the toaster, and then go check your email, there is no person in the toaster running the toaster. The fact that you can start an asynchronous job and then go off and do other stuff while it is working is because you have special purpose hardware that is by its nature asynchronous. That's true of network hardware the same way it is true of toasters. There is no thread. There is no tiny person running your toaster. It runs itself.

if not the control will be back to the caller of AccessTheWebAsync() method until the other thread completes its task to resume the control.

Again, there is no other thread.

But the control flow is correct. If the task is complete then the value of the task is fetched. If it is not complete then control returns to the caller, after assigning the remainder of the current workflow as the continuation of the task. When the task is complete, the continuation is scheduled to run.

I really don't get how no extra thread is created when awaiting a Task.

Again, think about every time in your life when you stopped doing a task because you were blocked, did something else for a while, and then started up doing the first task again when you got unblocked. Did you have to hire a worker? Of course not. Yet somehow you managed to make eggs while the toast was in the toaster. Task based asynchrony just puts that real-world workflow into software.

It never ceases to amaze me how you kids today with your weird music act like threads always existed and there is no other way to do multitasking. I learned how to program in an operating system that didn't have threads. If you wanted two things to appear to happen at the same time, you had to build your own asynchrony; it wasn't built into the language or the OS. Yet we managed.

Cooperative single-threaded asynchrony is a return to the world as it was before we made the mistake of introducing threads as a control flow structure; a more elegant and far simpler world. An await is a suspension point in a cooperative multitasking system. In pre-threading Windows, you'd call Yield() for that, and we didn't have language support for creating continuations and closures; you wanted state to persist across a yield, you wrote the code to do it. You all have it easy!

Can someone explain what exactly happening when awaiting a Task ?

Exactly what you said, just with no thread. Check to see if the task is done; if it's done, you're done. If not, schedule the remainder of the workflow as the continuation of the task, and return. That's all await does.

I just want to confirm something. Is it always the case that there's no thread created when awaiting a task?

We worried when designing the feature that people would believe, as you still might, that "await" does something to the call which comes after it. It does not. Await does something to the return value. Again, when you see:

int foo = await FooAsync();

you should mentally see:

Task<int> task = FooAsync();
if (task is not already completed)
    set continuation of task to go to "resume" on completion
    return;
resume: // If we get here, task is completed
int foo = task.Result;

A call to a method with an await is not a special kind of call. The "await" does not spin up a thread, or anything like that. It is an operator that operates on the value that was returned.

So awaiting a task does not spin up a thread. Awaiting a task (1) checks to see if the task is complete, and (2) if it is not, assigns the remainder of the method as the continuation of the task, and returns. That's all. Await does not do anything to create a thread. Now, maybe the called method spins up a thread; that's its business. That has nothing to do with the await, because the await doesn't happen until after the call returns. The called function does not know its return value is being awaited.
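That "check, sign up the continuation, return" behaviour can be written out by hand. The Python sketch below is a simplified model, not how any real runtime is implemented: `SimpleTask` and `workflow` are hypothetical, and the continuation is called inline on completion where a real scheduler would queue it.

```python
class SimpleTask:
    """Hypothetical minimal task: a result slot plus a list of continuations."""

    def __init__(self):
        self.completed = False
        self.result = None
        self._continuations = []

    def on_completed(self, continuation):
        if self.completed:
            continuation()
        else:
            self._continuations.append(continuation)

    def set_result(self, value):
        self.result = value
        self.completed = True
        for continuation in self._continuations:
            continuation()


log = []


def workflow(task):
    """Hand-written equivalent of: foo = await task; then use foo."""
    def resume():
        # If we get here, the task is completed.
        log.append(f"foo = {task.result}")

    if not task.completed:
        task.on_completed(resume)  # sign up the rest of the method
        log.append("returned to caller")
        return
    resume()


pending = SimpleTask()
workflow(pending)                    # incomplete: registers continuation, returns
log.append("caller keeps working")
pending.set_result(7)                # completion runs the continuation
print(log)
```

No thread appears anywhere in this model; the task itself knows nothing about who is waiting on it.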

Let's say we await a CPU bound task that does heavy calculations. What I know so far is that I/O bound code will be executed on low level CPU components (much lower than threads) and only use a thread briefly to notify the context about the finished Task status.

What we know about the call to FooAsync above is that it is asynchronous, and it returns a task. We do not know how it is asynchronous. That's the author of FooAsync's business! But there are three main techniques that the author of FooAsync can use to achieve asynchrony. The first two are the ones you note:

  • If the task is high-latency because it requires a long computation to be done on the current machine on another CPU, then it makes sense to obtain a worker thread and start the thread doing the work on another CPU. When the work is finished, the associated task can schedule its continuation to run back on the UI thread, if the task was created on the UI thread, or on another worker thread, as appropriate.

  • If the task is high-latency because it requires communication with slow hardware, like disks or networks, then as you note, there is no thread. Special-purpose hardware does the task asynchronously and the interrupt handling provided by the operating system ultimately takes care of getting the task completion scheduled on the right thread.

  • A third reason to be asynchronous is not because you're managing a high-latency operation, but because you're breaking up an algorithm into little parts and putting them on a work queue. Maybe you're making your own custom scheduler, or implementing an actor model system, or trying to do stackless programming, or whatever. There's no thread, there's no IO, but there is asynchrony.
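The third technique, with no thread and no IO, can be sketched with Python generators on a work queue. This is an illustrative toy scheduler; `count_up` and `run_all` are made-up names, and each `yield` plays the role of a suspension point.

```python
from collections import deque


def count_up(name, steps, log):
    """A job broken into little parts; each yield hands control back."""
    for i in range(1, steps + 1):
        log.append(f"{name}{i}")
        yield  # suspension point


def run_all(workers):
    """Round-robin scheduler: run each job up to its next suspension point."""
    queue = deque(workers)
    while queue:
        worker = queue.popleft()
        try:
            next(worker)
        except StopIteration:
            continue  # job finished; do not requeue it
        queue.append(worker)


log = []
run_all([count_up("a", 2, log), count_up("b", 3, log)])
print(log)  # ['a1', 'b1', 'a2', 'b2', 'b3']
```

Everything runs on one thread, yet the two jobs interleave because each one voluntarily gives up control.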

So, again, awaiting does not make something run on a worker thread. Calling a method that starts a worker thread makes something run on a worker thread. Let the method you're calling decide whether to make a worker thread or not. Async methods are already asynchronous. You don't need to do anything to them to make them asynchronous. Await does not make anything asynchronous.

Await exists solely to make it easier for the developer to check whether an asynchronous operation has completed, and to sign up the remainder of the current method as the continuation if it has not completed. That's what it is for. Again, await does not create asynchrony. Await helps you build asynchronous workflows. An await is a point in the workflow where an asynchronous task must be completed before the workflow can continue.

I also know that we use Task.Run() to execute CPU bound code to look for an available thread in thread pool. Is this true ?

That's correct. If you have a synchronous method, and you know that it is CPU bound, and you would like it to be asynchronous, and you know that the method is safe to run on another thread, then Task.Run will find a worker thread, schedule the delegate to be executed on the worker thread, and give you a task representing the asynchronous operation. You should only do this with methods that are (1) very long-running, like, more than 30 milliseconds, (2) CPU bound, (3) safe to call on another thread.

If you violate any of those, bad things happen. If you hire a worker to do less than 30 milliseconds of work, well, think about real life. If you have some computations to do, does it make sense to buy an ad, interview candidates, hire someone, get them to add three dozen numbers together, and then fire them? Hiring a worker thread is expensive. If hiring the thread is more expensive than just doing the work yourself, you will not get any performance win at all by hiring a thread; you'll make it a lot worse.

If you hire a worker to do IO bound tasks, what you've done is hired a worker to sit by the mailbox for years and yell when mail arrives. That does not make the mail arrive faster. It just wastes worker resources that could be spent on other problems.

And if you hire a worker to do a task that is not threadsafe, well, if you hire two workers and tell them to both drive the same car to two different locations at the same time, they're going to crash the car while they're fighting over the steering wheel on the freeway.
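In Python, a rough counterpart to Task.Run is `asyncio.to_thread` (Python 3.9+), which runs a synchronous function on a worker thread and gives back something awaitable. The sketch below shows only the offloading mechanics; `sum_of_squares` is a hypothetical stand-in for CPU-bound work, and in CPython the global interpreter lock limits how much real parallelism pure-Python computation gets from threads.

```python
import asyncio
import threading


def sum_of_squares(n):
    """Synchronous, CPU-bound, and touches no shared state."""
    return sum(i * i for i in range(n)), threading.get_ident()


async def main():
    caller_thread = threading.get_ident()
    # Hand the synchronous function to a worker thread and await the result.
    total, worker_thread = await asyncio.to_thread(sum_of_squares, 1000)
    return total, caller_thread != worker_thread


total, ran_on_worker = asyncio.run(main())
print(total, ran_on_worker)
```

Here it is the call to `asyncio.to_thread` that hires the worker, not the `await` in front of it. The job is also far too small to be worth a worker; it is kept tiny only so the sketch runs quickly.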

UIManagedDocument with NSFetchedResultsController and background context

I read about a similar problem on the Apple dev forums today. Perhaps this is the same problem as yours, https://devforums.apple.com/message/666492#666492, in which case perhaps there is a bug (or at least someone else with the same issue to discuss it with!).

Assuming it isn't, it sounds like what you want to do should be perfectly possible with nested contexts, and therefore assuming no bugs with UIManagedDocument.

My only reservation is that I've been trying to get batch loading working with UIManagedDocument and it seems like it does not work with nested contexts (https://stackoverflow.com/q/11274412/1347502). I would think one of the main benefits of NSFetchedResultsController is its ability to improve performance through batch loading. So if this can't be done in UIManagedDocument, perhaps NSFetchedResultsController isn't ready for use with UIManagedDocument, but I haven't got to the bottom of that issue yet.

That reservation aside, most of the instruction I've read or viewed about nested contexts and background work seems to be done with peer child contexts. What you have described is a parent, child, grandchild configuration. In the WWDC 2012 video "Session 214 - Core Data Best Practices" (+ 16:00 minutes) Apple recommend adding another peer context to the parent context for this scenario, e.g

backgroundContext.parentContext = document.managedObjectContext.parentContext;

The work is performed asynchronously in this context and then pushed up to the parent via a call to save on the background context. The parent would then be saved asynchronously and any peer contexts, in this case the document.managedObjectContext, would access the changes via a fetch, merge, or refresh. This is also described in the UIManagedDocument documentation:

  • If appropriate, you can load data from a background thread directly
    to the parent context. You can get the parent context using
    parentContext. Loading data to the parent context means you do not
    perturb the child context’s operations. You can retrieve data loaded
    in the background simply by executing a fetch.

[Edit: re-reading this it could just be recommending Jeffery's suggestion i.e. not creating any new contexts at all and just using the parent context.]

That being said, the documentation also suggests that typically you do not call save on child contexts but use the UIManagedDocument's save methods. This may be an occasion when you do call save, or perhaps it is part of the problem. Calling save on the parent context is more strongly discouraged, as mentioned by Jeffery. Another answer I've read on Stack Overflow recommended only using updateChangeCount to trigger UIManagedDocument saves. But I've not read anything from Apple, so perhaps in this case a call to the UIManagedDocument saveToURL:forSaveOperation:completionHandler: method would be appropriate to get everything in sync and saved.

I guess the next obvious issue is how to notify NSFetchedResultsController that changes have occurred. I would be tempted to simplify the setup as discussed above and then subscribe to the various NSManagedObjectContextObjectsDidChangeNotification or save notifications on the different contexts and see which, if any, are called when UIManagedDocument saves, autosaves, or when background changes are saved to the parent (assuming that is allowable in this case). I assume the NSFetchedResultsController is wired to these notifications in order to keep in sync with the underlying data.

Alternatively perhaps you need to manually perform a fetch, merge, or refresh in the main context to get the changes pulled through and then somehow notify NSFetchedResultsController that it needs to refresh?

Personally, I'm wondering if UIManagedDocument is ready for general consumption. There was no mention of it at WWDC this year; instead, a lengthy discussion of how to build a much more complicated solution was presented: "Session 227 - Using iCloud with Core Data"

Core Data: should I be fetching objects from the parent context or does the child context have the same objects as the parent?

It does not seem to make too much sense to create a child context and then fetch from the parent context. I do not believe that this is the way child contexts' blocks were conceived to be used.

To clear up the confusion: after creating the child context from a parent context, that child context has the same "state" as the parent. The contents of the two contexts will diverge only if they do different things (create, modify, delete objects).

So for your setup, proceed as follows:

  • create the child context
  • do the work you want to do (modifying or creating objects from the downloaded data),
  • save the child context

At this stage, nothing is saved to the persistent store yet. With the child save, the changes are just "pushed up" to the parent context. You can now

  • save the parent context

to write the new data to the persistent store. Then

  • update your UI,

best via notifications (e.g. the NSManagedObjectContextDidSaveNotification).
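The way changes travel upward through these steps can be modelled with a toy example. The Python below is not Core Data and makes no claim about its internals; `ToyContext` is a hypothetical class and a plain dict stands in for the persistent store.

```python
class ToyContext:
    """Hypothetical stand-in for a managed object context (not Core Data)."""

    def __init__(self, parent=None, store=None):
        self.parent = parent
        self.store = store if store is not None else {}
        self.changes = {}  # unsaved changes held by this context

    def set(self, key, value):
        self.changes[key] = value

    def get(self, key):
        """Look in our own changes first, then in the parent, then the store."""
        if key in self.changes:
            return self.changes[key]
        if self.parent is not None:
            return self.parent.get(key)
        return self.store.get(key)

    def save(self):
        """Push changes one level up: to the parent, or to the store."""
        target = self.parent.changes if self.parent is not None else self.store
        target.update(self.changes)
        self.changes.clear()


store = {}
parent = ToyContext(store=store)
child = ToyContext(parent=parent)

child.set("title", "downloaded")
child.save()
after_child_save = dict(store)          # nothing persisted yet
seen_by_parent = parent.get("title")    # but the parent already sees it
parent.save()
after_parent_save = dict(store)         # now written to the store
print(after_child_save, seen_by_parent, after_parent_save)
```

The store stays empty after the child save and is only written by the parent save, which is the two-step flow described above.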

How to get the return value from a thread?

In Python 3.2+, stdlib concurrent.futures module provides a higher level API to threading, including passing return values or exceptions from a worker thread back to the main thread:

import concurrent.futures

def foo(bar):
    print('hello {}'.format(bar))
    return 'foo'

with concurrent.futures.ThreadPoolExecutor() as executor:
    future = executor.submit(foo, 'world!')
    return_value = future.result()
    print(return_value)
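Exceptions travel back the same way as return values: an exception raised in the worker is stored on the future and re-raised in the calling thread when `result()` is called. A short sketch, with `risky` as a made-up example function:

```python
import concurrent.futures


def risky(divisor):
    """Raises ZeroDivisionError in the worker thread when divisor is 0."""
    return 10 / divisor


with concurrent.futures.ThreadPoolExecutor() as executor:
    ok = executor.submit(risky, 2)
    bad = executor.submit(risky, 0)

    value = ok.result()
    try:
        bad.result()  # re-raises the worker's exception here
        caught = None
    except ZeroDivisionError as exc:
        caught = exc

print(value, type(caught).__name__)
```

If you would rather inspect the error without raising it, `future.exception()` returns the stored exception (or None).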
