iOS – Implementing Fast and Efficient Core Data Import on iOS 5

core-data, ios, nsfetchedresultscontroller, nsmanagedobjectcontext

Question: How do I get my child context to see changes persisted on the parent context so that they trigger my NSFetchedResultsController to update the UI?

Here's the setup:

You've got an app that downloads and adds lots of XML data (about 2 million records, each roughly the size of a normal paragraph of text). The .sqlite file grows to about 500 MB. Adding this content to Core Data takes time, but you want the user to be able to use the app while the data loads into the data store incrementally. It has to be imperceptible to the user that large amounts of data are being moved around, so no hangs, no jitters: scrolls like butter. Still, the app becomes more useful the more data it holds, so we can't wait forever for the data to be added to the Core Data store. In practice, this means I'd really like to avoid calls like this in the import code:

[[NSRunLoop currentRunLoop] runUntilDate:[NSDate dateWithTimeIntervalSinceNow:0.25]];

The app is iOS 5 only so the slowest device it needs to support is an iPhone 3GS.

Here are the resources I've used so far to develop my current solution:

Apple's Core Data Programming Guide: Efficiently Importing Data

  • Use Autorelease Pools to keep the memory down
  • Relationships Cost. Import flat, then patch up relationships at the end
  • Don't query if you can help it, it slows things down in an O(n^2) manner
  • Import in Batches: save, reset, drain and repeat
  • Turn off the Undo Manager on import
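The batching advice above can be sketched roughly as follows. This is a minimal illustration rather than Apple's sample code: the helper name `ImportRecords`, the `Record` entity, and its `text` attribute are hypothetical, and the sketch assumes ARC and that it is called on the thread that owns the context.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: inserts each dictionary in `records` as a "Record"
// object (hypothetical entity with a "text" attribute), saving and resetting
// the context after every `batchSize` insertions.
static BOOL ImportRecords(NSArray *records,
                          NSManagedObjectContext *context,
                          NSUInteger batchSize,
                          NSError **outError)
{
    if (batchSize == 0) batchSize = 1000;   // guard against a zero stride
    [context setUndoManager:nil];           // no undo bookkeeping during import

    NSUInteger count = [records count];
    for (NSUInteger start = 0; start < count; start += batchSize) {
        NSError *batchError = nil;  // strong local, so it outlives the pool below
        BOOL saved = NO;
        @autoreleasepool {
            NSUInteger end = MIN(start + batchSize, count);
            for (NSUInteger i = start; i < end; i++) {
                NSDictionary *record = [records objectAtIndex:i];
                NSManagedObject *object =
                    [NSEntityDescription insertNewObjectForEntityForName:@"Record"
                                                  inManagedObjectContext:context];
                [object setValue:[record objectForKey:@"text"] forKey:@"text"];
            }
            saved = [context save:&batchError];
            if (saved) {
                [context reset];    // drop the saved objects to keep memory flat
            }
        }                           // temporaries from this batch are drained here
        if (!saved) {
            if (outError) *outError = batchError;
            return NO;
        }
    }
    return YES;
}
```

The error is copied out after the pool closes because an error returned through an autoreleasing out-parameter inside `@autoreleasepool` may be released when the pool drains.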

iDeveloper TV – Core Data Performance

  • Use 3 Contexts: Master, Main and Confinement context types

iDeveloper TV – Core Data for Mac, iPhone & iPad Update

  • Running saves on other queues with performBlock makes things fast.
  • Encryption slows things down, turn it off if you can.

Importing and Displaying Large Data Sets in Core Data by Marcus Zarra

  • You can slow down the import by giving time to the current run loop,
    so things feel smooth to the user.
  • Sample Code proves that it is possible to do large imports and keep the UI responsive, but not as fast as with 3 contexts and async saving to disk.

My Current Solution

I've got 3 instances of NSManagedObjectContext:

masterManagedObjectContext – This is the context that has the NSPersistentStoreCoordinator and is responsible for saving to disk. I do this so my saves can be asynchronous and therefore very fast. I create it on launch like this:

masterManagedObjectContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
[masterManagedObjectContext setPersistentStoreCoordinator:coordinator];

mainManagedObjectContext – This is the context the UI uses everywhere. It is a child of the masterManagedObjectContext. I create it like this:

mainManagedObjectContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
[mainManagedObjectContext setUndoManager:nil];
[mainManagedObjectContext setParentContext:masterManagedObjectContext];

backgroundContext – This context is created in my NSOperation subclass that is responsible for importing the XML data into Core Data. I create it in the operation's main method and link it to the master context there.

backgroundContext = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSConfinementConcurrencyType];
[backgroundContext setUndoManager:nil];
[backgroundContext setParentContext:masterManagedObjectContext];

This actually works very, VERY fast. Just by doing this 3 context setup I was able to improve my import speed by over 10x! Honestly, this is hard to believe. (This basic design should be part of the standard Core Data template…)

During the import process I save 2 different ways. Every 1000 items I save on the background context:

BOOL saveSuccess = [backgroundContext save:&error];

Then at the end of the import process, I save on the master/parent context which, ostensibly, pushes modifications out to the other child contexts including the main context:

[masterManagedObjectContext performBlock:^{
   NSError *parentContextError = nil;
   BOOL parentContextSaveSuccess = [masterManagedObjectContext save:&parentContextError];
}];

Problem: The problem is that my UI will not update until I reload the view.

I have a simple UIViewController with a UITableView that is being fed data using an NSFetchedResultsController. When the import process completes, the NSFetchedResultsController sees no changes from the parent/master context and so the UI doesn't automatically update like I'm used to seeing. If I pop the UIViewController off the stack and load it again all the data is there.

Question: How do I get my child context to see changes persisted on the parent context so that they trigger my NSFetchedResultsController to update the UI?

I have tried the following which just hangs the app:

- (void)saveMasterContext {
    NSNotificationCenter *notificationCenter = [NSNotificationCenter defaultCenter];    
    [notificationCenter addObserver:self selector:@selector(contextChanged:) name:NSManagedObjectContextDidSaveNotification object:masterManagedObjectContext];

    NSError *error = nil;
    BOOL saveSuccess = [masterManagedObjectContext save:&error];

    [notificationCenter removeObserver:self name:NSManagedObjectContextDidSaveNotification object:masterManagedObjectContext];
}

- (void)contextChanged:(NSNotification*)notification
{
    if ([notification object] == mainManagedObjectContext) return;

    if (![NSThread isMainThread]) {
        [self performSelectorOnMainThread:@selector(contextChanged:) withObject:notification waitUntilDone:YES];
        return;
    }

    [mainManagedObjectContext mergeChangesFromContextDidSaveNotification:notification];
}

Best Solution

You should probably save the master MOC in strides as well. No sense having that MOC wait until the end to save. It has its own thread, and it will help keep memory down as well.
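As a rough sketch of saving the master in strides, each batch save on the background context could be followed by an asynchronous save on the master. The helper name `SaveBatch` is hypothetical; the sketch assumes it is called on the import thread and that the master uses NSPrivateQueueConcurrencyType as in the question.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper, called on the import thread after each batch of inserts.
static void SaveBatch(NSManagedObjectContext *backgroundContext,
                      NSManagedObjectContext *masterContext)
{
    NSError *error = nil;
    // Saving the child pushes the batch up into the master (still in memory).
    if (![backgroundContext save:&error]) {
        NSLog(@"Background save failed: %@", error);
        return;
    }

    // Queue the disk write on the master's private queue and return at once.
    [masterContext performBlock:^{
        NSError *masterError = nil;
        if (![masterContext save:&masterError]) {
            NSLog(@"Master save failed: %@", masterError);
        }
    }];
}
```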

You wrote:

Then at the end of the import process, I save on the master/parent context which, ostensibly, pushes modifications out to the other child contexts including the main context:

In your configuration, you have two children (the main MOC and the background MOC), both parented to the "master."

When you save on a child, it pushes the changes up into the parent. Other children of that MOC will see the data the next time they perform a fetch... they are not explicitly notified.

So, when BG saves, its data is pushed to MASTER. Note, however, that none of this data is on disk until MASTER saves. Furthermore, any new items will not get permanent IDs until the MASTER saves to disk.
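If stable object IDs are needed before the master has written to disk, one option is to ask for them explicitly before saving the child. The sketch below uses `obtainPermanentIDsForObjects:error:`; the helper name `SaveWithPermanentIDs` is hypothetical, and whether the extra step is worth its cost for an import of this size is not something the original discussion measures.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: call on the background context's thread instead of a
// plain -save: when permanent object IDs are needed right away.
static BOOL SaveWithPermanentIDs(NSManagedObjectContext *backgroundContext,
                                 NSError **outError)
{
    // Convert the temporary IDs of newly inserted objects to permanent ones.
    NSArray *inserted = [[backgroundContext insertedObjects] allObjects];
    if ([inserted count] > 0 &&
        ![backgroundContext obtainPermanentIDsForObjects:inserted error:outError]) {
        return NO;
    }
    // Push the batch up into the parent context.
    return [backgroundContext save:outError];
}
```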

In your scenario, you are pulling the data into the MAIN MOC by merging from the MASTER save during the DidSave notification.

That should work, so I'm curious as to where it is "hung." I will note that you are not running on the main MOC thread in the canonical way (at least not for iOS 5).

Also, you probably only are interested in merging changes from the master MOC (though your registration looks like it is only for that anyway). If I were to use the update-on-did-save-notification, I'd do this...

- (void)contextChanged:(NSNotification*)notification {
    // Only interested in merging from master into main.
    if ([notification object] != masterManagedObjectContext) return;

    [mainManagedObjectContext performBlock:^{
        [mainManagedObjectContext mergeChangesFromContextDidSaveNotification:notification];

        // NOTE: our MOC should now be updated, but we need to reload the data as well
    }];
}

Now, for what may be your real issue regarding the hang... you show two different calls to save on the master. The first is well protected in its own performBlock, but the second is not (though you may be calling saveMasterContext in a performBlock).

However, I'd also change this code...

- (void)saveMasterContext {
    NSNotificationCenter *notificationCenter = [NSNotificationCenter defaultCenter];    
    [notificationCenter addObserver:self selector:@selector(contextChanged:) name:NSManagedObjectContextDidSaveNotification object:masterManagedObjectContext];

    // Make sure the master runs in its own thread...
    [masterManagedObjectContext performBlock:^{
        NSError *error = nil;
        BOOL saveSuccess = [masterManagedObjectContext save:&error];
        // Handle error...
        [notificationCenter removeObserver:self name:NSManagedObjectContextDidSaveNotification object:masterManagedObjectContext];
    }];
}

However, note that the MAIN is a child of MASTER. So, it should not have to merge the changes. Instead, just watch for the DidSave on the master, and just refetch! The data is sitting in your parent already, just waiting for you to ask for it. That's one of the benefits of having the data in the parent in the first place.
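A minimal sketch of the watch-and-refetch idea, assuming the three-context setup from the question. The helper name `ObserveMasterSaves` and its parameters are hypothetical; the observer block runs on the thread that posted the notification and hops to the main context asynchronously, so the master's queue is not left waiting on the main thread.

```objc
#import <CoreData/CoreData.h>
#import <UIKit/UIKit.h>

// Hypothetical helper: refetches and reloads the table whenever the master
// context saves. Returns the observer token; keep it, and pass it to
// -[NSNotificationCenter removeObserver:] when the view goes away.
static id ObserveMasterSaves(NSManagedObjectContext *masterContext,
                             NSManagedObjectContext *mainContext,
                             NSFetchedResultsController *controller,
                             UITableView *tableView)
{
    return [[NSNotificationCenter defaultCenter]
        addObserverForName:NSManagedObjectContextDidSaveNotification
                    object:masterContext
                     queue:nil
                usingBlock:^(NSNotification *note) {
        // Do not block the master's queue: schedule the refetch on the
        // main-queue context and return immediately.
        [mainContext performBlock:^{
            NSError *error = nil;
            if ([controller performFetch:&error]) {
                [tableView reloadData];
            } else {
                NSLog(@"Refetch failed: %@", error);
            }
        }];
    }];
}
```

Note that the block holds strong references to the controller and the table view until the observer is removed.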

Another alternative to consider (and I'd be interested to hear about your results -- that's a lot of data)...

Instead of making the background MOC a child of the MASTER, make it a child of the MAIN.

Get this. Every time the BG saves, it automatically gets pushed into the MAIN. Now, the MAIN has to call save, and then the master has to call save, but all those are doing is moving pointers... until the master saves to disk.

The beauty of that method is that the data goes from the background MOC straight into your application's MOC (then passes through to get saved).

There is some penalty for the pass-through, but all the heavy lifting gets done in the MASTER when it hits the disk. And if you kick those saves on the master with performBlock, then the main thread just sends off the request and returns immediately.
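The pass-through could look roughly like this, assuming the background context was created with `[backgroundContext setParentContext:mainManagedObjectContext]` instead of being parented to the master. The helper name `SaveThroughMain` is hypothetical, and this sketch has not been measured against a data set of the size described in the question.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper, called on the import thread after each batch.
// Assumes backgroundContext's parent is mainContext, whose parent is masterContext.
static void SaveThroughMain(NSManagedObjectContext *backgroundContext,
                            NSManagedObjectContext *mainContext,
                            NSManagedObjectContext *masterContext)
{
    NSError *error = nil;
    // Step 1: push the batch from the background context into the main context.
    if (![backgroundContext save:&error]) {
        NSLog(@"Background save failed: %@", error);
        return;
    }

    // Step 2: on the main queue, push the changes from main into master.
    [mainContext performBlock:^{
        NSError *mainError = nil;
        if (![mainContext save:&mainError]) {
            NSLog(@"Main save failed: %@", mainError);
            return;
        }

        // Step 3: write to disk on the master's private queue; the main
        // thread only enqueues this block and moves on.
        [masterContext performBlock:^{
            NSError *masterError = nil;
            if (![masterContext save:&masterError]) {
                NSLog(@"Master save failed: %@", masterError);
            }
        }];
    }];
}
```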

Please let me know how it goes!
