Thursday, March 31, 2016

Hudl Technique and Core Data: Two Worlds Collide

For the past year-and-a-half, I've been working on the Hudl Technique (formerly Ubersense; formerly SwingReader) iOS app. Hudl Technique is an app for recording, playing, analyzing, and sharing high-quality video for getting better at your sport of choice!

About a year ago, we migrated the app from using leveldb to using core data for on-device data storage. This helped out a ton with some performance issues we were seeing, which were especially bad for power users -- but as you can imagine, moving to core data came with some issues of its own.

I learned so much about core data from this experience, and I'd like to share some of that knowledge -- I'm going to describe how our core data stack works, issues we ran into and how we solved them, and several examples.

Core data rules

Real quick, let's go over core data threading rules: a managed object (MO) or a managed object context (MOC) may only be accessed from a single thread, namely the one its context belongs to. In practice, this means that whenever you use core data, you must be aware of what thread you're on; there's one right thread, and lots of wrong ones.

Later in this article, we'll talk about how to follow these rules, and how to determine if you're following them correctly.

Our core data stack

We have one managed object context (MOC) connected to the persistent store coordinator (PSC). We'll call this the master MOC. This is a long-lived MOC, meaning that it's created when the app starts up, and hangs around until the app is terminated. It uses the NSPrivateQueueConcurrencyType, so it doesn't block the main thread.

We have a second long-lived MOC which we call the main MOC. We call it this because it uses the NSMainQueueConcurrencyType, and it's intended to be used from code which interacts with the UI. This MOC is a child of the master MOC.

We also have methods for creating additional child MOCs. These MOCs can use either NSPrivateQueueConcurrencyType or NSMainQueueConcurrencyType -- depending on whether they're going to be used from UI code or not. These are also children of the master MOC, but unlike the main MOC, they're not long-lived. They're intended to be used and then quickly discarded. They can also be used as scratchpads: you get one of these MOCs, make some changes, then change your mind and wish to just discard your changes -- no problem, all you have to do is just lose your references to the MOC!

In summary, there's one master MOC, which is long-lived and points to the PSC. There are lots of child MOCs, and of those, just one -- the main MOC -- is long-lived. The master MOC is private, so client code isn't able to use the master MOC directly -- clients must either use the main MOC, or create a new child MOC.
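
To make the shape of this stack concrete, here is a minimal sketch of how it could be wired up. The class name CoreDataStack and its method names are made up for illustration and are not the names used in the app; only the NSManagedObjectContext calls are real core data API.

```objc
#import <CoreData/CoreData.h>

// Hypothetical wrapper class; names are illustrative only.
@interface CoreDataStack : NSObject
@property (nonatomic, strong, readonly) NSManagedObjectContext *mainMOC;
- (instancetype)initWithCoordinator:(NSPersistentStoreCoordinator *)psc;
- (NSManagedObjectContext *)childMOCWithConcurrencyType:(NSManagedObjectContextConcurrencyType)type;
@end

@interface CoreDataStack ()
// Kept out of the public interface: clients never touch the master MOC.
@property (nonatomic, strong) NSManagedObjectContext *masterMOC;
@property (nonatomic, strong, readwrite) NSManagedObjectContext *mainMOC;
@end

@implementation CoreDataStack

- (instancetype)initWithCoordinator:(NSPersistentStoreCoordinator *)psc {
    if ((self = [super init])) {
        // Master MOC: long-lived, private queue, the only MOC attached to the PSC.
        _masterMOC = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
        _masterMOC.persistentStoreCoordinator = psc;

        // Main MOC: long-lived, main queue, child of the master MOC.
        _mainMOC = [[NSManagedObjectContext alloc] initWithConcurrencyType:NSMainQueueConcurrencyType];
        _mainMOC.parentContext = _masterMOC;
    }
    return self;
}

// Short-lived child MOC: use it, save it (or not), then drop your reference.
- (NSManagedObjectContext *)childMOCWithConcurrencyType:(NSManagedObjectContextConcurrencyType)type {
    NSManagedObjectContext *child = [[NSManagedObjectContext alloc] initWithConcurrencyType:type];
    child.parentContext = self.masterMOC;
    return child;
}

@end
```

Because the master MOC only appears in the class extension, callers can reach it only indirectly, through the main MOC or a freshly created child.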

Saving your changes

Let's talk about how information moves between MOCs. The default behavior in core data is that, when you save a child MOC, that only pushes the changes up to its parent MOC, but doesn't push those into the PSC. This is confusing at first, because you're saving your changes, but when you quit and restart the app, your data is gone! To deal with this, we have a little bit of magic that kicks in whenever you save a child MOC: saving a child MOC changes the master MOC, and whenever the master MOC gets changed, we automatically save it, which pushes the changes out to the PSC and ensures that they'll get persisted.

What about the main MOC? It's long-lived, but it's a sibling of the other child MOCs, so when they save, those changes don't go to the main MOC. Does it end up with stale, out-of-date data?

That's a good question, but the answer is no! We have an extra bit of magic that pushes changes from the master MOC down to the main MOC, whenever the master MOC is changed.

Let's review really quick: let's say you get a child MOC and save it -- what happens? Well, after you save, those changes get pushed up to the master MOC; then they get pushed up to the PSC and persisted by our first bit of magic; next, those changes get pushed down to the main MOC by our second bit of magic.

It's important to note that this is the only case in which changes go down! The rest of the time, changes go child->parent, parent->PSC -- but never parent->child. This means that if you get a child MOC (other than the main one), it won't receive updates when the master MOC changes. So it will get stale, and have out-of-date data, and potentially cause crashes if you ask it to fulfill faults for objects that have been deleted. But we don't want this to happen for the main MOC, so we push changes to it!
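
One way the two bits of magic could be implemented is with save notifications. This is only a sketch under that assumption; the post doesn't say how the app actually does it, and the function name ObserveSavesForStack is hypothetical. The notification name and the merge method are real core data API.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper. Returns the observer tokens; keep them around and pass
// each one to -[NSNotificationCenter removeObserver:] when tearing down.
static NSArray *ObserveSavesForStack(NSManagedObjectContext *masterMOC,
                                     NSManagedObjectContext *mainMOC) {
    NSNotificationCenter *center = [NSNotificationCenter defaultCenter];

    // Magic #1: when a direct child of the master MOC saves, save the master
    // MOC as well, so that the changes reach the PSC and get persisted.
    id childObserver =
        [center addObserverForName:NSManagedObjectContextDidSaveNotification
                            object:nil
                             queue:nil
                        usingBlock:^(NSNotification *note) {
            NSManagedObjectContext *savedMOC = note.object;
            if (savedMOC.parentContext != masterMOC) { return; }
            [masterMOC performBlock:^{
                NSError *error = nil;
                if ([masterMOC hasChanges] && ![masterMOC save:&error]) {
                    NSLog(@"Master MOC save failed: %@", error);
                }
            }];
        }];

    // Magic #2: when the master MOC saves, merge those changes down into the
    // main MOC, on the main MOC's own queue.
    id masterObserver =
        [center addObserverForName:NSManagedObjectContextDidSaveNotification
                            object:masterMOC
                             queue:nil
                        usingBlock:^(NSNotification *note) {
            [mainMOC performBlock:^{
                [mainMOC mergeChangesFromContextDidSaveNotification:note];
            }];
        }];

    return @[childObserver, masterObserver];
}
```

Note that only the main MOC is merged into; any other child MOC is left alone, which matches the "changes only go down in this one case" rule above.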

Choosing a MOC

When we want to use core data within our app, we have three different choices for MOCs: the main MOC, a main queue child MOC, and a background queue child MOC. How do we decide which one to use?

  • Are we dealing with data for the UI? If so, we should use the main MOC or a main queue child MOC, so that we can safely use the MOC and its MOs from UI methods, since those will be called on the main thread.
  • Do we need to avoid blocking the main thread? If so, we should use a background queue child MOC.
  • Will we need to undo changes? If so, we should use a main queue child MOC, or a background queue child MOC, because undoing changes in those MOCs is as easy as getting rid of all our references to the MOC.
  • Will we need to hold on to the MOC for a long time? ("A long time" is a bit nebulous, but in practice, it means: "after acquiring the MOC, and before getting rid of it, is any other MOC saving changes?") If yes, we'll want to use a long-lived MOC so that its snapshot of the data doesn't become stale; in our core data setup, this means we'll need to use the main MOC.
  • Will the MOC need to get updates? If yes, we'll want to use the main MOC, since it's our only publicly accessible MOC which gets updates.

Usage

Once you've obtained a MOC from our core data stack, you can do any of the CRUD (Create, Read, Update, Delete) operations! Of course, you'll have to make sure that you're using managed objects and MOCs on the right thread. Here are three strategies for doing that:

  1. use main queue MOC + MOs from UI methods
  2. use NSManagedObjectContext's performBlock or performBlockAndWait helper methods
  3. pass MOC as final parameter to method; the method assumes it's called on the correct thread, and the caller assumes responsibility for ensuring that the method is called on the correct thread, using either strategy 1 or strategy 2

Make sure to wrap all access to MOs and MOCs in a performBlock or performBlockAndWait. Patterns for doing this:

  • use a local performBlock
  • pass the MOC as a parameter to a method, and make sure the method gets called from within performBlock/performBlockAndWait

Then:

  • save
  • pay attention to errors (not much you can do about them other than log, though)
  • get rid of the MOC by letting go of your references to it
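
Here is a rough sketch of strategies 2 and 3 working together. The entity name "Video", its "title" attribute, and both function names are hypothetical and stand in for whatever your model defines.

```objc
#import <CoreData/CoreData.h>

// Strategy 3: the MOC is the final parameter. This function assumes it is
// already running on the MOC's thread; making that true is the caller's job.
static NSManagedObject *InsertVideo(NSString *title, NSManagedObjectContext *moc) {
    NSManagedObject *video =
        [NSEntityDescription insertNewObjectForEntityForName:@"Video"
                                      inManagedObjectContext:moc];
    [video setValue:title forKey:@"title"];
    return video;
}

// Strategy 2: wrap every MO/MOC access in performBlock, then save and pay
// attention to the error.
static void ImportVideoTitled(NSString *title, NSManagedObjectContext *childMOC) {
    [childMOC performBlock:^{
        InsertVideo(title, childMOC);

        NSError *error = nil;
        if (![childMOC save:&error]) {
            // Not much we can do here other than log.
            NSLog(@"Save failed: %@", error);
        }
    }];
    // Once the caller drops its reference to childMOC, the MOC goes away.
}
```

performBlock returns immediately and runs the block later on the MOC's queue; use performBlockAndWait instead if the caller needs the work to be finished before it continues.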

Pointers

It's a good idea to enable multithreading assertions in Xcode. This tells core data to check that you're using it on the right thread, helping you to spot and fix problems with your core data usage. You can enable the assertions by editing your scheme, and adding this argument under "Arguments Passed On Launch": -com.apple.CoreData.ConcurrencyDebug 1

Once that's set, whenever you run your app from Xcode -- whether on the simulator or on a device -- core data will be running its threading assertions. Don't worry: this won't affect the app's behavior when it's in the App Store! It's purely a debugging aid.

Managed Objects

We use Mogenerator to help convert our xcdatamodel files into Objective-C classes. There are a couple nice reasons to use it. First, it creates pairs of classes for each entity: _EntityName and EntityName. _EntityName is where it puts all the auto-generated boilerplate, and gets updated whenever your model changes. EntityName is where you put your code, and doesn't get whacked when your model changes. Second, it creates subclasses of NSManagedObjectID, one per entity. If your entity is EntityName, then it creates EntityNameID, and sets up your EntityName class so that its objectID property is of type EntityNameID instead of plain NSManagedObjectID. This isn't an absolutely critical issue, but I do like the additional type checking and compiler support that it provides.
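
For a hypothetical entity named Video with a single title attribute, the result looks roughly like the hand-written approximation below. The real generated files contain more boilerplate than this, and the displayTitle method is just an example of custom code.

```objc
#import <CoreData/CoreData.h>

// Typed object ID: one subclass of NSManagedObjectID per entity.
@interface VideoID : NSManagedObjectID
@end

// Machine half: regenerated whenever the model changes. Don't edit by hand.
@interface _Video : NSManagedObject
@property (nonatomic, strong) NSString *title;
@property (nonatomic, readonly, strong) VideoID *objectID;
@end

// Human half: your own code lives here and survives regeneration.
@interface Video : _Video
- (NSString *)displayTitle;
@end

@implementation VideoID
@end

@implementation _Video
@dynamic title;
// Narrow the inherited objectID to the typed subclass.
- (VideoID *)objectID { return (VideoID *)[super objectID]; }
@end

@implementation Video
- (NSString *)displayTitle {
    return self.title.length > 0 ? self.title : @"Untitled";
}
@end
```

With this in place, a method that takes a VideoID parameter gets a compiler warning if you hand it the ID of some other entity.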

Examples

If you need to use an MO on multiple threads, you can't pass the object itself between threads. That's against core data's threading rules. Instead, you'll have to pass the objectID, and on each thread on which you need to use the object, you'll have to acquire a MOC and use that to get the MO.
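
A minimal sketch of that hand-off is below. The function name is hypothetical; existingObjectWithID:error: is the real core data call. It assumes the object has already been saved, because an unsaved object only has a temporary objectID that another MOC can't necessarily resolve.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper. Call it on the thread that owns `object`; `work` then
// runs on otherMOC's queue with otherMOC's own copy of the object.
static void UseObjectOnContext(NSManagedObject *object,
                               NSManagedObjectContext *otherMOC,
                               void (^work)(NSManagedObject *localObject)) {
    // Grab the ID here, while we're still on the object's own thread.
    NSManagedObjectID *objectID = object.objectID;

    [otherMOC performBlock:^{
        NSError *error = nil;
        NSManagedObject *localObject = [otherMOC existingObjectWithID:objectID
                                                                error:&error];
        if (localObject == nil) {
            // For example, the object was deleted in the meantime.
            NSLog(@"Couldn't load object %@: %@", objectID, error);
            return;
        }
        work(localObject);
    }];
}
```

Only the objectID crosses the thread boundary; the block never touches the original object.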

Sometimes, you may need to get rid of your changes. If you've made your changes in a short-lived child context, this is easy: simply don't save the MOC, and let go of your references to it.
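
As a sketch of that scratchpad pattern, the hypothetical helper below creates a throwaway child MOC, lets the caller edit in it, and saves only if the caller says so. Here the child is created directly against whatever parent MOC is passed in; in our stack you'd get the child from the stack's own factory method instead.

```objc
#import <CoreData/CoreData.h>

// Hypothetical helper: runs `edit` against a throwaway child of parentMOC.
// If `edit` returns YES, the child is saved, pushing its changes up to the
// parent. If it returns NO, nothing is saved and the changes are dropped.
static BOOL EditInScratchpad(NSManagedObjectContext *parentMOC,
                             BOOL (^edit)(NSManagedObjectContext *scratchMOC)) {
    NSManagedObjectContext *scratchMOC =
        [[NSManagedObjectContext alloc] initWithConcurrencyType:NSPrivateQueueConcurrencyType];
    scratchMOC.parentContext = parentMOC;

    __block BOOL saved = NO;
    [scratchMOC performBlockAndWait:^{
        if (!edit(scratchMOC)) { return; }  // changed our mind: never save

        NSError *error = nil;
        saved = [scratchMOC save:&error];
        if (!saved) { NSLog(@"Scratchpad save failed: %@", error); }
    }];

    // scratchMOC goes out of scope here; any unsaved changes go with it.
    return saved;
}
```

There's no explicit "undo" step anywhere: discarding is simply the absence of a save.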

Difficulties

We changed our core data setup to see if we could clear up an issue we were facing: fetch request batch sizes are ignored for child MOCs, and since we were trying to set the batch size on our main MOC, we decided to change it to point to the PSC instead of to the master MOC. This made it a sibling of the master MOC, instead of a child.

Unfortunately, this caused a gigantic number of core data crashes, each of which mentioned 'statement is still active' somewhere in the crash report. We never were able to figure out what the underlying problem was, so we ended up reverting this change.