Getting Loaded

Tuesday, November 28, 2006 at 10:04 AM

Posted by: Dave MacLachlan, Member of Technical Staff, Mac Team

Categories are a really interesting feature of Objective-C, especially for those of us who came from C++. Not only do categories allow you to extend other classes for which you may be lacking the source, but they also give you a really simple way of hiding interfaces from your clients without all the baggage of a pImpl pattern. We use categories a lot here at Google to "enhance" the system frameworks.

We do run into a couple of major problems with categories, though. One issue is about class-specific initialization. Objective-C has traditionally had three solutions for class-specific initialization:
  1. lazy initialization
  2. +initialize
  3. +load
With lazy initialization, we call our initialization routine as necessary to make sure our class-specific stuff is initialized before we use it. This works fine except that in implementing it, we often end up writing code where every method starts with a method call to the initialization check, which is ugly and a potential source of stupid bugs.
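
As a rough sketch of that pattern, here is the guard-in-every-entry-point shape written in plain C (which also compiles as Objective-C) so it can run without Foundation. The names EnsureBarInitialized, BarLookup, and BarSum are made up for illustration, and the check as written is not thread-safe.

```c
#include <stdbool.h>
#include <stddef.h>

#define BAR_TABLE_SIZE 4

/* Hypothetical shared state that must be set up before first use. */
static bool g_bar_ready = false;
static int g_bar_table[BAR_TABLE_SIZE];
static int g_bar_init_count = 0;  /* counts how often setup really ran */

/* The lazy-initialization check: cheap after the first call. */
static void EnsureBarInitialized(void) {
  if (g_bar_ready) {
    return;
  }
  for (size_t i = 0; i < BAR_TABLE_SIZE; ++i) {
    g_bar_table[i] = (int)(i * i);
  }
  g_bar_init_count++;
  g_bar_ready = true;
}

/* Every public entry point has to start with the same guard call. */
int BarLookup(size_t index) {
  EnsureBarInitialized();
  return index < BAR_TABLE_SIZE ? g_bar_table[index] : -1;
}

int BarSum(void) {
  EnsureBarInitialized();
  int sum = 0;
  for (size_t i = 0; i < BAR_TABLE_SIZE; ++i) {
    sum += g_bar_table[i];
  }
  return sum;
}
```

Leave the guard out of just one of these functions and that function may read a table that has not been filled in yet, which is the kind of bug this approach invites.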

+initialize is the "best practice" for initializing class variables. For a standard class, it works great. This technique usually does almost everything we want it to do and has reasonably clear semantics, although the part that says "you could be invoked more than once" is a bit of a pain. The problem with +initialize is that it is virtually useless for categories: if I override a class's initialize method in my category, I replace the original and can't call it. Also, if I have two categories on a class, and both have initialize methods, it is unclear which one will be called. Even if it works now, there's no guarantee that the framework you're extending won't be "enhanced" to add an initialize method to the class you extended, breaking you in the future. So, as far as categories are concerned, +initialize is pretty much useless. Interestingly, if you think about it, +initialize is in some ways very much like lazy initialization, with the details hidden under the covers: the runtime does the check instead of you having to code it.

+load is an interesting option, and probably one of the less understood areas of how the Objective-C runtime actually works. According to the NSObject documentation:
  1. +load is invoked once per class or category.
  2. +load is usually invoked before +initialize, but not always.
  3. You can't be sure your superclasses are loaded.
  4. You can't be sure any other classes are loaded.
  5. You know, you really should be using +initialize.
Item 1 is great, but there are a lot of interesting little caveats that follow it. Basically, the docs say you can't safely call ANY other classes, including your superclass, from within +load. Looking at the Objective-C runtime code (ADC registration required), we can get a bit more information on how +load works. We see that our superclasses are guaranteed to be "+load"ed before we are, but we still can't count on any other classes, so officially we can't use NSDictionary, NSString, et al. In practice, using them appears to work pretty much all the time, but I certainly wouldn't intentionally ship code that depended on it.

So it appears that if we stay with traditional Objective-C, we are basically stuck with lazy initialization for doing class-specific initialization in a category. Luckily, if we break with tradition, we can use the compiler's "constructor" function attribute, written __attribute__((constructor)). Yes, the syntax is ugly, but it does potentially solve a lot of our problems.

Constructors (a horrible name that must have been intentionally designed to cause confusion with C++/Java constructors) are guaranteed to be called after +load but before main. This gets rid of several of the caveats of +load, because you are guaranteed that all your classes are loaded and that +initialize will be invoked on classes as needed by your constructor function.
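
A minimal way to check the "before main" part for yourself, assuming GCC or Clang (the attribute is a compiler extension, not standard C). This is plain C with hypothetical names; it only shows that the constructor function has already run by the time main starts, and says nothing about +load or +initialize.

```c
#include <stdbool.h>

/* Set by the constructor function below; zero until it runs. */
static int g_constructor_runs = 0;

/* Hypothetical constructor function. GCC and Clang arrange for it to be
 * called when the image is loaded, before main() is entered. */
__attribute__((constructor))
static void InitializeDemo(void) {
  static bool wasInitialized = false;
  if (!wasInitialized) {
    /* Same "safety in case we get called twice" guard as the post uses. */
    g_constructor_runs++;
    wasInitialized = true;
  }
}

/* Reports how many times the guarded body has executed. */
int ConstructorRunCount(void) {
  return g_constructor_runs;
}
```

With these compilers, calling ConstructorRunCount() as the very first statement of main returns 1, even though nothing in main ever called InitializeDemo.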

So doing something like the following gives us a nice, relatively clean way of doing class-specific initialization in a category.

@implementation Foo(FooBarAdditions)
....
@end

@interface Foo(FooBarAdditionsPrivateMethods)
+ (void)initializeBar;
@end

@implementation Foo(FooBarAdditionsPrivateMethods)
+ (void)initializeBar {
  // Initialize stuff here
}
@end

void __attribute__ ((constructor)) InitializeFooBar(void) {
  static BOOL wasInitialized = NO;
  if (!wasInitialized) {
    // Safety in case we get called twice.
    [Foo initializeBar];
    wasInitialized = YES;
  }
}

@synchronized swimming Feedback

Thanks to reader Ron Avitzur for pointing out that the nice Objective-C workaround for the double-checked locking pattern (DCLP) that I showed in a recent post is officially bad. He's got a nice writeup on his blog explaining why, pretty as it is, it's nothing more than a global flag, which in theory doesn't get us around the problem at all.