iPhone: Efficiently loading data into memory
Intro
I've been working on an iPhone app for a while, and early on I ran into a few problems loading data into my app efficiently. Given Apple's recent lifting of the NDA, I thought I'd share my experiences with other new developers. The application I'm working on looks up words to see whether they appear in the Tournament Word List (TWL), i.e. whether they're legit, and loading, storing, and performing lookups on that data is non-trivial. That's not to say getting _something_ working was problematic; in fact, Cocoa and Objective-C make it very easy to get a first pass going. However, I need to look up tens of thousands of words per second, so optimization is very important to me.
First Pass
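As a first pass, I put the word list in plist format with a dictionary at the root, each word in the TWL as a key, and an empty string as the corresponding value. It turns out that Python provides a useful module, plistlib, that makes this easy.

```python
import plistlib

wordlist = ['apple', 'banana', 'zoo']  # stand-in for the full TWL

# Each word becomes a key with an empty string as its value
root = {}
for word in wordlist:
    root[word] = ''

# The output file name is assumed here
with open('wordlist.plist', 'wb') as f:
    plistlib.dump(root, f)
```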
Executing this script created the following file:
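```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>apple</key>
	<string></string>
	<key>banana</key>
	<string></string>
	<key>zoo</key>
	<string></string>
</dict>
</plist>
```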
Now, to load the data within the app and check to see if apple is a legal word:
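```objc
// For a large word list (200k words), this ends up
// taking a few seconds to load
// (the resource name "wordlist" is assumed here)
NSString *filePath = [[NSBundle mainBundle] pathForResource:@"wordlist"
                                                     ofType:@"plist"];
NSDictionary *words = [NSDictionary dictionaryWithContentsOfFile:filePath];

// A word is legal if it appears as a key in the dictionary
BOOL isLegal = ([words objectForKey:@"apple"] != nil);
```

That wasn't so hard, was it? In fact, if your data set is small, and you're not worried about performance, that could be all you need. Many applications, including mine, will need to do much better though.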
A big improvement
There are two problems with this first pass.
- It takes a few seconds to load the data, so it takes a while before your app can do anything useful.
- 200k words take up quite a bit of memory as a plist file.

One idea might be to use a binary plist instead of the text (XML) plist. Unfortunately, this doesn't help at all; it actually seems to just make loading take longer. If you run the app with the memory profiler, you will see the memory usage jump up to 3x the size of the text plist file. Keep in mind that the text plist file is already several times the size of the data set because of the inefficient encoding. This makes using plists impractical for data sets over a few hundred KB.
I've found that any bit of code that runs over and over and is a performance (or memory) bottleneck is best written in C rather than Objective-C. Here's the approach that worked much better for me:
- Create a newline-, space-, or null-separated list of words instead of a plist.
- Load the raw data into an NSData object.
- Use [NSData getBytes:] to grab chunks of the data and stick them in a character array.
- Use C string comparison functions to do word list lookups. One (not so good) option is to try doing a string compare starting at each byte of the data. If one of the millions of possible starting points returns a match, the word is legal. This shouldn't sound like a great idea to you. You should be able to improve this by 2 or 3 orders of magnitude for a list of 200k words. Perhaps more... I didn't bother testing something quite this bad.
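For reference, the brute-force lookup from the last bullet could be sketched in C roughly as follows. This is only an illustration: the function name `naive_contains` and the newline-separated layout are assumptions, and the boundary checks are added so that "app" doesn't count as a match just because it appears inside "apple".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Brute-force lookup: try a comparison at every byte offset.
 * Assumes `data` holds newline-separated words (hypothetical layout). */
bool naive_contains(const char *data, size_t len, const char *word)
{
    assert(data != NULL && word != NULL);
    size_t wlen = strlen(word);
    if (wlen == 0 || wlen > len)
        return false;

    for (size_t i = 0; i + wlen <= len; i++) {
        /* A match only counts if it covers a whole entry. */
        bool starts_entry = (i == 0) || (data[i - 1] == '\n');
        bool ends_entry = (i + wlen == len) || (data[i + wlen] == '\n');
        if (starts_entry && ends_entry && memcmp(data + i, word, wlen) == 0)
            return true;
    }
    return false;
}
```

Every lookup walks the whole buffer in the worst case, which is why this is only useful as a baseline.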
```objc
// Loads significantly faster, and is much more memory
// efficient than using the built-in dictionary
// (the resource name "wordlist" is assumed here)
NSString *filePath = [[NSBundle mainBundle] pathForResource:@"wordlist"
                                                     ofType:@"txt"];

NSData *wordData = [NSData dataWithContentsOfFile:filePath];

// The downside? Checking for legal words is much
// trickier. I'll leave that to you, but one idea
// (not a particularly good one) is to do a linear
// search through each character in the array and
// see if it's equal to the string under consideration
```
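One way to get the kind of speedup mentioned above is a binary search over the raw bytes. This is a sketch, not necessarily what my app does, and it rests on two assumptions: the word list is sorted in plain byte order (the order `strcmp` uses), and every word, including the last, ends with a newline. The function name `sorted_contains` is made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Binary search over a buffer of sorted, newline-terminated words.
 * Assumes byte-order sorting and a trailing '\n' after every entry. */
bool sorted_contains(const char *data, size_t len, const char *word)
{
    assert(data != NULL && word != NULL);
    size_t wlen = strlen(word);
    size_t lo = 0, hi = len;  /* window [lo, hi); lo is always an entry start */

    while (lo < hi) {
        /* Pick a byte in the middle, then back up to the start of its entry. */
        size_t mid = lo + (hi - lo) / 2;
        while (mid > lo && data[mid - 1] != '\n')
            mid--;

        /* Find where that entry ends. */
        const char *nl = memchr(data + mid, '\n', hi - mid);
        size_t elen = nl ? (size_t)(nl - (data + mid)) : hi - mid;

        /* Compare the shared prefix; on a tie the shorter string sorts first. */
        size_t n = elen < wlen ? elen : wlen;
        int cmp = memcmp(data + mid, word, n);
        if (cmp == 0)
            cmp = (elen > wlen) - (elen < wlen);

        if (cmp == 0)
            return true;
        if (cmp < 0)
            lo = mid + elen + 1;  /* entry sorts before word: search after it */
        else
            hi = mid;             /* entry sorts after word: search before it */
    }
    return false;
}
```

From Objective-C, the buffer and its size can be taken from the NSData object with `[wordData bytes]` and `[wordData length]`. For 200k words this needs roughly 18 comparisons per lookup instead of a scan over every byte.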
Conclusion
The moral of the story? For big chunks of data or complex tasks, I've found writing C or C++ code to be faster and more memory efficient than using the 'easy' functions in Objective-C. The obvious downside is that the C code is trickier to write and trickier to debug. If you want your app to do complex things quickly, though, it's the way to go. The rest of the app (e.g. all of your user interface, and all of your code that isn't time-critical) should be perfectly fine in Objective-C.