20

Usually, a "mutable array" class is implemented as a wrapper around a simple array. The wrapper allocates more memory when you add an element past the end. This is a common data structure and the performance of the various operations is well known. You get O(1) element access, O(N) insert and remove, or O(1) (on average) insert and remove at the end of the array. But NSMutableArray is something else. For example the docs say [emphasis mine]:

Note: Most operations on an array take constant time: accessing an element, adding or removing an element at either end, and replacing an element. Inserting an element into the middle of an array takes linear time.

So, what exactly is NSMutableArray? Is this documented somewhere?

3
  • AFAIK, it's not officially documented. Generally one of two approaches is used: Resize and copy, or multiple arrays with each covering part of the range. And I would assume that Apple does not want to commit to a particular design officially, in order to give them the ability to redesign. Commented Mar 23, 2014 at 13:10
  • The best chance to see any implementation code is searching for CFLite code from apple. There, at least, you can find CFArray. Commented Mar 23, 2014 at 13:14
  • Just guessing: The mutable Array is implemented in a way, that there is some free memory at the end AND at the beginning of the memory block, that the array uses. That is how it only takes constant time to insert at the beginning. But they also write that it is actually not only one class, so I also guess for different sizes there are different optimizations Commented Mar 23, 2014 at 13:16

2 Answers 2

27

It's a wrapper around a circular buffer.

This is neither documented nor open-sourced, but this blog post shows an amazing reverse-engineer job over NSMutableArray, which I think you'll find very interesting.

The NSMutableArray class cluster is backed by a concrete private subclass called __NSArrayM.

The greatest discovery is that NSMutableArray is not a thin wrapper around a CFArray, as one may reasonably think: CFArray is open-sourced and it doesn't use a circular buffer, whereas __NSArrayM does.

Reading through the comments of the article, it appears that it started to be this way since iOS 4, whereas in previous SDKs NSMutableArray actually used CFArray internally and __NSArrayM wasn't even there.

Straight from the blog post I mentioned above

Data Structure

As you might have guessed, __NSArrayM makes use of circular buffer. This data structure is extremely simple, but a little bit more sophisticated than regular array/buffer. The contents of circular buffer can wrap around when either end is reached.

Circular buffer has some very cool properties. Notably, unless the buffer is full, insertion/deletion from either end doesn’t require any memory to be moved.

The pseudo-code for objectAtIndex: goes as follows:

- (id)objectAtIndex:(NSUInteger)index {
    if (_used <= index) {
        goto ThrowException;
    }

    NSUInteger fetchOffset = _offset + index;
    NSUInteger realOffset = fetchOffset - (_size > fetchOffset ? 0 : _size);

    return _list[realOffset];

ThrowException:
    // exception throwing code
}

where the ivars are defined as

  • _used: the number of elements the array holds
  • _list: the pointer to the circular buffer
  • _size: the size of the buffer
  • _offset: the index of first element of array in the buffer

Again, I don't take any credit for all the information above, as they come straight from this amazing blog post by Bartosz Ciechanowski.

Sign up to request clarification or add additional context in comments.

12 Comments

@MartinR quoting the author of the blog post I cite: I was shocked to realize neither NSArray nor NSMutableArray have anything in common with CFArray
@MartinR most notably, CFArray doesn't use a circular buffer.
@gnasher729 of course. If you look at the article I linked, it does some testing as well, and the result is that insertion and delete form either end of a NSMutableArray are performed in constant time, making it suitable for being used as a queue.
@GabrielePetronella "CFArray s a C data structure, so it cannot be a subclass of NSArray" - the two are toll-free bridged for a reason. Objective-C objects and classes are "just" C structs anyway. At least the "public" part of the layout of the two types is identical. Also, try running this -- I've got NSMutableArray, so CFArray on my Mac is a concrete subclass of NSMutableArray. This does not mean that their implementation cannot differ, but still, beware your wording.
I recall bbum mentioning in a comment here on SO that, surprisingly, the "bridged to" direction has shifted in the last few years -- some? most? of CF is now in fact implemented in ObjC. I'm not clear on how that squares up with the open-source CF, however.
|
2

Did some measurements: Starting with an empty array, added @"Hello" 100,000 times, then removed it 100,000 times. Different patterns: Adding/removing at the end, at the start, in the middle, close to the start (at index 20 when possible), close to the end (20 indexes away from the end when possible), and one where I alternated between close to the start and the end. Here's the times for 100,000 objects (measured on Core 2 Duo):

Adding objects = 0.006593 seconds
Removing objects at the end = 0.004674 seconds
Adding objects at the start = 0.003577 seconds
Removing objects at the start = 0.002936 seconds
Adding objects in the middle = 3.057944 seconds
Removing objects in the middle = 3.059942 seconds
Adding objects close to the start = 0.010035 seconds
Removing objects close to the start = 0.007599 seconds
Adding objects close to the end = 0.008005 seconds
Removing objects close to the end = 0.008735 seconds
Adding objects close to the start / end = 0.008795 seconds
Removing objects close to the start / end = 0.008853 seconds

So the time for each add / remove is proportional the distance to the beginning or end of the array, whichever is closer. Adding things in the middle is expensive. You don't have to work exactly at the end; removing elements close to the start / end is also quite cheap.

The suggested implementation as a circular list omits an important detail: There is a gap of variable size between the location of the last and the first array element. As array elements are added / removed, the size of that gap changes. More memory needs to be allocated and object pointers need to be moved when the gap disappears and more objects are added; the array can be shrunk and object pointers need to be moved when the gap becomes too large. A simple change (allowing the gap to be located at any location, not just between last and first element) would allow changes at any location to be fast (as long as it is the same location), and would make operations that "thin out" the array faster.

5 Comments

Your results confirm what is stated in the documentation, but that was not the question (as I understand it).
This is perfectly coherent with the specification. Constant time insertion/deletion at both ends, is obviously payed with bad insertion/deletion performance in the middle. That been said, I don't see how this answers the original question.
So why was that questioned asked? Idle curiosity? Or to find out about performance characteristics? The golden rule of optimisation: It doesn't count if you haven't measured it. BTW. There is no good reason to assume that the implementation in MacOS X or iOS is the same as the open sourced implementation. It's also not exactly a circular buffer, because a circular buffer could easily handle insertion / removal at the same place anywhere in the buffer, for example in the middle.
@gnasher729 the question is what's the data structure behind NSMutableArray. The circular buffer comes from reverse-engineering of the actual implementation and not from open-source code (which is not available for Foundation), so nobody here made that assumption.
The suggested implementation as a circular list. Nope. Circular List and Circular Buffer are not the same thing. first is a linked list, where the last element points to the first. while a buffer is a chunk of memory and in a circular buffer the offset defines where the data starts.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.