Apple offers a CPU and GPU Synchronization sample project that shows how to synchronize access to shared resources between CPU and GPU. To do so, it uses a semaphore which is stored in an instance variable:

@implementation AAPLRenderer
{
  dispatch_semaphore_t _inFlightSemaphore;
  // other ivars
}

This semaphore is then created in the initializer:

- (nonnull instancetype)initWithMetalKitView:(nonnull MTKView *)mtkView
{
    self = [super init];
    if(self)
    {
        _device = mtkView.device;

        _inFlightSemaphore = dispatch_semaphore_create(MaxBuffersInFlight);

        // further initializations
    }

    return self;
}

MaxBuffersInFlight is defined as follows:

// The max number of command buffers in flight
static const NSUInteger MaxBuffersInFlight = 3;

Finally, the semaphore is utilized as follows:

/// Called whenever the view needs to render
- (void)drawInMTKView:(nonnull MTKView *)view
{
    // Wait to ensure only MaxBuffersInFlight number of frames are getting processed
    //   by any stage in the Metal pipeline (App, Metal, Drivers, GPU, etc)
    dispatch_semaphore_wait(_inFlightSemaphore, DISPATCH_TIME_FOREVER);

    // Iterate through our Metal buffers, and cycle back to the first when we've written to MaxBuffersInFlight
    _currentBuffer = (_currentBuffer + 1) % MaxBuffersInFlight;

    // Update data in our buffers
    [self updateState];

    // Create a new command buffer for each render pass to the current drawable
    id<MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];
    commandBuffer.label = @"MyCommand";

    // Add completion handler which signals _inFlightSemaphore when Metal and the GPU have fully
    //   finished processing the commands we're encoding this frame.  This indicates when the
    //   dynamic buffers filled with our vertices, that we're writing to this frame, will no longer
    //   be needed by Metal and the GPU, meaning we can overwrite the buffer contents without
    //   corrupting the rendering.
    __block dispatch_semaphore_t block_sema = _inFlightSemaphore;
    [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer)
    {
        dispatch_semaphore_signal(block_sema);
    }];

    // rest of the method
}

What I fail to understand here is the necessity of the line

__block dispatch_semaphore_t block_sema = _inFlightSemaphore;

Why do I have to copy the instance variable into a local variable and mark this local variable with __block? If I just drop that local variable and instead write

[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer)
{
    dispatch_semaphore_signal(_inFlightSemaphore);
}];

it seems to work as well. I also tried marking the instance variable with __block as follows:

__block dispatch_semaphore_t _bufferAccessSemaphore;

This compiles with Clang and seems to work as well. But because this is about preventing race conditions, I want to be sure that it is actually correct.

So the question is why does Apple create that local semaphore copy marked with __block? Is it really necessary or does the approach with directly accessing the instance variable work just as well?

As a side note, the answer to this SO question remarks that instance variables can't be marked with __block. That answer refers to gcc, but why would Clang allow this if it shouldn't be done?

    It should work without __block I would think. The main thing is to avoid the reference to self->_inFlightSemaphore inside the block, which would strongly retain self instead of just the semaphore, which the local variable solves. __block on instance variables should be meaningless since they are always accessed indirectly through self->xxx; perhaps clang just hasn't bothered to make the declaration illegal there even though it has no effect. Commented Mar 19, 2019 at 16:31

2 Answers

The important semantic distinction here is that when you use the ivar directly in the block, the block takes a strong reference to self. By creating a local variable that refers to the semaphore instead, only the semaphore is captured (by reference) by the block, instead of self, reducing the likelihood of a retain cycle.
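To make the difference concrete, here is a minimal sketch, assuming ARC and Foundation. DemoRenderer, its two handler methods, and RendererOutlivesScope are hypothetical names invented for this illustration; they are not part of Apple's sample. The first handler reaches the semaphore through self, the second through a local variable, and a weak reference is used to observe whether the renderer survives once the block is its only possible owner.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for AAPLRenderer; only the semaphore ivar matters here.
@interface DemoRenderer : NSObject
- (dispatch_block_t)handlerUsingIvar;
- (dispatch_block_t)handlerUsingLocal;
@end

@implementation DemoRenderer
{
    dispatch_semaphore_t _inFlightSemaphore;
}

- (instancetype)init
{
    self = [super init];
    if (self)
    {
        _inFlightSemaphore = dispatch_semaphore_create(3);
    }
    return self;
}

- (dispatch_block_t)handlerUsingIvar
{
    // A bare `_inFlightSemaphore` is shorthand for `self->_inFlightSemaphore`,
    // so this block holds a strong reference to self.
    return ^{ dispatch_semaphore_signal(self->_inFlightSemaphore); };
}

- (dispatch_block_t)handlerUsingLocal
{
    // The block captures only the local, and therefore only the semaphore.
    dispatch_semaphore_t sema = _inFlightSemaphore;
    return ^{ dispatch_semaphore_signal(sema); };
}
@end

// Returns YES if the renderer is still alive after its only explicit owner
// has gone out of scope, i.e. if the handler block is keeping it alive.
static BOOL RendererOutlivesScope(BOOL useIvar)
{
    __weak DemoRenderer *weakRenderer = nil;
    dispatch_block_t handler = nil;
    @autoreleasepool
    {
        DemoRenderer *renderer = [DemoRenderer new];
        weakRenderer = renderer;
        handler = useIvar ? [renderer handlerUsingIvar]
                          : [renderer handlerUsingLocal];
    }
    BOOL alive = (weakRenderer != nil);
    handler(); // the semaphore is retained by the block in both variants
    return alive;
}
```

Under these assumptions, RendererOutlivesScope(YES) should report that the renderer is still alive, while RendererOutlivesScope(NO) should report that it has been deallocated even though the handler can still signal the semaphore.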

As for the __block qualifier, you'd normally use that to indicate that a local variable should be mutable within the referencing block. However, since the semaphore variable is not mutated by the call to signal, the qualifier isn't strictly necessary here. It still serves a useful purpose from a style perspective, though, in the sense that it emphasizes the lifetime and purpose of the variable.

On the topic of why an ivar can be qualified with __block,

why would Clang allow this if it shouldn't be done?

Perhaps exactly because capturing an ivar in a block implies strongly capturing self. With or without a __block qualifier, if you use an ivar in a block, you're potentially running the risk of a retain cycle, so having the qualifier there doesn't create additional risk. Better to use a local variable (which, by the way, could be a __weak reference to self just as easily as a __block-qualified reference to an ivar) to be explicit and safe.

3 Comments

That makes sense to me. Thanks a lot for that insightful answer. So, essentially the execution of the block itself would cause another strong reference to the instance of the AAPLRenderer class. In my own renderer I explicitly stop the rendering loop (from which the execution of the block gets triggered) before I stop using the renderer. In that case the strong reference created by the block would automatically go away and there shouldn't be much of a danger regarding a retain cycle. Would you in general agree with this?
@ackh - I still would be wary about referencing the ivar (with the concomitant self reference) in addCompletedHandler. It’s not a question as to whether the “block gets triggered”, but rather, whether the block was released after it was triggered. Well written completion handler code will generally release the block when it’s done with it, but are you 100% sure that’s the case here? As a rule, I’d suggest writing code that makes strong reference cycles impossible (i.e. just use local var here), rather than hoping and/or relying on other code to not allow the cycle.
Good answer, but you go on to suggest it “... could be a __weak reference to self just as easily as a __block-qualified reference to an ivar) to be explicit and safe.” Often that’s the case, but not here. A weak self reference could end up being nil and it would be incorrect to call dispatch_semaphore_wait/signal with a nil value for the semaphore. In this case, local var (with or without __block qualifier) is correct.

I think warrenm correctly answered your question as to why one would use a local variable rather than the ivar (and its implicit reference to self). +1

But you asked why a local variable would be marked as __block in this case. The author could have done that to make his/her intent explicit (e.g., to indicate the variable will outlive the scope of the method). Or they could have potentially done it for the sake of efficiency (e.g., why make a new copy of the pointer?).

For example, consider:

- (void)foo {
    dispatch_semaphore_t semaphore = dispatch_semaphore_create(4);
    NSLog(@"foo: %p %@", &semaphore, semaphore);

    for (int i = 0; i < 10; i++) {
        dispatch_async(dispatch_get_global_queue(QOS_CLASS_DEFAULT, 0), ^{
            dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER);
            NSLog(@"bar: %p %@", &semaphore, semaphore);
            [NSThread sleepForTimeInterval:1];
            dispatch_semaphore_signal(semaphore);
        });
    }
}

While these are all using the same semaphore, in the absence of __block, each dispatched block will get its own copy of the pointer to that semaphore.

However, if you add __block to the declaration of that local semaphore variable, each dispatched block will use the same pointer, which will be sitting on the heap (i.e. the address of &semaphore will be the same for each block).

It’s not meaningful here, IMHO, where there’s only one block being dispatched, but hopefully this illustrates the impact of __block qualifier on local var. (Obviously, the traditional use of __block qualifier is if you’re changing the value of the object in question, but that’s not relevant here.)
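Another way to observe the shared storage, without printing addresses, is to reassign the variable after the block has been created. This is only a sketch, assuming ARC; both function names are made up for the illustration. Without __block, the block keeps the pointer value it copied at creation time, whereas with __block the block and the enclosing scope read the same variable.

```objc
#import <Foundation/Foundation.h>

// Without __block: the block copies the pointer value when it is created,
// so a later assignment to the local variable is invisible inside the block.
static BOOL PlainCaptureSeesReassignment(void)
{
    dispatch_semaphore_t semaphore = dispatch_semaphore_create(4);
    dispatch_semaphore_t original = semaphore;
    BOOL (^stillOriginal)(void) = ^{ return (BOOL)(semaphore == original); };
    semaphore = dispatch_semaphore_create(1);
    return !stillOriginal();
}

// With __block: the block and the enclosing scope share one variable,
// so the block observes the later assignment.
static BOOL BlockQualifiedCaptureSeesReassignment(void)
{
    __block dispatch_semaphore_t semaphore = dispatch_semaphore_create(4);
    dispatch_semaphore_t original = semaphore;
    BOOL (^stillOriginal)(void) = ^{ return (BOOL)(semaphore == original); };
    semaphore = dispatch_semaphore_create(1);
    return !stillOriginal();
}
```

The first function should return NO and the second YES. In the renderer code from the question the semaphore variable is never reassigned, so both forms behave the same there.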

... marking instance variables with __block can't be done [in gcc] ... but why would Clang allow this if it shouldn't be done?

As for why there is no error for __block on ivars, as that referenced answer said, it’s basically redundant. I’m not sure that something should be disallowed just because it is redundant.
