Apple offers a CPU and GPU Synchronization sample project that shows how to synchronize access to resources shared between the CPU and the GPU. To do so, it uses a semaphore stored in an instance variable:
```objc
@implementation AAPLRenderer
{
    dispatch_semaphore_t _inFlightSemaphore;
    // other ivars
}
```
The semaphore is then created in the initializer:
```objc
- (nonnull instancetype)initWithMetalKitView:(nonnull MTKView *)mtkView
{
    self = [super init];
    if(self)
    {
        _device = mtkView.device;
        _inFlightSemaphore = dispatch_semaphore_create(MaxBuffersInFlight);
        // further initializations
    }
    return self;
}
```
MaxBuffersInFlight is defined as follows:
```objc
// The max number of command buffers in flight
static const NSUInteger MaxBuffersInFlight = 3;
```
Finally, the semaphore is used like this:
```objc
/// Called whenever the view needs to render
- (void)drawInMTKView:(nonnull MTKView *)view
{
    // Wait to ensure only MaxBuffersInFlight number of frames are getting processed
    // by any stage in the Metal pipeline (App, Metal, Drivers, GPU, etc)
    dispatch_semaphore_wait(_inFlightSemaphore, DISPATCH_TIME_FOREVER);

    // Iterate through our Metal buffers, and cycle back to the first when we've written to MaxBuffersInFlight
    _currentBuffer = (_currentBuffer + 1) % MaxBuffersInFlight;

    // Update data in our buffers
    [self updateState];

    // Create a new command buffer for each render pass to the current drawable
    id<MTLCommandBuffer> commandBuffer = [_commandQueue commandBuffer];
    commandBuffer.label = @"MyCommand";

    // Add completion handler which signals _inFlightSemaphore when Metal and the GPU have fully
    // finished processing the commands we're encoding this frame. This indicates when the
    // dynamic buffers filled with our vertices, that we're writing to this frame, will no longer
    // be needed by Metal and the GPU, meaning we can overwrite the buffer contents without
    // corrupting the rendering.
    __block dispatch_semaphore_t block_sema = _inFlightSemaphore;
    [commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer)
    {
        dispatch_semaphore_signal(block_sema);
    }];

    // rest of the method
}
```
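To make the throttling pattern concrete outside of Metal, here is a rough plain-C analog. It is only an illustration under stated assumptions: POSIX threads and an unnamed POSIX semaphore stand in for GCD and the GPU, each thread plays the role of one committed command buffer whose completion handler signals the semaphore, and `run_frames`, `gpu_frame` and `MAX_FRAMES_IN_FLIGHT` are hypothetical names rather than anything from Apple's sample. It targets Linux, since macOS does not support unnamed semaphores created with `sem_init`.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for MaxBuffersInFlight in the sample. */
#define MAX_FRAMES_IN_FLIGHT 3

static sem_t in_flight;             /* plays the role of _inFlightSemaphore */
static atomic_int frames_in_flight; /* frames submitted but not yet completed */

/* Simulated GPU work for one frame. When it finishes, it does what the
   completion handler does in the sample: signal the semaphore. */
static void *gpu_frame(void *unused) {
    (void)unused;
    struct timespec delay = {0, 1000000};   /* pretend the GPU takes ~1 ms */
    nanosleep(&delay, NULL);
    atomic_fetch_sub(&frames_in_flight, 1); /* frame done, its buffer is reusable */
    sem_post(&in_flight);                   /* ~ dispatch_semaphore_signal */
    return NULL;
}

/* CPU side: one loop iteration per frame, like drawInMTKView:.
   Returns the largest number of frames that were in flight at once,
   0 for a non-positive frame_count, or -1 if allocation fails. */
int run_frames(int frame_count) {
    if (frame_count <= 0) return 0;
    pthread_t *gpu = malloc(sizeof *gpu * (size_t)frame_count);
    if (gpu == NULL) return -1;
    int started = 0;
    int max_seen = 0;

    sem_init(&in_flight, 0, MAX_FRAMES_IN_FLIGHT);
    atomic_store(&frames_in_flight, 0);
    for (int i = 0; i < frame_count; i++) {
        sem_wait(&in_flight); /* ~ dispatch_semaphore_wait(..., DISPATCH_TIME_FOREVER) */
        /* ...the CPU would write into the next buffer slot here... */
        int now = atomic_fetch_add(&frames_in_flight, 1) + 1;
        if (now > max_seen) max_seen = now;
        /* "Commit" the frame; if no thread can be started, complete it inline. */
        if (pthread_create(&gpu[started], NULL, gpu_frame, NULL) == 0)
            started++;
        else
            gpu_frame(NULL);
    }
    for (int i = 0; i < started; i++) pthread_join(gpu[i], NULL);
    sem_destroy(&in_flight);
    free(gpu);
    return max_seen;
}
```

Calling `run_frames(12)` should never return more than 3 regardless of thread scheduling: each worker decrements the in-flight count before it posts the semaphore, and the CPU loop only increments it after a successful wait.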
What I don't understand is why this line is necessary:
```objc
__block dispatch_semaphore_t block_sema = _inFlightSemaphore;
```
Why do I have to copy the instance variable into a local variable and mark that local variable with `__block`? If I drop the local variable and instead write
```objc
[commandBuffer addCompletedHandler:^(id<MTLCommandBuffer> buffer)
{
    dispatch_semaphore_signal(_inFlightSemaphore);
}];
```
it seems to work just as well. I also tried marking the instance variable itself with `__block`:
```objc
__block dispatch_semaphore_t _bufferAccessSemaphore;
```
This also compiles with Clang and appears to work. But since the whole point is preventing race conditions, I want to be sure it is actually correct and not just appearing to work.
So the question is: why does Apple create a local copy of the semaphore marked with `__block`? Is it really necessary, or does accessing the instance variable directly work just as well?
As a side note, the answer to this SO question says that instance variables cannot be marked with `__block`. That answer refers to GCC, but why would Clang accept it if it shouldn't be done?