
I am working on the RTOS support in our SW components.

There are scenarios where it makes sense to copy global variables into local variables inside a critical section (e.g. protected by a mutex) and then use the local copies in time-consuming operations outside of the critical section.

I am afraid that the C compiler might optimize away the local variable assignments and use the global variables directly, which would undermine the effort to eliminate race conditions from the code.

I wrote an oversimplified LCD display example to illustrate the problem.

I have the following questions:

  • How could it be guaranteed that the local variables won't be optimized away?
  • How could it be guaranteed that the lock, copy and unlock happen in the intended order?
  • Would the volatile type qualifier help in this case (on the local variables)?
uint8_t page_sel = 0;
char lcd_text[PAGE_CNT][ROW_CNT][COLUMN_CNT];

void lcd_print_text(uint8_t page, uint8_t row, const char* text)
{
  lock();
  // Store text in global variable which represents
  // the text on the display
  copy_text_to_lcd_text(page, row, text);
  unlock();

  // Display update request to run the lcd background task
  refresh_semaphore_set();
}

void lcd_select_page(uint8_t page)
{
  lock();
  // Store the selected page
  page_sel = page;
  unlock();

  // Display update request to run the lcd background task
  // If the selected page changes then lcd shall be updated
  refresh_semaphore_set();
}

void lcd_task(void)
{
  while(1) {
    // Update the display only if there are modifications
    refresh_semaphore_get();
    refresh();
  }
}

void refresh(void)
{ 
  char page_lcd_text[ROW_CNT][COLUMN_CNT];
  uint8_t page;

  lock();
  // Page number and text shall be consistent
  // so critical section is necessary
  page = page_sel;
  // Copy is created to avoid partial overwrites during
  // display update
  copy_page(page_lcd_text, lcd_text, page);
  // It is essential to have a local copy before
  // the critical section is left
  unlock();
  
  // Create pixel data in frame buffer from text (slow)
  render_text(page_lcd_text);

  // Create pixel data in frame buffer to display (slow)
  // selected page number on display
  render_page_num(page);

  // Transfer pixel data to LCD driver chip (even slower)
  lcd_spi_transfer();
}
  • Are you worried about the following flow: lock(); local = global; unlock(); <use local> ? Commented Sep 12, 2024 at 15:44
  • Your lock() and unlock() should theoretically have memory barriers, that will ensure the global value is read between them and not elsewhere, so even if local is somehow optimized, the access to global is still in the locked section. Commented Sep 12, 2024 at 16:03
  • @EugeneSh. Yes, exactly. I am worried that the compiler will decide that the page = page_sel; assignment to a local variable is completely unnecessary because, as the compiler sees the world, it could just read page_sel at the render_page_num function call. What exactly do you mean by memory barrier? The barriers I am familiar with are used to make sure the pipeline is cleared before the next instruction is executed, so that an assignment is completed at HW level. I suppose it can have an effect on optimization and expression reordering as well. Could you provide an example for GCC, please? Commented Sep 13, 2024 at 5:01
  • The basic memory barrier is achieved by a dummy assembly statement, asm("":::"memory"). It tells the compiler that this statement clobbers memory (which is a lie, but the compiler must believe it), so all memory operations issued before it must be complete at this point. It serves two purposes: memory accesses before the barrier cannot be moved after it, and accesses after the barrier cannot be moved before it. So if you make it a part of your lock/unlock functionality, it will make sure that the global is read between the two, and not elsewhere. Commented Sep 13, 2024 at 13:49
  • There are other kinds of barriers too, which are hardware specific, but here we don't care much about these as we are concerned about the compiler behavior only. Commented Sep 13, 2024 at 13:50

1 Answer


How could it be guaranteed that the local variables won't be optimized?

The only way to ensure that is to declare them as volatile, which likely also means they will be allocated on the stack rather than in registers.

Related to that, you will most likely not want to allocate char page_lcd_text[ROW_CNT][COLUMN_CNT] on the stack, since it will lead to a dangerous peak in stack usage whenever the function is called. I recommend declaring it static so that it gets allocated in .bss (zero-initialized statics normally end up there rather than in .data), or if that's not possible because you fear race conditions, allocating it on the caller side and passing a pointer to the buffer from main() to the LCD driver during initialization.

In embedded programming, a rule of thumb of good design is actually to never copy buffers at any point, since it is slow. Doing so inside locks is particularly bad design, since you stall the whole system. The normal way to design embedded systems is instead to pre-allocate n buffers statically (depending on use and whether you need double/triple buffering etc.) and then only swap the pointers to the buffers, rather than doing a hard copy. For example, the background program works with one buffer while some SPI/DMA driver works with another buffer, and when they are done you just swap the pointers.
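A minimal sketch of the pointer-swap idea described above, not the poster's actual driver. The names page_buf_t, back_write_row and swap_buffers, the ROW_CNT/COLUMN_CNT values and the empty lock()/unlock() stubs are all assumptions made for illustration; a real system would use the RTOS mutex.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ROW_CNT    4    /* hypothetical display geometry */
#define COLUMN_CNT 20

typedef struct {
    char rows[ROW_CNT][COLUMN_CNT];
} page_buf_t;

/* Two statically allocated buffers: no stack peak and no hard copy. */
static page_buf_t buf_a, buf_b;
static page_buf_t *front = &buf_a;  /* owned by the renderer */
static page_buf_t *back  = &buf_b;  /* written by producers  */

/* Stand-ins for the RTOS mutex (assumption: replace with real primitives). */
static void lock(void)   { }
static void unlock(void) { }

/* Producer side: write one row of text into the back buffer. */
void back_write_row(uint8_t row, const char *text)
{
    assert(row < ROW_CNT);
    lock();
    strncpy(back->rows[row], text, COLUMN_CNT - 1);
    back->rows[row][COLUMN_CNT - 1] = '\0';
    unlock();
}

/* Consumer side: only the pointer exchange happens inside the lock.
 * The returned buffer can then be rendered outside the critical section. */
const page_buf_t *swap_buffers(void)
{
    lock();
    page_buf_t *tmp = front;
    front = back;
    back  = tmp;
    unlock();
    return front;
}
```

Note that after a swap the new back buffer still holds the content of the previous frame, so a producer that updates single rows would have to bring it up to date first. How to handle that depends on the application and is left out of the sketch.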

How could it be guaranteed that the order of lock-unlock and copy happens as intended?

That's up to the compiler and in particular the RTOS lib implementation. In general, compilers should not re-order instructions across code that boils down to inline assembler. So if your locks are some manner of function-like macros that boil down to inline assembler, that will prevent re-ordering. If that's not how the macros are implemented, then you'll have to ask whoever designed the RTOS how it is supposed to work.
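As a rough illustration of locks that boil down to inline assembler, the sketch below wraps the GCC/Clang compiler barrier mentioned in the comments into toy lock()/unlock() functions. The COMPILER_BARRIER macro name, the lock_depth counter and the snapshot_page/select_page helpers are hypothetical; a real RTOS mutex would replace the counter, and other compilers need their own barrier intrinsic.

```c
#include <assert.h>
#include <stdint.h>

/* GCC/Clang-specific compiler barrier: emits no machine instruction, but
 * the compiler must assume that memory visible to other code may have
 * changed, so it does not move such accesses across this point. */
#define COMPILER_BARRIER() __asm__ volatile ("" ::: "memory")

static uint8_t page_sel;   /* shared state, as in the question */
static int lock_depth;     /* toy stand-in for a real mutex (assumption) */

static inline void lock(void)
{
    assert(lock_depth == 0);  /* toy lock is not recursive */
    lock_depth++;             /* a real RTOS would take the mutex here */
    COMPILER_BARRIER();       /* accesses below stay below the lock */
}

static inline void unlock(void)
{
    COMPILER_BARRIER();       /* accesses above stay above the unlock */
    lock_depth--;             /* a real RTOS would release the mutex here */
}

void select_page(uint8_t page)
{
    lock();
    page_sel = page;
    unlock();
}

uint8_t snapshot_page(void)
{
    uint8_t page;

    lock();
    page = page_sel;          /* the read of the global happens in here */
    unlock();
    return page;              /* callers work on the copy */
}
```

This only constrains the compiler. On multi-core or weakly ordered hardware, the RTOS primitives would additionally need hardware barriers, which is outside the scope of this sketch.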

At the C level, volatile accesses are (arguably) not allowed to be re-ordered with respect to anything else in the code. However, some compilers are more liberal and only ensure that volatile accesses are not re-ordered in relation to each other, which is arguably non-conforming to the C standard. Either way, there is nothing in C itself except volatile which might act as a "memory barrier". Atomic access, for example, won't make a difference when it comes to re-ordering.

Would volatile type qualifier help in this case (local variables)?

Yes. And not only for the local variables but also for the file scope ones. Otherwise the optimizer might make strange assumptions about whether a file scope variable has been updated since it was last read. This is a bigger issue than potential re-ordering.
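A small sketch of what this suggestion could look like in the question's code, with volatile on both the file scope variable and the local copy. The function names and the simulate_concurrent_update helper are made up for illustration, and the lock()/unlock() calls are left out to keep it short; the comments mark where they would go.

```c
#include <assert.h>
#include <stdint.h>

/* File scope variable shared between tasks, qualified as suggested. */
static volatile uint8_t page_sel;

void select_page(uint8_t page) { page_sel = page; }
uint8_t current_page(void)     { return page_sel; }

/* Hypothetical stand-in for another task changing the selection
 * while the slow rendering is in progress. */
void simulate_concurrent_update(void) { page_sel = 9; }

uint8_t snapshot_then_render(void (*slow_work)(void))
{
    /* lock() would go here */
    volatile uint8_t page = page_sel;  /* one read of the shared global */
    /* unlock() would go here */

    slow_work();   /* may change page_sel in the meantime */

    return page;   /* read from the local copy, not from page_sel */
}
```

Calling snapshot_then_render(simulate_concurrent_update) after select_page(2) returns 2, while current_page() reports 9 afterwards.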

1 Comment

And before the brainwashed flood of comments about "not using volatile for multithreading" start pouring in: you are wrong. While volatile does not do jack to prevent race conditions, that is not even the issue being discussed here. Volatile definitely helps against incorrect optimizations and it may or may not help against instruction re-ordering. Race conditions, miscompiled code due to optimizations and instruction re-ordering are 3 SEPARATE ISSUES NOT RELATED TO EACH OTHER.
