STM32 same while loop code but compiled to different assembly code

Question

I'm learning RTOS on stm32F411RE board (Cortex-M4). I use MDK uVision v5. I encounter a problem of C code while loop. The code in the following is exactly the same in my project and the instructor's project (on Udemy), however, after compiling both project (on my PC), the assembly code look's different. I want to ask what makes this different. Thank you.

void osSignalWait(int32_t *semaphore)
{
    __disable_irq();
    while(*semaphore <=0)
    {       
            __disable_irq();        
            __enable_irq();
    }
    *semaphore -= 0x01;
    __enable_irq();
}

In the debug view (see image), if the condition does not match, it does not go to load the real value LDR r1,[r0, #0x00] and then do the comparison. Instead, it compares and goes to execute the command inside the while loop. My code compiled below

   100: void osSignalWait(int32_t *semaphore) 
   101: { 
0x08001566 4770      BX            lr
   102:         __disable_irq(); 
   103:         while(*semaphore <=0) 
   104:         {               
0x08001568 B672      CPSID         I
   101: { 
   102:         __disable_irq(); 
   103:         while(*semaphore <=0) 
   104:         {               
0x0800156A 6801      LDR           r1,[r0,#0x00]
0x0800156C E001      B             0x08001572
   105:                         __disable_irq();                 
0x0800156E B672      CPSID         I
   106:                         __enable_irq(); 
   107:         } 
   108:         *semaphore -= 0x01; 
0x08001570 B662      CPSIE         I
0x08001572 2900      CMP           r1,#0x00
0x08001574 DDFB      BLE           0x0800156E
0x08001576 1E49      SUBS          r1,r1,#1
   109:         __enable_irq(); 
0x08001578 6001      STR           r1,[r0,#0x00]
0x0800157A B662      CPSIE         I
   110: }

If I compile the instructor's (on Udemy) code (on my PC using his project), the assembly code look's different ( with exactly the same while loop code). It would load the real value again and do the comparison. Instructor's code compiled below (Compiled on my PC)

100: void osSignalWait(int32_t *semaphore) 
   101: { 
0x08000CDE 4770      BX            lr
   102:         __disable_irq(); 
0x08000CE0 B672      CPSID         I
   103:         while(*semaphore <=0) 
   104:         { 
0x08000CE2 E001      B             0x08000CE8
   105:                         __disable_irq();                         
0x08000CE4 B672      CPSID         I
   106:                         __enable_irq();   
   107:         } 
0x08000CE6 B662      CPSIE         I
0x08000CE8 6801      LDR           r1,[r0,#0x00]
0x08000CEA 2900      CMP           r1,#0x00
0x08000CEC DDFA      BLE           0x08000CE4
   108:         *semaphore -= 0x01; 
0x08000CEE 6801      LDR           r1,[r0,#0x00]
0x08000CF0 1E49      SUBS          r1,r1,#1
0x08000CF2 6001      STR           r1,[r0,#0x00]
   109:         __enable_irq(); 
   110:          
   111:          
0x08000CF4 B662      CPSIE         I
   112: }

Hi, because the instructor never reply to students, I have to ask the question here. — Dung-Yi
– Dung-Yi, Commented Mar 17, 2020 at 14:56
@PaulOgilvie Which compiler option do I need to look into? Thank you — Dung-Yi
– Dung-Yi, Commented Mar 17, 2020 at 15:06
@Dung-Yi in the instructor's code image you didn't show the very first line of the function. Yes this matters for us as we can't assume anything. — Jabberwocky
– Jabberwocky, Commented Mar 17, 2020 at 15:19
FWIW: I think the body of the while loop should be __enable_irq(); __disable_irq(); in that order. — Chris Hall
– Chris Hall, Commented Mar 17, 2020 at 15:20

Elijan9 · Accepted Answer · 2020-03-18 08:34:46Z

4

Since you aren't telling the compiler semaphore can change during the execution of this function, your compiler has decided to optimize your code and load the value of semaphore only once and use its copy in the while loop, then only write the result in the end. As it is written now, there's no reason for the compiler to assume this could be harmful.

To notify the compiler a variable can change outside the function, during the execution of that function, please use the volatile keyword, see: https://en.cppreference.com/w/c/language/volatile

In that case, your code would become:

void osSignalWait(volatile int32_t *semaphore)
{
    __disable_irq();
    while(*semaphore <=0)
    {       
        __disable_irq();        // Note: I think the order is wrong...
        __enable_irq();
    }
    *semaphore -= 0x01;
    __enable_irq();
}

By the way, calling __disable_irq twice (once before the while loop, then at the start inside the loop) then __enable_irq seems a bit wonky, don't you mean enable (and do something) then disable within the while loop?

edited Mar 18, 2020 at 8:34

answered Mar 17, 2020 at 15:19

Elijan9

1,3142 gold badges18 silver badges19 bronze badges

Sign up to request clarification or add additional context in comments.

23 Comments

Peter Cordes Over a year ago

@Dung-Yi: the debugger doesn't know that. It only reloads because it must have been compiled without optimization, so the compiler treated everything kind of like volatile. See Why does clang produce inefficient asm with -O0 (for this simple floating point sum)?. Or specifically for that bug in your and your instructor's code: MCU programming - C++ O2 optimization breaks while loop / Multithreading program stuck in optimized mode but runs normally in -O0

Peter Cordes Over a year ago

And BTW, the portable C choice for communication between cores / threads is _Atomic int32_t * (with memory_order_relaxed). (If you're disabling IRQs, you can use a separate atomic load and atomic store, not -=). @Dung-Yi: Hmm, I think your code is broken: the while() loop body leaves interrupts enabled before doing -=. I think you want enable(); disable(); as the loop body to give interrupt handlers a chance to run while you're spin-waiting. Or better, spin with interrupts enabled until you see *sem <= 0, then try disable, check, and decrement.

phonetagger Over a year ago

@Elijan9 - The loop body currently disables interrupts and then enables them, which as Peter Cordes mentions, is backwards. The purpose of those in the loop body (when done in the correct order) is to ensure that when the semaphore becomes available, the loop body exits with interrupts disabled. In the meantime, repeatedly enabling them ensures that interrupts can be serviced during the spin-wait.

Dung-Yi Over a year ago

Thank you guys so much. I appreciate your kindness based on my simple and a little bit silly question. (I've done so much research for almost 1 week but cannot find the correct keyword for the answer. I hope the question is not duplicate)

phonetagger Over a year ago

@Dung-Yi I do not mean using a "cooperative spin-lock". I don't know what sort of RTOS you are using; home-brew perhaps? (Did your professor write it?) Any good commercial RTOS would have a "wait on semaphore" API function that puts your thread to sleep until the semaphore is made available by another thread or interrupt. While your thread is sleeping, it consumes no CPU instruction cycles at all. Using a yield or delay function is better than plain spinning, but it still takes intermittent CPU cycles; it's not as good as an API function that's specifically made for waiting on a semaphore.

|

0___________ · Accepted Answer · 2020-03-17 16:47:09Z

0

This very well known keil over optimisation bug. Reported many times. Having memory clobber it should read the memory every time.

Here is an example how clobbers work

#include <stdint.h>

unsigned x;
volatile unsigned y;


int foo()
{
    while(x < 1000);
}

int bar()
{
    while(x < 1000) asm("":::"memory");
}

foo:
        ldr     r3, .L5
        ldr     r3, [r3]
        cmp     r3, #1000
        bxcs    lr
.L3:
        b       .L3
.L5:
        .word   x
bar:
        ldr     r1, .L11
        ldr     r2, .L11+4
        ldr     r3, [r1]
        cmp     r3, r2
        bxhi    lr
.L9:
        ldr     r3, [r1]
        cmp     r3, r2
        bls     .L9
        bx      lr
.L11:
        .word   x
        .word   999

answered Mar 17, 2020 at 16:47

0___________

71.6k4 gold badges41 silver badges94 bronze badges

8 Comments

Peter Cordes Over a year ago

That's not an over-optimization bug, it's data-race UB in the C source that you can work around with a memory barrier or volatile. Or actually avoid UB with _Atomic (with mo_relaxed so it can compile to the same asm). Why do you say a barrier is better than volatile, or _Atomic int (with mo_relaxed)? Since this is for a semaphore object that's only ever used for synchronization so you never want to let the compiler keep it in a register across multiple reads. volatile ensures we don't have invented loads from one read.

Dung-Yi Over a year ago

Thank you for point out the issue. I did find this bug reported in stack overflow before but never realize that my problem is caused by this!! Thank you.

0___________ Over a year ago

@PeterCordes I do not say what is better. They are not identical and have different uses godbolt.org/z/aXSMB_

Peter Cordes Over a year ago

@Dung-Yi: Just to be clear; it's not a compiler bug, it's a bug in your code. The compiler is working as intended and optimizing variables into registers under the assumption that no other thread can modify them, unless you tell it otherwise. It's allowed to assume that for non-_Atomic and non-volatile variables because otherwise that would be data-race UB. If compilers didn't do this, any code that used global variables or even pointers would run slowly, storing after every change and reloading before every use.

Peter Cordes Over a year ago

@P__J__: I think I was getting mixed up between the comment thread on the other answer vs. this. But why post it when there's already an answer suggesting volatile? The interesting point is that the "memory" barrier that should be part of the definition of __disable_irq() should have already blocked reordering, like it does on GCC. godbolt.org/z/fbpe2d. (But yes I'm aware that memory barriers don't affect non-escaped local vars, like a function arg.)

|

Collectives™ on Stack Overflow

STM32 same while loop code but compiled to different assembly code

2 Answers 2

23 Comments

8 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

23 Comments

8 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related