2

I have code that has worked well when compiled with gcc but fails to work correctly when compiled with clang. I tracked down the issue to a read of volatile memory (on a microcontroller). I found by stepping in assembly that the memory associated with pma_address[i] is read into a register but the values set in the register are never used. So the handle_setup_packet is never called presumably because the optimizer determined that buffer will never change. By making buffer_out into a volatile uint8_t* the code does function correctly. But I don't see why it would need to be volatile in this case. I would think one volatile would be enough to convince the compiler that buffer is worth looking at. I'm wondering what my logic error is in the following code:

void read_pma(uint8_t byte_count, volatile uint16_t * pma_address, uint8_t *buffer_out) {
    int count_received_16 = (byte_count + 1) >> 1;
    for(int i=0; i<count_received_16; i++) {
        ((uint16_t *) buffer_out)[i] = pma_address[i];
    }
}

void fun() {
    uint8_t buffer[64];
    read_pma(byte_count, USBPMA->buffer[0].EP_RX, buffer);
    handle_setup_packet(reinterpret_cast<usb_control_request *>(buffer));
}

Updates

I'm adding additional information to address the comments. From the suggestions in the comments I have learned more about strict aliasing and have found what I think is the main problem. The compiler is clang 20.1.0 provided by ARM: https://github.com/arm/arm-toolchain with the options -target=armv7m-none-eabi -O3 -flto. The assembly generated is quite large due to the loop unrolling it is doing, but generally there a number of sections like below, where it is loading multiple times out the pma_address memory location and writing repeatedly to a scratch register.

;         ((uint16_t *) buffer_out)[i] = pma_address[i];
10000776: f835 3c1e     ldrh    r3, [r5, #-30]
1000077a: f835 3c1c     ldrh    r3, [r5, #-28]
1000077e: f835 3c1a     ldrh    r3, [r5, #-26]
10000782: f835 3c18     ldrh    r3, [r5, #-24]
10000786: f835 3c16     ldrh    r3, [r5, #-22]
1000078a: f835 3c14     ldrh    r3, [r5, #-20]

I found several several ways that I can get it to generate the intended assembly below:

;         ((uint16_t *) buffer_out)[i] = pma_address[i];
 8004370: 885f          ldrh    r7, [r3, #0x2]
 8004372: eb00 0144     add.w   r1, r0, r4, lsl #1
 8004376: f8a1 704d     strh.w  r7, [r1, #0x4d]
 800437a: eb08 0744     add.w   r7, r8, r4, lsl #1
 800437e: f8b7 20a2     ldrh.w  r2, [r7, #0xa2]
;     for(int i=0; i<count_received_16; i++) {
 8004382: 3404          adds    r4, #0x4
;         ((uint16_t *) buffer_out)[i] = pma_address[i];
 8004384: f8a1 204f     strh.w  r2, [r1, #0x4f]
 8004388: f8b7 20a4     ldrh.w  r2, [r7, #0xa4]

Where the assembly is both loading and storing the information obtained from pma_address. The solutions I found (without adding volatile) that result in a correctly operating program are 1) compile without -flto, 2) compile with -fno-strict-aliasing, and 3) change the code to address the aliasing violations.

void read_pma(uint8_t byte_count, volatile uint16_t * pma_address, uint8_t *buffer_out) {
    int count_received_16 = (byte_count + 1) >> 1;
    for(int i=0; i<count_received_16; i++) {
        uint16_t word = pma_address[i];
        buffer_out[2*i] = word & 0xFF;
        buffer_out[2*i+1] = (word >> 8) & 0xFF;
    }
}

fun() {
    read_pma(byte_count, USBPMA->buffer[0].EP_RX, buffer);
    usb_control_request ctrl_req {};
    std::memcpy(&ctrl_req, buffer, sizeof(usb_control_request));
    handle_setup_packet(&ctrl_req);
}

I tried but could not create a version of this code in Godbolt Compiler Explorer and could only obtain the unintended behavior in the context of the full program. The main thing that surprised me in the debugging is that with LTO on the compiler can fully ignore the handle_setup_packet function due to the aliasing violation. I tested this by placing invalid code in that function asm(" invalid_op;"); in that code which generated no errors when compiled in LTO mode.

9
  • 7
    Be very cautious about tagging a question with both the c and c++ tags. Your code is clearly C++ because of the reinterpret_cast<…>, so that's the appropriate tag to use. Commented Oct 2 at 0:33
  • 6
    "Assuming clang treats uint8_t as an alias for unsigned char" -- the write is through a uint16_t lvalue, which cannot alias a usb_control_request... Commented Oct 2 at 3:02
  • 4
    this is double strict-aliasing breach due uint16_t, depending how compiler trat cstdint types. You may accidenty causing UB by going out of bounds also, some combination of compiler\ISA for some reason have agressive alterations of code (e.g. make loop infinite or throw it out, can see that only in assembly). I would also make whatever serves as buffer volatile to prevent optimization in fun() body. Commented Oct 2 at 6:58
  • 1
    I'm trying to reproduce this with godbolt but have no luck. What's the exact compiler, platform and compiler settings? Commented Oct 2 at 8:52
  • 1
    Please show the assembly language for your function. The compiler may have generated correct assembly code. Commented Oct 2 at 17:19

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.