Continuous memory reads C compilation optimization questions

Question

I was thinking about the two register timer interview question that goes as follows:

There is a hardware memory mapped timer with the value of the timer stored in two registers: one holds the most significant 32 bits, the other holds the least significant 32 bits. They are at 0x1004 and 0x1000 respectively. Read the timer.

The main idea (as far as I've seen) is to show consideration for the fact that the lower register can roll over into the upper one while you read, so you have to make sure the upper word hasn't changed while you read the lower word.

One of the other things that I've been told to look out for is declaring these types of variables as volatile otherwise the compiler can optimize things away and the result will not be what I expect.
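For reference, the usual shape of the answer is a high-low-high read: read the upper word, read the lower word, then re-read the upper word and retry if it changed. A minimal sketch, assuming the hardware does not latch the two halves for you; the name `read_timer64` and its pointer parameters are illustrative and not part of the original question:

```c
#include <stdint.h>

/* Hypothetical helper: reads a 64-bit counter exposed as two 32-bit
 * registers. The volatile qualifier makes the compiler perform each
 * access as written, in program order. */
static uint64_t read_timer64(const volatile uint32_t *lo,
                             const volatile uint32_t *hi)
{
    uint32_t hi1, hi2, low;

    hi2 = *hi;
    do {
        hi1 = hi2;
        low = *lo;
        hi2 = *hi;   /* re-read: did the low word roll over in between? */
    } while (hi1 != hi2);

    return ((uint64_t)hi1 << 32) | low;
}
```

On the target described above, this would be called as `read_timer64((const volatile uint32_t *)0x1000, (const volatile uint32_t *)0x1004)`.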

I wanted to see how the compiler would optimize this, so I wrote the following code:

int main() {
    uint32_t *ap = 0x1000;
    uint32_t *bp = 0x1004;

    uint64_t temp = ((uint64_t)(*bp) << 32) | (*ap);
    if (temp > 0x2000) {
        return 0;
    }

    return 1;

}

I expected (from vague compiler lore I've heard) the compiler to optimize this into a single 64 bit read. But no matter the optimization level I use with gcc, I can't get it to happen. I've also tried using 16 bit "registers", but the compiler will still do two separate reads.

My questions are:

Is what I've been told/what I've gathered wrong?
Will the compiler ever combine these separate reads into one?
If so, how can I get it to do it?
Bonus: I've also heard of possible bus faults in this scenario without volatile ... 1. Can that actually happen? 2. If it can, would that be because a bus can only do say 32 bit reads and the compiler may ask for a 64 bit read on that bus which it can't provide?

Compilers don't usually do much analysis on explicitly specified addresses, so it probably doesn't notice that *ap and *bp are adjacent. If you make them adjacent in a way that's built into the language, such as making them a struct, it can happen. (I also tried making them adjacent entries of an array, but that didn't work with either gcc or clang.)
– Nate Eldredge, Commented Oct 10, 2024 at 4:07
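The struct variant mentioned in this comment could look like the sketch below. The names `timer_regs` and `combine` are hypothetical, and whether a given compiler actually merges the two loads depends on the target, compiler version and flags, so the generated assembly has to be inspected to know:

```c
#include <stdint.h>

/* Hypothetical layout: two adjacent 32-bit fields, low word first. */
struct timer_regs {
    uint32_t lo;
    uint32_t hi;
};

/* Without volatile, the compiler is free to load both members with a
 * single wider access, or to load them in either order. */
static uint64_t combine(const struct timer_regs *t)
{
    return ((uint64_t)t->hi << 32) | t->lo;
}
```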
A more subtle concern is that the compiler may do 32-bit reads of *ap and *bp, but in the wrong order.
– Nate Eldredge, Commented Oct 10, 2024 at 4:08
Generally, if the hardware defines a memory-mapped register as being of a certain size (e.g. 32 bits), you are expected to access it with load/store instructions of that size. What happens if you use a different size is hardware dependent. I suppose a bus fault is one possibility, but certainly correct operation isn't guaranteed.
– Nate Eldredge, Commented Oct 10, 2024 at 4:26
See also stackoverflow.com/a/71867102/634919 for some related examples, where accessing a (non-volatile, non-atomic) variable of a given size may result in narrower accesses, possibly unaligned. That certainly won't make your hardware happy. Or another where a single read of the C variable may load from memory multiple times (aka "invented loads").
– Nate Eldredge, Commented Oct 10, 2024 at 4:29

Lundin · Accepted Answer · 2024-10-10 07:31:59Z

For an interview, one of the most important qualities of an engineer is the ability to question the specification. In your case there's no mention of how wide a data word the CPU can read in a single instruction, so it is a senseless question, which needs to be answered with a counter question.

Now as it turns out, I don't think any MCU vendor yet has managed to create a timer peripheral with a wider timer register than what the CPU can manage, without giving guarantees about data integrity. It is common for example that 8 bit MCUs have 16 bit timers but then there is always hardware support guaranteeing that the value won't run off into the woods while reading it one byte at a time. Same thing with ADCs etc. So you would just read away, one byte at a time.

uint32_t *ap = 0x1000; is invalid C and will not compile without diagnostic messages. You always need to cast. And since it is not valid C, there's no telling what a "not C compiler" will do from there.
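A minimal sketch of the explicit cast, wrapped in a helper so it can be exercised on a host machine. The name `reg32_at` is hypothetical; the result of an integer-to-pointer conversion is implementation-defined, so this assumes a target where the integer is a meaningful address:

```c
#include <stdint.h>

/* Hypothetical helper: turns an integer address into a pointer to a
 * 32-bit register. The explicit cast is what makes this valid C. */
static volatile uint32_t *reg32_at(uintptr_t addr)
{
    return (volatile uint32_t *)addr;
}
```

With this, the declaration from the question could be written as `volatile uint32_t *ap = reg32_at(0x1000u);` inside a function.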

As for why it wouldn't optimize it into a 64 bit read: first of all, the compiler probably does not analyze the addresses to see if they are adjacent. And as you noted, code reading from an absolute address ought to always be volatile qualified, after which the compiler isn't allowed to merge the accesses anyway.

But there's also the C type system regarding "effective type", which means that it is a bug to do "type punning" and read with a different type than the one held by the pointed-at object. For example if you place your two uint32_t in a struct then the compiler knows they are allocated adjacently but still won't optimize the read. Because doing so would probably change the meaning of the code, making it an invalid optimization.
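For ordinary memory (not hardware registers), one way to reinterpret two `uint32_t` values as a `uint64_t` without reading through a pointer of the wrong type is `memcpy`. A sketch under that assumption; the function name is hypothetical, and which half ends up most significant depends on the byte order of the target:

```c
#include <stdint.h>
#include <string.h>

/* Copies the 8 bytes of the pair into a uint64_t object instead of
 * dereferencing a uint64_t pointer that aliases uint32_t storage. */
static uint64_t load_u64_from_pair(const uint32_t pair[2])
{
    uint64_t out;
    memcpy(&out, pair, sizeof out);
    return out;
}
```

This is not a substitute for volatile register access: `memcpy` takes non-volatile pointers and makes no promise about the width or number of the underlying loads.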

Data stored in hardware registers and the like, accessed through absolute addresses, is usually to be regarded as objects with no effective type, however (otherwise there's no making sense of it). That means the compiler would keep track of what type is stored there by checking which type was first used for write access. Now if we have written code to write to that memory using uint32_t then that becomes the effective type. Even if there is another write to adjacent memory following, the compiler still shouldn't merge them into a single 64 bit write because that would change the meaning of the code.

Is what I've been told/what I've gathered wrong?

Pretty much, apart from the need to make the pointers volatile which is correct.

Will the compiler ever combine these separate reads into one?

No, explained above.

If so, how can I get it to do it?

You'd have to write explicit code for it such as this:

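#include <stdint.h>

typedef union
{
  struct            /* anonymous struct, C11 */
  {
    uint32_t a;     /* register at 0x1000 */
    uint32_t b;     /* register at 0x1004 */
  };
  uint64_t ab;      /* both registers as one 64-bit object */
} u64_t;

volatile u64_t* u64 = (volatile u64_t*)0x1000;
uint64_t temp = u64->ab;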
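The union approach relies on the two 32-bit members sitting back to back with no padding, and on the target's byte order for which member is the least significant half. A sketch of compile-time checks for the layout part, using a hypothetical named variant of the union (`timer_u64_t` with an inner member `w`); whether the 64-bit member is then read with a single bus access still depends on the target:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical named variant of the union, for illustration only. */
typedef union {
    struct {
        uint32_t lo;   /* expected at offset 0 */
        uint32_t hi;   /* expected at offset 4 */
    } w;
    uint64_t ab;
} timer_u64_t;

/* Fail the build if the layout assumption does not hold. */
static_assert(sizeof(timer_u64_t) == 8, "unexpected padding");
static_assert(offsetof(timer_u64_t, w.hi) == 4, "hi must follow lo");
```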

Bonus: I've also heard of possible bus faults in this scenario without volatile ... 1. Can that actually happen? 2. If it can, would that be because a bus can only do say 32 bit reads and the compiler may ask for a 64 bit read on that bus which it can't provide?

I don't quite see how that can happen, but we'd have to discuss the behavior of a specific ISA to tell such things. There's misaligned access or trying to access memory which you have no access to, but those are different errors.

Shelton Liu · Accepted Answer · 2024-10-13 07:50:44Z

-2

Is what I've been told/what I've gathered wrong?

Your scenario involves reading a 64-bit hardware timer stored in two separate 32-bit memory-mapped registers. It is crucial to use the volatile keyword so that the compiler does not optimize away these memory accesses.

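#include <stdint.h>

int main(void) {
    /* An integer must be cast explicitly to become a pointer. */
    volatile uint32_t *ap = (volatile uint32_t *)0x1000;
    volatile uint32_t *bp = (volatile uint32_t *)0x1004;

    uint64_t temp = ((uint64_t)(*bp) << 32) | (*ap);

    if (temp > 0x2000) {
        return 0;
    }

    return 1;
}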
