How to efficiently initialize a volatile struct

Question

I would like to use bit-fields to access low-level memory. I'm aware that bit-fields are non-portable, but they appear to be implemented consistently on my platform (Cortex-M4) by both clang and gcc. However, I found that assigning the entire bit-field struct at once generates significantly more assembly instructions on clang than on gcc. A simplified example of my goal is shown below:

struct B {
    struct {
        volatile int b:3;
        volatile int c:3;
        volatile int d:26;
    } a;
} bmem;

constexpr B * const b = &bmem;

void fun() {
    b->a = { .b = 1, .c = 2 };
}

Compiler Explorer link

When switching between armv7-a clang 21.1.0 and arm gcc 15.2.0, one can see that gcc optimizes the struct initialization to 3 instructions, while clang takes 16. It appears that clang treats the right-hand-side values as volatile while gcc does not. I tried various permutations to reduce the instructions that clang generates, but I could not find a reliable way to make it work. The closest I could get without a lot of extra code was to mark the struct pointer b as a pointer to volatile, rather than marking the members of struct B volatile, and then to make it a union and use illegal (due to undefined behavior) union type punning. However, I don't like this solution. Is there a straightforward way to tell the compiler that I would like to set this memory value without treating the initialization value as volatile?

Here is the union type punning version:

struct B {
    union {
        int u;
        struct {
            int b:3;
            int c:3;
            int d:26;
        } a;
    };
} bmem;

constexpr volatile B * const b = &bmem;

void fun() {
    b->u = B { .a = { .b = 1, .c = 2 } }.u;
}

Additional information

Thank you for the comments below. For additional reference, I'm looking to access memory-mapped hardware registers. I should have made it clear that I'm not trying to do thread communication, although I can imagine that the solutions for thread communication could provide insight into how to handle memory-mapped registers. Generally, the status quo for this platform is to use manufacturer-provided header files. These header files can take two different forms, but both are based on volatile. The first form is volatile uint32_t registers plus a series of #define bit masks. The second form is a bit-field union like code example #2 above, except that the union and struct members are defined volatile. I've been using the first form extensively, and I am experimenting with the second form to help reduce bit-logic mistakes and get better type and range checking.

union type punning

Per the comments, I looked into union type punning and found that both clang and gcc currently state that it works, with certain caveats. Given those caveats, I'm a little concerned that I would make a mistake by using pointers incorrectly. I also prefer the type safety that code example #1 gives.

atomic builtin functions

I have now tried these and found that void __atomic_store (type *ptr, type *val, int memorder) with memorder __ATOMIC_RELAXED gives me a working system. However, it doesn't seem to have type safety: it allows me to pass any val type with any ptr type for some reason. It also lacks the ability to set a bit-field directly, such as b->a.c = 3, without using an atomic_load and an atomic_store, whereas the volatile version does behave as I would like in that case. I also found that if I compile with -flto, I have to mark the b instance with __attribute__((used)).
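
For reference, a rough sketch of what the load/store pair described above might look like with the GCC/Clang generic builtins. The struct name Fields and the helpers store_all and set_c are hypothetical names introduced only for this illustration; the layout mirrors code example #1 but with non-volatile members.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical register layout (not from a vendor header).
struct Fields {
    std::uint32_t b : 3;
    std::uint32_t c : 3;
    std::uint32_t d : 26;
};
static_assert(sizeof(Fields) == sizeof(std::uint32_t), "must fill one register");

// Replace the whole register with a single relaxed atomic store.
inline void store_all(Fields* reg, Fields value) {
    __atomic_store(reg, &value, __ATOMIC_RELAXED);
}

// Changing one field needs an explicit load, modify, store sequence;
// there is no direct equivalent of reg->c = 3.
inline void set_c(Fields* reg, std::uint32_t c) {
    assert(c <= 7u);  // field c is 3 bits wide
    Fields tmp;
    __atomic_load(reg, &tmp, __ATOMIC_RELAXED);
    tmp.c = c & 0x7u;
    __atomic_store(reg, &tmp, __ATOMIC_RELAXED);
}
```

On real hardware the Fields pointer would refer to the register's fixed address; here any ordinary Fields object can stand in for it.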

Profiling

On this platform I'm very aware of the cost of register setting due to keeping track of registers set during interrupts. In practice these bit-fields may have more than three members, and on clang the number of instructions scales with the number of bit-field members. The pipeline on this platform is relatively straightforward, so the number of cycles is generally roughly proportional to the number of instructions in this case.

Answer 1 · 2025-11-10 21:24:40Z

Pete Becker

• Nov 10 at 21:24

Re: “illegal (due to undefined behavior)” — non sequitur. “Undefined behavior” means only that the C++ language definition doesn’t tell you what the program does. If your compiler documents what it does it’s a legal compiler extension; there’s nothing formally wrong with the code.

Answer 2 · 2025-11-10 21:24:53Z

Once upon a time, probably up until about 1998, volatile meant something that the ordinary Mk1 programmer brain could reason about.

That time has passed. Now you should probably be using "atomic intrinsics" in one or another form.

If you are initializing a struct on startup (but not at any other time), it is probably safe to set the struct to zeros using memset. But essentially, if you want to do anything else, you need to offer the appropriate supplications to the compiler authors in the form of explicit atomic operations.

For example __atomic_compare_exchange or __atomic_store may do what you want.

Answer 3 · 2025-11-10 21:41:34Z

Jesper Juhl

• Nov 10 at 21:41

Side note: fewer instructions does not inherently mean "faster".

Answer 4 · 2025-11-10 21:49:47Z

Christian Stieber

• Nov 10 at 21:49

“the gcc compiler optimizes the struct initialization to 3 instructions, while clang takes 16.”

And this causes a performance bottleneck in your application?

Answer 5 · 2025-11-11 05:39:47Z

You say "I would like to use bitfields to access low level memory". Ehr, no you don't. Bit-fields are implementation-defined and do NOT address individual bits. At best they behave like integers with N bits, and in that respect they are kind of useless. When you need bitwise precision you will have to use bit masking, shift operators, etc., and then assign the result to a volatile variable. Volatile is useful if you have memory-mapped IO and you don't want the compiler to optimize reads and writes away; don't use it to make your code "thread-safe", because that's not what volatile is for.

So maybe you should explain better "WHAT" you want to do, because at this point asking how to optimize incorrect code does not make sense.
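
The masking-and-shifting approach described in this answer could look roughly like the sketch below. The constant names, the field positions (b in bits 0 to 2, c in bits 3 to 5) and the helper functions are assumptions made up for this illustration; they are not taken from any vendor header.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical field positions and masks.
constexpr std::uint32_t B_POS = 0;
constexpr std::uint32_t B_MASK = 0x7u << B_POS;
constexpr std::uint32_t C_POS = 3;
constexpr std::uint32_t C_MASK = 0x7u << C_POS;

// Build the whole register value in an ordinary, non-volatile integer.
constexpr std::uint32_t make_value(std::uint32_t b, std::uint32_t c) {
    return ((b << B_POS) & B_MASK) | ((c << C_POS) & C_MASK);
}

// One volatile store for the whole register.
inline void write_reg(volatile std::uint32_t* reg, std::uint32_t value) {
    *reg = value;
}

// Update one field with exactly one volatile read and one volatile write.
inline void set_c(volatile std::uint32_t* reg, std::uint32_t c) {
    assert(c <= 7u);  // field c is 3 bits wide
    std::uint32_t tmp = *reg;
    tmp = (tmp & ~C_MASK) | ((c << C_POS) & C_MASK);
    *reg = tmp;
}
```

Because make_value works on plain integers, the compiler is free to fold it to a constant; only the final assignment through the volatile pointer is a mandatory memory access.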

Answer 6 · 2025-11-11 21:04:58Z

For your use case, setting a memory-mapped hardware register, you do want volatile, not atomic. Your follow-up idea, to have a volatile B*, and for B to have non-volatile members, is probably best (unless volatile members do some kind of compiler magic that your hardware needs). This should generate the new value to update to, with optimizations enabled, then update the hardware register to that value using volatile semantics.

An alternative is to use std::bit_cast<uint32_t>, which has unspecified, not undefined, behavior. The bitfield has unspecified behavior already. You might also be able to define the possible field values as constexpr constants that you can combine with bitwise or.

The GCC documentation warns:

As bit-fields are not individually addressable, volatile bit-fields may be implicitly read when written to, or when adjacent bit-fields are accessed. Bit-field operations may be optimized such that adjacent bit-fields are only partially accessed, if they straddle a storage unit boundary. For these reasons it is unwise to use volatile bit-fields to access hardware.

Answer 7 · 2025-11-12 20:53:45Z

Long ago, in my previous job position, I wrote a huge structure very similar to your second snippet using union type punning. The structure was mapping the whole set of hardware registers of our custom made ASIC, designed to be embedded in a printer. Then the whole structure was marked as volatile.

It worked quite well for many years. But it was not optimal, and now, with more experience in C++, I would recommend against this type of implementation.

What I would implement nowadays is a template class with more fine-grained control of read and write accesses. I wrote a prototype of such a template class for you, Register, with what I think is a reasonable syntax. In that implementation, the programmer needs to explicitly call read() or write() to generate a bus access. Otherwise, the overloaded operators work on a mirror of the hardware value. By using C++20 std::bit_cast<>, I avoid union type punning completely. The main function is just an example of how to use the class.

#include <stdint.h>
#include <bit>

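// B is the bit-field layout; U is the raw integer type of the hardware register.
template<class B, class U=uint32_t> class Register
{
public:
    static_assert(sizeof(B) == sizeof(U));
    Register(U* address) : ptr(address) { clear(); }
    // Conversions and assignments touch only the mirror b, never the bus.
    operator U() const { return std::bit_cast<U>(b); }
    operator B() const { return b; }
    Register& operator=(U value) { b = std::bit_cast<B>(value); return *this; }
    Register& operator=(B value) { b = value; return *this; }
    // read() and write() are the only members that access the hardware.
    Register& read() { return operator=(*ptr); }
    void write() const { *ptr = operator U(); }
    void write(U value) { operator=(value); write(); }
    void write(B value) { operator=(value); write(); }
    // Zero the mirror only; call write() to push it to the hardware.
    void clear() { b = std::bit_cast<B>(U(0)); }

    B b;                // mirror of the hardware value
private:
    volatile U* ptr;    // address of the hardware register
};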

int main()
{
    struct BitField
    {
        uint32_t fieldA : 3;
        uint32_t fieldB : 10;
        uint32_t fieldC : 19;
    };
    uint32_t hardware_value;
    Register<BitField> reg(&hardware_value);
    reg.b.fieldA = 2;
    reg.b.fieldC = 23;
    reg.write();

    reg.write(0x12345);
    reg.write({ .fieldA = 1, .fieldC = 20 });

    uint32_t v = reg.read().b.fieldA;
    v = reg;
    reg = v >> 2;
    reg.write();
    (reg = { .fieldC = 100 }).write();
}
