10

The order of bits in C11 bitfields is implementation-defined, and so is any padding in between them. This makes it difficult to access the bitfields by index at runtime, even though most implementations order the bitfields in a straightforward order.

This code attempts to avoid any undefined behavior by using a constant array containing the bit positions, and by using memcpy to access individual bytes of the bitfield:

// Generic type for up to 8 bitfields
typedef struct {
    bool b0: 1;
    bool b1: 1;
    bool b2: 1;
    bool b3: 1;
    bool b4: 1;
    bool b5: 1;
    bool b6: 1;
    bool b7: 1;
} bitfield8_t;

// Order of bits in a bitfield is implementation-defined
// This constant array makes the code portable
static const bitfield8_t bitorder[8] =
{
    {1,0,0,0,0,0,0,0},
    {0,1,0,0,0,0,0,0},
    {0,0,1,0,0,0,0,0},
    {0,0,0,1,0,0,0,0},
    {0,0,0,0,1,0,0,0},
    {0,0,0,0,0,1,0,0},
    {0,0,0,0,0,0,1,0},
    {0,0,0,0,0,0,0,1},
};

// Set bit in bitfield by index
// bitpos must be in range 0..7
void set_bit(bitfield8_t *x, int bitpos)
{
    // Reserve temporary storage for individual bytes
    // Normally it's just one byte, but compiler is
    // allowed to add padding.
    uint8_t x_bytes[sizeof(bitfield8_t)];
    uint8_t s_bytes[sizeof(bitfield8_t)];
    
    // Get the bit that corresponds to the index
    bitfield8_t s = bitorder[bitpos];
    
    // Copy fields to bytes so that we can modify
    memcpy(x_bytes, x, sizeof(bitfield8_t));
    memcpy(s_bytes, &s, sizeof(bitfield8_t));
    
    // Bitwise-OR each byte (typically just one)
    for (int i = 0; i < sizeof(bitfield8_t); i++)
    {
        x_bytes[i] |= s_bytes[i];
    }
    
    // Copy result back to bitfield
    memcpy(x, x_bytes, sizeof(bitfield8_t));
}

Then in the actual usage, the bitfield bits would have different names and possibly not all 8 are included:

// Actual code would have different names for
// the bitfield bits in a larger structure.
typedef struct {
    int some_field;
    uint16_t some_other_field;

    union {
        bitfield8_t bitfield_raw;
        struct {
            bool fancy_flag: 1;
            bool silly_flag: 1;
        };
    };

    uint8_t more_fields;
} myfancystruct_t;

int main()
{
    myfancystruct_t mfs = {};

    set_bit(&mfs.bitfield_raw, 1);

    printf("fancy_flag: %d\n", mfs.fancy_flag); // Prints 0
    printf("silly_flag: %d\n", mfs.silly_flag); // Prints 1

    return 0;
}

This optimizes nicely enough on most architectures: (View on Compiler Explorer)

set_bit:
        movsx   rsi, esi
        mov     al, BYTE PTR bitorder[rsi]
        or      BYTE PTR [rdi], al
        ret

But is this well-defined behavior by C11 standard?

My interpretation is that:

  1. C11 section 6.5.2.3 clause 6 allows "common initial sequence" in union, including bit-fields, which makes the bitfield_raw access ok.
  2. Section 6.2.6.1 clause 3 says that "unsigned bit-fields .. shall be represented using a pure binary notation", which makes the bitwise-or work for setting the bitfield. It might also guarantee that sìzeof(bitfield8_t) == 1, but that is not so important.
  3. Section 6.2.6.1 clause 4 allows memcpy from struct to bytes and back. It excludes bit-fields, but I think structs containing bit-fields is fine.

I am aware that there are many ways to implement bit flags in C, and I'm still just considering my options. To make this question answerable, I would like to focus it on whether the shown approach is well-defined. Any answer that points out the part of standard that makes this undefined behavior is greatly appreciated!

17
  • 2
    Strictly speaking any struct/union may including padding bytes anywhere except in the beginning, so they are never fully portable and always contain impl.defined behavior, bit-field or no bit-field. Commented Nov 18 at 9:16
  • 1
    @Lundin What part of this code would break from padding bytes? Commented Nov 18 at 9:17
  • 2
    You asked if it has well-defined behavior. If there is anything implementation-defined present, then it is not well-defined. If the code would break or not is another matter. Commented Nov 18 at 9:22
  • 1
    @Lundin Hmm, by "well-defined" I meant if the observable end result of the code is the same (the bit-field member gets set). I think that will happen correctly even if there are padding bytes. I agree that the generated code of set_bit will be different if sizeof(bitfield8_t) is larger due to padding bytes. Commented Nov 18 at 9:24
  • 2
    Perhaps I'm missing something, but if you can only set bits by index, I'm not sure what the point is of giving them fancy member names. Commented Nov 18 at 10:10

1 Answer 1

11

The order of bits in C11 bitfields is implementation-defined, and so is any padding in between them.

There are specified, unspecified, and implementation-defined aspects to that, but it's more unspecified than anything else. Still, I guess your point is that different implementations behave differently, which is true.

This makes it difficult to access the bitfields by index at runtime, even though most implementations order the bitfields in a straightforward order.

All implementations order bitfields in a straightforward manner. But that's not very relevant, because doing anything that depends on the storage order of bitfields is bad news. In fact, most other uses of bitfields are bad news, too. Generally speaking, people who want to set up a bit array with elements accessed by index don't use bitfields to do it. They use a flat integer, with bits read or written via shifting and masking:

void set_bit(uint8_t *x, unsigned bitpos) {
    *x = *x | (1u << bitpos);
}

void clear_bit(uint8_t *x, unsigned bitpos) {
    *x = *x & ~(1u << bitpos);
}

unsigned get_bit(uint8_t x, unsigned bitpos) {
    return (x >> bitpos) & 1;
}

But is this well-defined behavior by C11 standard?

My interpretation is that:

  • C11 section 6.5.2.3 clause 6 allows "common initial sequence" in union, including bit-fields, which makes the bitfield_raw access ok.

As shown in your example, yes. The common initial sequence of two or more structures can include or consist entirely of bitfields. They can then be accessed via the union.

  • Section 6.2.6.1 clause 3 says that "unsigned bit-fields .. shall be represented using a pure binary notation", which makes the bitwise-or work for setting the bitfield. [...]

I don't think you need to rely on that for bitfields of width 1, but OK.

  • Section 6.2.6.1 clause 4 allows memcpy from struct to bytes and back. It excludes bit-fields, but I think structs containing bit-fields is fine.

6.2.6.1/4 is not an enabling provision for that. It is about defining the term object representation. Any object that is not declared with register storage class can be copied via memcpy, subject to the specifications of that function. Bitfields are not objects, but structures and unions containing them are, so yes, you can use memcpy to copy those.

So yes, the code presented has well defined behavior.

But I urge you not to do that. It is much larger and more complicated than the typical alternative. It goes out of its way to use bitfields at all, which is exactly the wrong direction. The best solution to bitfield details being unspecified is to avoid using bitfields. Especially so when to do otherwise requires building a complex framework and imposing additional structure on top of them.

Sign up to request clarification or add additional context in comments.

3 Comments

Thanks for confirming my interpretation of the standard. I'll also take your (and the ones from comments) concerns into account when choosing between approaches.
In terms of performance, note that your original did not optimize away static const bitfield8_t bitorder[8]. It loads from it, using the bit-index as a byte index, to get a mask. This means the compiler doesn't understand what's going on, so it's not going to auto-vectorize it with SIMD for example, or do other optimizations with other surrounding code. Not to mention the inefficiency of actually doing these extra loads instead of e.g. bts eax, esi to set a bit in a register using another as an index.
Also, this array of bitfield8_t doesn't give you an efficient way to do things like search for the next set bit or to popcount, only access single bits. Also, even with uint8_t, C (and C++) have the annoying problem that for some uses you want to access bitfield data with the widest integer type that's fast (often size_t or uintptr_t), but for other cases you want 32-bit or 8-bit (like to make separate nearby set_bit ops independent of each other, instead of sequential store/reload of the same uint64_t. C and C++ make this inconvenient because of the strict-aliasing rule.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.