
Consider the following structure:

#include <stdint.h>
#include <string.h>

struct foo
{
  uint32_t a;
  uint16_t b;
};

Assume that this structure has 2 padding bytes at the end (a likely case) and suppose that an instance is set up like this:

int main(void)
{
  struct foo S;
  uint32_t x = 3;
  uint16_t y = 7;
  memcpy(&S.a, &x, sizeof(x));
  memcpy(&S.b, &y, sizeof(y));
  /* ... subsequent use of S ... */
}
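Whether struct foo really has 2 bytes of trailing padding is a property of the implementation, and it can be checked rather than assumed. A minimal sketch using only sizeof and offsetof; the helper names foo_interior_padding and foo_trailing_padding are hypothetical, introduced here only for illustration:

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdint.h>  /* uint32_t, uint16_t */

struct foo
{
  uint32_t a;
  uint16_t b;
};

/* Bytes between the end of a and the start of b (interior padding).
   a is the first member, so it is at offset 0 with no padding before it. */
static size_t foo_interior_padding(void)
{
  return offsetof(struct foo, b) - (offsetof(struct foo, a) + sizeof(uint32_t));
}

/* Bytes after the end of b up to sizeof(struct foo) (trailing padding). */
static size_t foo_trailing_padding(void)
{
  return sizeof(struct foo) - (offsetof(struct foo, b) + sizeof(uint16_t));
}
```

On common ABIs where uint32_t is 4-byte aligned, one would expect foo_trailing_padding() to return 2 and foo_interior_padding() to return 0. The standard itself only guarantees that a is at offset 0 and that b is placed after it.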

The C standard (6.2.6.1-6) says that "padding bytes take unspecified values when a value is stored in a structure", but in this case I assume that the padding bytes of S are uninitialized.

The question is whether the uninitialized padding bytes can lead to undefined behavior when the structure is used subsequently, or, equivalently, whether a conforming compiler is allowed to rely on the values of padding bytes.

  • Uninitialized padding bytes will have "unspecified values". There is no contradiction there. Commented Oct 28 at 10:45
  • You could use a straight assignment instead of memcpy. Commented Oct 28 at 10:57
  • There being objects or parts of objects that have indeterminate values is relatively common and has no particular relevance by itself. Using such indeterminate values is a different story, but there's nothing particularly special in such a case about the indeterminacy being associated with structure padding. Commented Oct 28 at 11:54
  • I personally can understand the question as-is, but I think making "when using the structure (later)" more explicit in the question can help resolve some confusions shown in the comments. Commented Oct 29 at 8:45
  • @WeijunZhou Agree, I have tried to update accordingly. Commented Oct 29 at 8:49

1 Answer

The bytes are uninitialized

The C standard (6.2.6.1-6) says that "padding bytes take unspecified values when a value is stored in a structure", but in this case I assume that the padding bytes of S are uninitialized.

That appears to be a quote from C 2018 or earlier. However, the current C standard is 2024, and it adds a parenthesized phrase:

When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values (e.g. structure and union assignment can fail to copy any padding bits).

Given the added phrase, we see this sentence is entirely irrelevant to the code in the question: the two memcpy calls explicitly copy specific bytes into the structure members and therefore do not copy anything into the padding bytes. So the code explicitly does what this paragraph says could happen if we did a structure assignment instead of individual byte copying: the padding bytes are left uncopied. (Also note the new phrase likely intended to refer to padding bytes rather than padding bits. Although the padding bytes are of course composed of bits, “padding bits” is otherwise used in the standard only for the unused bits that may be within an integer object, not the unused bytes that may be between or after structure members.)
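The point that the two memcpy calls store only into the members can be observed by first giving every byte of the structure a known value. This is a minimal sketch, assuming the answer's reading that member-wise memcpy is not a store of a structure value. foo_setup_with_fill and foo_padding_all are hypothetical helpers, and running this on one compiler shows only that compiler's behavior, not what the standard requires:

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdint.h>  /* uint32_t, uint16_t */
#include <string.h>  /* memcpy, memset */

struct foo
{
  uint32_t a;
  uint16_t b;
};

/* Sets up *s as in the question, but first gives every byte of *s,
   padding included, the known value fill, so that the effect of the
   member-wise memcpy calls on the padding can be observed. */
static void foo_setup_with_fill(struct foo *s, unsigned char fill,
                                uint32_t x, uint16_t y)
{
  memset(s, fill, sizeof *s);   /* every byte of *s is now fill */
  memcpy(&s->a, &x, sizeof x);  /* writes exactly sizeof x bytes at s->a */
  memcpy(&s->b, &y, sizeof y);  /* writes exactly sizeof y bytes at s->b */
}

/* Returns 1 if every byte of *s outside the members a and b equals v,
   else 0. Bytes are read through unsigned char, which has no
   non-value representations. */
static int foo_padding_all(const struct foo *s, unsigned char v)
{
  const unsigned char *p = (const unsigned char *)s;
  size_t end_a = offsetof(struct foo, a) + sizeof s->a;
  size_t end_b = offsetof(struct foo, b) + sizeof s->b;
  for (size_t i = end_a; i < offsetof(struct foo, b); i++)  /* interior padding */
    if (p[i] != v)
      return 0;
  for (size_t i = end_b; i < sizeof *s; i++)                /* trailing padding */
    if (p[i] != v)
      return 0;
  return 1;
}
```

The out-parameter is deliberate. Returning the structure by value would be a structure copy, which, per the parenthesized phrase quoted above, can fail to copy the padding, and the check would then show nothing.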

Therefore, the situation the code in the question presents us with is a structure whose member bytes have been initialized and whose padding bytes are uninitialized.

These uninitialized bytes do not cause undefined behavior

The question is if this may lead to undefined behavior?

This is vague, as anything could lead to undefined behavior. For example, having the value 1 in an int x could lead to undefined behavior in the code if (x == 1) RoutineWithUndefinedBehavior();. I will consider whether the fact that the padding bytes are uninitialized can be a specific cause of undefined behavior.

A particular concern about uninitialized data causing undefined behavior arises from this passage in C 2024 6.3.3.1, in the context of converting an lvalue to a value (that is, reading the stored value of an object to use it in an expression):

If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

This does not apply to the code in the question because the memcpy calls take the addresses of structure members (&S.a and &S.b). The address of any part of an object declared register cannot be computed, so S could not have been declared register.

Unspecified bytes would not cause undefined behavior

We could instead initialize the structure members by direct assignment, S.a = x; S.b = y;. But then the sentence from 6.2.6.1 applies; it explicitly says the padding bytes take unspecified values, which is a different state than being uninitialized.

Since the bytes of the structure would then have been initialized, even if to unspecified values, the sentence about using an uninitialized lvalue would not apply. Further, whatever values the padding bytes do have, this sentence, which follows the one in 6.2.6.1 quoted in the question, applies:

The object representation of a structure or union object is never a non-value representation, even though the byte range corresponding to a member of the structure or union object can be a non-value representation for that member.

Thus, using the whole value of a structure (as in structure assignment) would not encounter a non-value representation (what earlier editions called a trap representation), so undefined behavior would not arise from that.
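The direct-assignment variant can be sketched as follows, assuming the same struct foo. foo_equal and foo_assign_then_copy are hypothetical names used only for illustration. Because the padding bytes are unspecified after member stores and may not survive structure assignment, equality is tested member by member rather than with memcmp:

```c
#include <stdint.h>  /* uint32_t, uint16_t */

struct foo
{
  uint32_t a;
  uint16_t b;
};

/* Member-wise equality. A memcmp over sizeof(struct foo) would also
   compare padding bytes, whose values are unspecified after member
   stores, so it could report a spurious mismatch. */
static int foo_equal(const struct foo *l, const struct foo *r)
{
  return l->a == r->a && l->b == r->b;
}

/* Members set by direct assignment; the padding bytes then take
   unspecified (not uninitialized) values. T = S uses the whole value
   of S, and a structure's object representation is never a non-value
   representation, so this copy is well defined. */
static struct foo foo_assign_then_copy(uint32_t x, uint16_t y)
{
  struct foo S;
  S.a = x;
  S.b = y;
  struct foo T = S;  /* padding of T need not match padding of S */
  return T;
}
```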

Supplement

Consider replacing the two memcpy calls with the single assignment S.a = x;. Per the sentence in 6.2.6.1, this sets the padding bytes to unspecified values. However, it does not affect S.b. So S.b remains uninitialized, and S could have been declared register. That situation is considered in this question.


Comments

Small note: ISO/IEC 9899:2024 which was published in October 2024 seems to be colloquially referred to as C23. For example, my C compiler accepts the flag --std=c23 but whines at --std=c24
I do not support colloquial terms that are inferior to official terms. The standard is labeled 9899:2024, and the standard was published in 2024, so that is correct. Compilers should be fixed to use the correct label, and GCC and Clang are likely to be, as they have been before. (They started with the speculative c17 and, after the new version became official in 2018, were updated to accept c18.) Engineers should work to improve precision and accuracy.
That’s nice, but it is still a personal opinion, not an official statement. I would prefer that compilers use switches explicitly marked speculative/tentative, such as -std=c23x, and then later, when they support the official standard, add a switch for that, -std=c24. That would make it clear which switch does what. (Or, even better, std=C2024.) As it stands, I am aware of the various terminology, and I use “C 2024” to refer to the 2024 C standard. I do not use either “C23” or “C24”.
It seems that the consensus is it is called c23. The fact that you don't like that and I think it is stupid too is of no consequence.
I guess GCC had a -std=c2x flag at some stage for speculative/tentative options, as it now has a -std=c2y flag.
Thank you for this answer (and, yes, I meant UB arising specifically from uninitialized padding). I am exactly concerned about the fact that "uninitialized" is a different state than "unspecified". Imagine a compiler loading the 4 last bytes of this structure into a 32-bit register assuming the padding bytes to be 0 (which it may assume(?) if it is constructed to always initialize the padding to 0 and always include the padding in assignment). In that case, there would be a problem if the padding were uninitialized. I am still not entirely sure if this case is ruled out by the Standard.
A compiler may not assume the padding bytes are zero or remain zero. A program is free to change the padding bytes (by way of memcpy, by reading bytes from a stream into the structure, by changing a member in a union of which the structure is also a member, and possibly by other methods), and that should not disrupt use of the structure for the values of its members.
But is that statement based on an intuitive understanding or based on the definitions in the Standard? For example, we cannot uncritically read bytes from a stream into a structure. At least we need to ensure that the stream has been created with the same amounts of padding and the same value representations as the running program is using. Otherwise, a conversion must be performed first. I have always, as I also understand your statement, assumed that the value of padding bytes could be safely ignored, but started wondering if the Standard allows for a compiler where it is not true.
This seems to be postulating that compilers might be buggy. Indeed, they might, and that might take any number of forms. The spec does not provide any basis for a compiler to make universal assumptions about the values of padding bytes in structures, so if one does make such an assumption without doing everything necessary to back it up (if that were even possible) then that's on the compiler. Unless it comes from some other source, UB does not provide an out for the compiler in this area.
The parenthesized phrase contains "padding bits", while the text before the parenthesized phrase contains "padding bytes". Any ideas, why "padding bits" instead of "padding bytes"?
This is addressed in the answer.
@EricPostpischil OK. (The "within within" perhaps needs to be "within".)
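The earlier comment that a program is free to change the padding bytes without disturbing the values of the members can be illustrated with a minimal sketch. foo_scribble_padding is a hypothetical helper. It writes through unsigned char *, which may alias any object, and touches only bytes outside a and b. It illustrates the commenter's reading; it does not prove what the standard requires:

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdint.h>  /* uint32_t, uint16_t */

struct foo
{
  uint32_t a;
  uint16_t b;
};

/* Overwrites every byte of *s that lies outside a and b with v.
   Uses only offsetof and sizeof, so it adapts to whatever padding the
   implementation actually inserts (possibly none). a is at offset 0
   and b follows it, so there is no padding before a. */
static void foo_scribble_padding(struct foo *s, unsigned char v)
{
  unsigned char *p = (unsigned char *)s;
  size_t end_a = offsetof(struct foo, a) + sizeof s->a;
  size_t end_b = offsetof(struct foo, b) + sizeof s->b;
  for (size_t i = end_a; i < offsetof(struct foo, b); i++)  /* interior padding */
    p[i] = v;
  for (size_t i = end_b; i < sizeof *s; i++)                /* trailing padding */
    p[i] = v;
}
```

After any call, S.a and S.b should still read back the values last stored in them.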
