
I was confused by the following paragraph about type aliasing from cppreference (source):

Whenever an attempt is made to read or modify the stored value of an object of type DynamicType through a glvalue of type AliasedType, the behavior is undefined unless one of the following is true:

  • AliasedType and DynamicType are similar.
  • AliasedType is the (possibly cv-qualified) signed or unsigned variant of DynamicType.
  • AliasedType is std::byte, char, or unsigned char: this permits examination of the object representation of any object as an array of bytes.

Suppose I have an object of a trivial type (such as a scalar) whose size is larger than 1 byte. In what ways, if at all, am I allowed to modify the byte representation of the object through a pointer to a different type without invoking undefined behaviour? For example:

#include <cstddef>  // std::byte
#include <cstring>  // std::memcpy

int main() {
    int x = 5, y = 10;
    std::byte* x_bytes = reinterpret_cast<std::byte*>(&x);

    //#1: replacing the entire representation:
    std::memcpy(x_bytes, &y, sizeof(int));

    //#2: changing a single byte in the representation:
    x_bytes[0] = static_cast<std::byte>(3);
}

Are both of these operations allowed, or only #1?
The problem is that I don't know how to interpret the paragraph I quoted. The three bullets are exceptions to the rule that "Whenever an attempt is made to read or modify the stored value [...] the behavior is undefined", which would imply that both reading and writing are allowed if one of the bullets is applicable. However, the third bullet only mentions the "examination of the object representation", which implies read-only access.
I tried to find a page of the standard describing this problem in more detail, but couldn't find one, so this was the only relevant material I had.

2 Answers


Are both of these operations allowed

Yes. There is no rule saying that you must modify all or nothing. Modifying a single byte is allowed.


However, the third bullet only mentions the "examination of the object representation", which implies read-only access.

The standard rule doesn't use such wording. This is the rule from the latest draft:

[basic.lval]

If a program attempts to access the stored value of an object through a glvalue whose type is not similar to one of the following types the behavior is undefined:

  • the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object, or
  • a char, unsigned char, or std​::​byte type.

Access is defined as:

[defns.access]

⟨execution-time action⟩ read or modify the value of an object


Of course, modifying bytes by their index order is dubious from a portability perspective: different systems store the bytes of an object in different orders (endianness), so the byte at a given index has a different significance on different systems.

Different behaviour on different systems is often undesirable.


3 Comments

Thank you very much for your answer, especially for linking the standard page.
There are also [basic.life]/1.5 and /5 which talk about reusing the storage; work is underway to establish that the bytes act like subobjects so as not to end the lifetime of the ordinary object.
@DavisHerring the paper you've linked also explicitly says that in-place modification of the object representation is still undefined behavior (in revision 5).

You are not allowed to arbitrarily modify an object through a std::byte*. In your example, #1 is okay, but #2 is undefined behavior.

Firstly, note that any object is type-accessible through a glvalue of type std::byte (the wording is slightly different in C++20, but has the same meaning). This means that you can access any object through a std::byte*, such as x_bytes in your example, and "access" means both reading and modifying by definition.

However, the effect of doing this is completely undefined by the standard, with some exceptions. Notably, [basic.types.general] p2 says that you can copy the underlying bytes of an object of trivially copyable type into a byte array and back, and the object then holds its original value again; p3 says that copying the underlying bytes of one object into another object of the same trivially copyable type gives the destination the value of the source. In your example, std::memcpy(x_bytes, &y, sizeof(int)); is therefore required to work exactly like x = y;.

However, the standard never defines what happens when you modify an object through a std::byte* or what value you obtain when reading an object through a std::byte*. Therefore:

  • Modifying an object through std::byte* is undefined behavior by omission.
  • Reading individual bytes through std::byte* at best gives you an unspecified value and at worst is undefined behavior by omission.

Also note that reinterpret_cast doesn't give you a pointer to the "underlying bytes" (i.e. the object representation). It's still a pointer to the original object.

Related work

Note that the C++ object model is currently highly defective. Many of the issues should be resolved by P1839: Accessing object representations, which makes major changes to object representations.

The paper also states (see Non-goals):

This paper does not propose to make in-place modification of the object representation valid, i.e. writing into the underlying bytes, only reading them.

Also see the Known issues section in the paper.

4 Comments

Your point is that, since it is not explicitly defined, we must consider it UB? (Which would make sense.) Am I correct that one should read the standard like this, i.e. any operation that is not explicitly described is UB?
"Also note that reinterpret_cast doesn't give you a pointer to the "underlying bytes" (i.e. the object representation). It's still a pointer to the original object." Could you back it up with standard wording?
Anything that the standard doesn't define semantics for is UB by omission. The semantics in eel.is/c++draft/expr.static.cast#12 tell you that you get the original pointer value out of the reinterpret_cast (which is just casting to void* and then to T*). The effect of modifying the object through that std::byte* or advancing the pointer through addition is a hole in the wording. To be fair, this is more of a defect than intentional UB, and it's intended to work for trivially copyable types, even though the wording doesn't say that.
Thanks for your clear answer. I tend to agree with you rather than eerorika. The sections eerorika cites are relevant but, as you state, not sufficient. This hole does not seem to be on track for a fix, though, perhaps because compilers so far behave as expected (bad reason), because you can do byte fiddling through bit_cast (good reason), and because you can hope for the compiler to optimize that (bad reason, but not so bad). So you're trading a hypothetical UB for a hypothetical missed optimization.
