
I am trying to wrap my head around the implicit lifetime and aliasing rules in C++.

The standard says:

Some operations are described as implicitly creating objects within a specified region of storage. For each operation that is specified as implicitly creating objects, that operation implicitly creates and starts the lifetime of zero or more objects of implicit-lifetime types in its specified region of storage if doing so would result in the program having defined behavior.

And:

An operation that begins the lifetime of an array of unsigned char or std::byte implicitly creates objects within the region of storage occupied by the array.

As well as providing an example that states:

// The call to std::malloc implicitly creates an object of type X
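For reference, that example looks roughly like this. I am reproducing it from memory, with a null check and a small caller (sum_of_x, a name of my own) added, so treat it as a sketch rather than the standard's exact text:

```cpp
#include <cassert>
#include <cstdlib>

struct X { int a, b; };

X* make_x() {
    // The call to std::malloc implicitly creates an object of type X
    // and its subobjects a and b, and returns a pointer to that X object,
    // so the member accesses below have defined behavior.
    X* p = static_cast<X*>(std::malloc(sizeof(X)));
    if (p == nullptr) return nullptr;  // added here; not part of the standard's example
    p->a = 1;
    p->b = 2;
    return p;
}

// Hypothetical caller, only to show the object being used and released.
int sum_of_x() {
    X* p = make_x();
    assert(p != nullptr);
    int sum = p->a + p->b;
    std::free(p);
    return sum;
}
```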

Does that mean that the following is legal and correct?

constexpr size_t N = 10;
constexpr size_t S = sizeof(uint32_t);
std::vector<std::byte> buffer;
buffer.resize(N * S);

for (size_t i = 0; i < N * S; i += S)
  *reinterpret_cast<uint32_t*>(&buffer.data()[i + 2 * S]) = 42;

uint32_t x;
for (size_t i = 0; i < S; ++i)
  *(reinterpret_cast<std::byte*>(&x) + i) = buffer[i];
assert(x == 42);

If not, what am I missing? And is there a way to make it legal using the C++23 subset that LLVM 17 (Clang and libc++) supports?

Note: Even though I tagged the question as "language lawyer", I myself am not one, so I would very much appreciate an explanation in as "simple" terms as possible.

3
  • vector<byte> does not automatically create uint32_t objects in the allocated memory. Only malloc-like allocations guarantee this. You are probably looking for std::start_lifetime_as. Commented Feb 1 at 6:30
  • Why i + 2 * S? Gratuitously indexing out of range detracts from your example. (When i is (N-1)*S in the last iteration, the index is (N*S - S) + 2*S or N*S + S, which is larger than N*S.) Not to mention that skipping the first 2*S elements of buffer easily causes your assert to fail. Commented Feb 1 at 7:21
  • You haven't made sure that your storage is properly aligned and using a vector<byte> does not implicitly start the lifetime of whatever objects you are planning to use it for. Commented Feb 1 at 8:15

2 Answers

3
*reinterpret_cast<uint32_t*>(&buffer.data()[i + 2 * S]) = 42;

This is undefined behavior because you forgot std::launder. Without std::launder, you are accessing a std::byte through a glvalue of type uint32_t, which is UB because std::byte is not type-accessible through uint32_t. It doesn't matter whether a uint32_t exists in those bytes; reinterpret_cast without laundering doesn't give you a pointer to it.

As for whether implicit object creation takes place there: std::vector<std::byte> has to maintain an array of bytes internally ([vector.data] implies this), but not necessarily one in which objects are implicitly created for you. If std::vector simply allocated some bytes and gave you a pointer to them, you could use the objects implicitly created there (allocation through std::allocator or operator new is specified to implicitly create objects), but it might do a lot more.

For example, it could do placement-new for each individual byte when setting them to zero, which would end the lifetime of any implicit uint32_t in the same place. You're at best relying on implementation details of std::vector with this code.

If not, what am I missing? And is there a way to make it legal using the C++23 subset that LLVM 17 (Clang and libc++) supports?

Yes, but ideally, don't use std::vector if you need storage for implicitly created objects. Use something like std::unique_ptr<std::byte[]>:

// obtain uninitialized, dynamically allocated byte[]
// objects are implicitly created inside (see [intro.object])
std::unique_ptr<std::byte[]> buffer
  = std::make_unique_for_overwrite<std::byte[]>(N * S);

for (size_t i = 0; i < N * S; i += S) {
  // obtain a pointer to the first byte where the uint32_t is stored
  // (i already advances in steps of S bytes, so it must not be scaled again)
  std::byte* byte = buffer.get() + i;
  // obtain a pointer to the uint32_t
  uint32_t* uint = std::launder(reinterpret_cast<uint32_t*>(byte));
  // overwrite its value with 42
  *uint = 42;
}

This code is still highly questionable because you could have just allocated a uint32_t[] in the first place. If all objects in your buffer have the same type, you could also simplify this code by doing:

uint32_t* integers = std::launder(reinterpret_cast<uint32_t*>(buffer.get()));
for (size_t i = 0; i < N; ++i) { // or use std::fill
  integers[i] = 42;
}
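If the goal is only to store values in a byte buffer and read them back, a different technique avoids the questions of implicit object creation and laundering altogether: copying the bytes with std::memcpy, which is well-defined for trivially copyable types and has no alignment requirement. This is a minimal sketch of that alternative, not a fix of the code above; store_u32, load_u32 and fill_and_read_first are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copy the bytes of `value` into the buffer at `offset`.
inline void store_u32(std::vector<std::byte>& buf, std::size_t offset,
                      std::uint32_t value) {
    assert(offset + sizeof(value) <= buf.size());
    std::memcpy(buf.data() + offset, &value, sizeof(value));
}

// Hypothetical helper: copy the bytes at `offset` back out into a uint32_t.
inline std::uint32_t load_u32(const std::vector<std::byte>& buf,
                              std::size_t offset) {
    assert(offset + sizeof(std::uint32_t) <= buf.size());
    std::uint32_t value;
    std::memcpy(&value, buf.data() + offset, sizeof(value));
    return value;
}

// Fills a buffer with n copies of 42 (n must be at least 1) and reads the
// first one back.
inline std::uint32_t fill_and_read_first(std::size_t n) {
    std::vector<std::byte> buffer(n * sizeof(std::uint32_t));
    for (std::size_t i = 0; i < buffer.size(); i += sizeof(std::uint32_t))
        store_u32(buffer, i, 42);
    return load_u32(buffer, 0);
}
```

Because no uint32_t object is ever accessed inside the buffer, this works with std::vector<std::byte> as well. Compilers typically turn such fixed-size copies into plain loads and stores.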

Note on alignment

Both the std::vector case and the std::unique_ptr case are "fine in practice" in terms of alignment. [basic.stc.dynamic.allocation] explains that for operator new:

the storage is aligned for any object that does not have new-extended alignment

i.e. you get the minimum guaranteed alignment of __STDCPP_DEFAULT_NEW_ALIGNMENT__, and this is going to be at least alignof(void*) and maybe alignof(max_align_t) in any sane implementation.
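If you would rather check than assume, the alignment claim can be verified in code. The following is a sketch under stated assumptions: is_aligned_for and new_buffer_is_aligned are hypothetical names, the pointer-to-integer test relies on the usual flat address mapping, and new std::byte[n] is used in place of std::make_unique_for_overwrite (which performs the same default-initializing allocation) so that the snippet also compiles as C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Compile-time check: uint32_t needs no more than the default new alignment.
static_assert(alignof(std::uint32_t) <= __STDCPP_DEFAULT_NEW_ALIGNMENT__,
              "uint32_t would need over-aligned allocation");

// Hypothetical helper: true if p is suitably aligned for an object of type T.
template <class T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// Allocates room for n uint32_t values (n must be at least 1) as a byte
// array and reports whether the start of the storage is aligned for uint32_t.
inline bool new_buffer_is_aligned(std::size_t n) {
    assert(n > 0);
    std::unique_ptr<std::byte[]> buffer(new std::byte[n * sizeof(std::uint32_t)]);
    return is_aligned_for<std::uint32_t>(buffer.get());
}
```

Note that this only checks the start of the allocation. An object placed at a byte offset into the buffer also needs that offset to be a multiple of its alignment.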


6 Comments

"which is UB because" but isn't "if doing so would result in the program having defined behavior" implying that it would be UB without implicit creation? In other words, this would be precisely when implicit creation is supposed to work?
It's UB regardless of implicit object creation because reinterpret_cast wouldn't obtain a pointer to the uint32_t object, even if it was implicitly created there. Since implicit object creation wouldn't make the program well-defined, it doesn't take place at all. That's what that wording means. This kind of wording resolves questions like "if I treat this byte array as storing an implicitly created int and implicitly created float, that is UB because it violates strict aliasing, but which type of object was actually created there?". The answer is neither, it doesn't matter, it's UB.
@JanSchultke the code was just an example; the intent is to treat the byte buffer as if it contained various types at different times.
Everything in the answer still applies generally. You wouldn't use std::vector for that and need to be careful about laundering.
@JanSchultke: Does using std::launder actually help anything in practice? I would think that in my example the pointer value that was stored in ul1 should have been sufficiently laundered by the time code read *ul5 to block consolidation of the load of *ul5 with the store of *ul1, but neither clang nor gcc seems to think so.
0

There has never been a consensus understanding as to how some of the rules in the C and C++ Standards are supposed to work--at least not one that is compatible with the way clang and gcc actually process programs.

In the following code, suppose p points to a region of storage which can hold either an 8-byte long or an 8-byte long long, and i, j, and k are all zero. Then the storage at address p would never be read using any type other than the one with which it was last written, and any pointer that is used to access the storage using one type would be laundered before the storage is next written using a new type. If the implicit creation rules are ever supposed to be useful, they should be usable here, where code goes out of its way to make it clear that storage is getting recycled for use as different types.

#include <cstddef>
#include <new>
long test(void *p, int i, int j, int k)
{
    long long temp;

    long* ul1 = std::launder(reinterpret_cast<long*>(p));
    ul1[i] = 1234;
    // Contents of storage at ul1[i] will never again be used
    long long* ull2 = std::launder(reinterpret_cast<long long*>(ul1));
    ull2[j] = 2345;
    temp = ull2[k];
    // Contents of storage at ull2[j] will never again be used
    long* ul4 = std::launder(reinterpret_cast<long*>(ull2));
    ul4[k] = 3456;
    ul4[k] = temp;
    return ul4[i];
}

Unfortunately, neither clang nor gcc will correctly process this code unless invoked with -fno-strict-aliasing, and in that mode they relax the type-based access constraints in the C and C++ Standards, rendering them moot.

9 Comments

I'm not sure what you're trying to demonstrate, and this example seems way bigger than it needs to be. Also keep in mind that if i, j, and k are all zero, this code has UB because you're writing a long at p + 0 and then laundering it as long long at p + 0 in the next block, which violates the preconditions of std::launder.
@JanSchultke: If all indices are zero, the write via *ul1 should create an object of type long, whose value will never be read in the program as written. The write via ull2 should destroy that object and create one of type long long. The read via ull3 should read that object. The first write via ull4 should destroy that object and create one of type long. The second write should update the value of that object. The final read via ull5 should read the object of type long that was created by the first write via ul4 and updated by the second write.
@JanSchultke: By my understanding, lifetime of the objects created by writing ul1 and ull2 last until the storage of those objects is overwritten. If a trivial object's lifetime can't be ended by using a laundered pointer to overwrite the storage holding it with a trivial object of a new type, how is one supposed to end the lifetime of such objects and make the storage available for reuse?
@JanSchultke: I just simplified the code and confirmed the behavior of the simplified code remains the same in both clang and gcc; a previous attempt at simplification had a mistake. References to *ul1 in the comment above are now ul1[i]; *ull2 and *ull3 in the comment above should be ull2[j] and ull2[k], respectively; references to *ul4 and *ul5 are now ul4[k] and ul4[i].
In this new code, if i is zero, then the first std::launder would require that there is a long stored at p, and the second std::launder would require that a long long is stored at the same address. This cannot be the case and this code has UB.
