Can std::memmove() be used to "move" the memory to the same location to be able to alias it using different types?

For example:

#include <cstring>
#include <cstdint>
#include <iomanip>
#include <iostream>

struct Parts { std::uint16_t v[2u]; };

static_assert(sizeof(Parts) == sizeof(std::uint32_t), "");
static_assert(alignof(Parts) <= alignof(std::uint32_t), "");

int main() {
    std::uint32_t u = 0xdeadbeef;

    Parts * p = reinterpret_cast<Parts *>(std::memmove(&u, &u, sizeof(u)));

    std::cout << std::hex << u << " ~> "
              << p->v[0] << ", " << p->v[1] << std::endl;
}

$ g++-10.2.0 -Wall -Wextra test.cpp -o test -O2 -ggdb -fsanitize=address,undefined -std=c++20 && ./test
deadbeef ~> beef, dead

Is this a safe approach? What are the caveats? Can static_cast be used instead of reinterpret_cast here?

  • You still don't have a proper Parts object. The portable approach to create a trivial object via memory representation is to have a Parts p; and then memcpy to &p. memmove is irrelevant here. Commented Apr 8, 2021 at 13:56
  • I'm pretty sure this is undefined behavior, with or without memmove. The code accesses Parts object whose lifetime has never started. I don't see how memmove changes that. Commented Apr 8, 2021 at 13:56
  • @IgorTandetnik But isn't struct Parts an implicit-lifetime type which is created by memmove? Commented Apr 8, 2021 at 13:59
  • There is no struct Parts object in the code example. There's a std::uint32_t. There's a struct Parts*, which points to an object that is not a struct Parts. Commented Apr 8, 2021 at 14:07
  • FYI C++20 introduced std::bit_cast as a safe, convenient way to do this. The cppreference page has an example implementation that you can use if your compiler's not providing it yet (due in GCC 11 FWIW). Commented Apr 8, 2021 at 15:04

3 Answers

I don't think the proposal in the question is well-defined according to the C++ standard. The reinterpret_cast is outside the well-defined use cases, and I don't think the std::memmove call does (or can) resolve that.

My simplistic way of thinking about the strict aliasing rule is that the only way to comply is to copy bytes into the representation of an already created object of the target type.

std::uint32_t u = 0xdeadbeef;
Parts p;  // p is now a valid object
std::memcpy(&p, &u, sizeof(p));

C++20 added std::bit_cast, which more clearly signals the intent.

std::uint32_t u = 0xdeadbeef;
Parts p = std::bit_cast<Parts>(u);

Furthermore, the compiler likely has a deep understanding of std::bit_cast, so it may be able to optimize it better than if you did it manually.
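
For compilers without std::bit_cast, the memcpy approach can be wrapped in a small helper. The following is only a sketch: copy_cast, round_trip and copy_cast_self_check are hypothetical names, not standard functions, and the static_asserts encode the assumptions (equal sizes, trivially copyable types, a default-constructible target).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

struct Parts { std::uint16_t v[2]; };

// Hypothetical helper (not part of the standard library): copies the bytes of
// `from` into a separately created To object instead of aliasing `from`.
template <class To, class From>
To copy_cast(const From& from) {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
    static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
    static_assert(std::is_default_constructible<To>::value, "To must be default constructible");
    To to;                                // a real To object exists from here on
    std::memcpy(&to, &from, sizeof(To));  // overwrite its object representation
    return to;
}

// Round trip: uint32_t -> Parts -> uint32_t reproduces the original bytes.
inline std::uint32_t round_trip(std::uint32_t u) {
    const Parts p = copy_cast<Parts>(u);
    return copy_cast<std::uint32_t>(p);
}

// Small usage example; call it from main() or from a test.
inline void copy_cast_self_check() {
    const std::uint32_t u = 0xdeadbeef;
    assert(round_trip(u) == u);
}
```

Which half of the value lands in v[0] depends on the byte order of the target, so only the round trip is checked here.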

It's not clear to me whether it's legal to use std::bit_cast to make a pointer that treats the same bytes as though they're a representation of a different type.

// Is this legal?  It feels wrong.
std::uint32_t u = 0xdeadbeef;
Parts *pp = std::bit_cast<Parts *>(&u);

That said, other forms of type-punning are allowed in C, and those forms are used in lots of existing code. So C++ compilers will generally make those work even though they technically violate the strict aliasing rule.

I've heard members of the C++ standards committee say that this is a difficult but important contradiction to resolve. More recent additions, like std::launder (C++17) and std::start_lifetime_as (C++23), may be efforts to do that, but I haven't fully grokked them yet.


If you want to know which constructs the clang and gcc optimizers handle reliably without -fno-strict-aliasing, rather than assuming that everything the Standard defines will be processed meaningfully, note that both compilers will sometimes ignore changes to the active/effective type made by an operation, or sequence of operations, that leaves the bit pattern in a region of storage unchanged.

As an example:

#include <limits.h>
#include <string.h>

#if LONG_MAX == LLONG_MAX
typedef long long longish;
#elif LONG_MAX == INT_MAX
typedef int longish;
#endif

__attribute((noinline))
long test(long *p, int index, int index2, int index3)
{
    if (sizeof (long) != sizeof (longish))
        return -1;

    p[index] = 1;
    ((longish*)p)[index2] = 2;

    longish temp2 = ((longish*)p)[index3];
    p[index3] = 5; // This should modify p[index3] and set its active/effective type
    p[index3] = temp2; // Shouldn't (but seems to) reset effective type to longish

    long temp3;
    memmove(&temp3, p+index3, sizeof (long));
    memmove(p+index3, &temp3, sizeof (long));
    return p[index];
}
#include <stdio.h>
int main(void)
{
    long arr[1] = {0};
    long temp = test(arr, 0, 0, 0);
    printf("%ld should equal %ld\n", temp, arr[0]);
}

While gcc happens to process this code correctly on 32-bit ARM (even if one uses the flag -mcpu=cortex-m3 to avoid the call to memmove), clang processes it incorrectly on both 32-bit ARM and x64. Interestingly, while clang makes no attempt to reload p[index], gcc does reload it on both platforms, but on x64 the reload happens before the store through the longish pointer. The code gcc generates for test on x64 is:

test(long*, int, int, int):
        movsx   rsi, esi
        movsx   rdx, edx
        lea     rax, [rdi+rsi*8]
        mov     QWORD PTR [rax], 1
        mov     rax, QWORD PTR [rax]
        mov     QWORD PTR [rdi+rdx*8], 2
        ret

This code writes the value 1 to p[index], then reads p[index], stores 2 to p[index2], and returns the value just read from p[index]. When index and index2 are equal, as in main, it therefore returns 1 even though the storage holds 2.

It's possible that memmove will scrub active/effective type on all implementations that correctly handle all of the corner cases mandated by the Standards, but it's not necessary on the -fno-strict-aliasing dialects of clang and gcc, and it's insufficient on the -fstrict-aliasing dialects processed by those compilers.
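
For contrast, here is a sketch of a variant that does not depend on memmove resetting the effective type: the store of the "other" type goes through std::memcpy and a typed local, so no lvalue of type longish ever refers to the long storage. The names store_and_reload and store_and_reload_self_check are hypothetical, the sketch assumes p points into an array declared as long, and it is written in C++ rather than C. It illustrates the pattern only and makes no claim about what any particular compiler version emits.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Same idea as the longish typedef above: an integer type with the same size
// as long but a distinct type identity.
using longish =
    std::conditional<sizeof(long) == sizeof(long long), long long, int>::type;
static_assert(sizeof(longish) == sizeof(long), "no same-sized integer type found");

// Hypothetical variant of test(): accesses with the "other" type are done
// byte-wise via std::memcpy instead of through a casted pointer.
inline long store_and_reload(long* p, int index, int index2) {
    p[index] = 1;

    const longish two = 2;
    std::memcpy(p + index2, &two, sizeof two);       // byte-wise store of a longish value

    long result;
    std::memcpy(&result, p + index, sizeof result);  // byte-wise reload as long
    return result;
}

// Small usage example; call it from main() or from a test.
inline void store_and_reload_self_check() {
    long same[1] = {0};
    assert(store_and_reload(same, 0, 0) == 2);   // both indices name the same element
    long distinct[2] = {0, 0};
    assert(store_and_reload(distinct, 0, 1) == 1);
    assert(distinct[1] == 2);
}
```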


You should be able to do it like this:

int main() {
   union {
      std::uint32_t u;
      Parts p;
   };
   u = 0xdeadbeef;
   std::memmove(&p, &u, sizeof(u));

   std::cout << std::hex << u << " ~> "
             << p.v[0] << ", " << p.v[1] << std::endl;
}

https://en.cppreference.com/w/cpp/string/byte/memmove#:~:text=Where%20strict%20aliasing%20prohibits%20examining,used%20to%20convert%20the%20values.

Edit 1: A commenter disagrees with this, so I want to elaborate on what cppreference states:

"The objects may overlap: copying takes place as if the characters were copied to a temporary character array and then the characters were copied from the array to dest."

This means that this is equivalent to:

int main() {
   union {
      std::uint32_t u;
      Parts p;
   };
   u = 0xdeadbeef;
   char tmp[sizeof(u)];
   std::memcpy(&tmp[0], &u, sizeof(u)); // access to u, p not active
   std::memcpy(&p, &tmp[0], sizeof(p)); // access to p, u not active
   // From here on it is ok to access p
   std::cout << std::hex << u << " ~> "
             << p.v[0] << ", " << p.v[1] << std::endl;
}

This is perfectly legal for trivially copyable objects, assuming the value copied into each element of Parts is not a trap representation (both conditions should hold here).
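
The "as if" wording quoted from the reference describes which bytes end up where, and that part can be checked on a plain character buffer, independently of the union question. A minimal sketch, in which move_via_temp and the two shift functions are hypothetical helpers written for this illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: spells out the "as if" description of std::memmove with
// an explicit temporary buffer and two non-overlapping std::memcpy calls.
inline void move_via_temp(void* dest, const void* src, std::size_t n) {
    if (n == 0) return;                // nothing to copy
    std::vector<unsigned char> tmp(n);
    std::memcpy(tmp.data(), src, n);   // source -> temporary
    std::memcpy(dest, tmp.data(), n);  // temporary -> destination
}

// Shift the characters of s one position to the right using overlapping ranges.
inline std::string shift_with_memmove(std::string s) {
    if (s.size() < 2) return s;
    std::memmove(&s[1], &s[0], s.size() - 1);
    return s;
}

inline std::string shift_with_temp(std::string s) {
    if (s.size() < 2) return s;
    move_via_temp(&s[1], &s[0], s.size() - 1);
    return s;
}

// Small usage example; call it from main() or from a test.
inline void memmove_as_if_self_check() {
    assert(shift_with_memmove("abcdef") == shift_with_temp("abcdef"));
}
```

This only shows that the byte-level results agree for overlapping ranges; it says nothing about which union member is active afterwards.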

That this is OK is also reinforced by the following statement in the reference, which I find rather unequivocal for the case in question:

"Where strict aliasing prohibits examining the same memory as values of two different types, std::memmove may be used to convert the values."

Edit 2: I actually just realised that the cout line is not valid because it accesses both u and p at the same time, so this would have to be written as (for example):

int main() {
   union {
      std::uint32_t u;
      Parts p;
   };
   u = 0xdeadbeef;
   std::cout << std::hex << u << " ~> ";
   std::memmove(&p, &u, sizeof(u));
   std::cout << p.v[0] << ", " << p.v[1] << std::endl;
}

13 Comments

I don't think that's guaranteed safe, as only one member of a union is allowed to be active at any given time; that's clearly violated when using both p and u in arguments to std::memmove().
@TobySpeight I think you are wrong. The reference clearly states: "The objects may overlap: copying takes place as if the characters were copied to a temporary character array and then the characters were copied from the array to dest.". As such the 2 pointers are not active at the same time. Also "Where strict aliasing prohibits examining the same memory as values of two different types, std::memmove may be used to convert the values.".
If p and u aren't allowed to exist at the same time, it's meaningless to even ask whether they overlap or not.
@TobySpeight I clarified the text further, can you please explain, with a reference from the language specification rather than assertion, where my interpretation of the reference is wrong? It seems very unequivocal to me.
class.union.general.2: At most one of the non-static data members of an object of union type can be active at any time. Yes, it's absolutely unequivocal about that.