12

Excerpt from the book "C++ memory management" by Patrice Roy

The std::memcpy() function

For historical (and C compatibility) reasons, std::memcpy() is special as it can start the lifetime of an object if used appropriately. An incorrect use of std::memcpy() for type punning would be as follows:

// suppose this holds for this example
static_assert(sizeof(int) == sizeof(float));
#include <cassert>
#include <cstdlib>
#include <cstring>
int main() {
    float f = 1.5f;
    void *p = malloc(sizeof f);
    assert(p);
    int *q = std::memcpy(p, &f, sizeof f);
    int value = *q; // UB
    //
}

The reason why this is illegal is that the call to std::memcpy() copies a float object into the storage pointed to by p, effectively starting the lifetime of a float object in that storage. Since q is an int*, dereferencing it is UB.

Is it really UB here? Doesn't malloc() implicitly create an int (if sizeof(int) == sizeof(float), of course), so that we can legally memcpy into it and then read its value?

  • 5
    std::memcpy returns a void* and assigning that to int* is a compiler error. GCC says, "error: invalid conversion from 'void*' to 'int*' [-fpermissive]" Commented May 30 at 22:31
  • 1
    @JeffGarrett "malloc (or memcpy) implicitly creates an int" - not according to the standard, they don't. In this case, they are both implicitly creating a float, as that is how the program is preparing the allocated memory. The program is not allocating or copying an int, it is allocating and copying a float. So the implicit object is a float, not an int. Simply pointing an int* pointer at the allocated memory does not magically turn the implicit float object into an int object. Commented May 31 at 0:00
  • 1
    Maybe you can tag language-lawyer. I am curious which of the 2 answers is actually correct. I don't like the last part of "that operation implicitly creates and starts the lifetime of zero or more objects of implicit-lifetime types in its specified region of storage if doing so would result in the program having defined behavior". Can memcpy actually consider static_cast<int*> to decide which object type to create? Cast of returned value is already out of scope of memcpy function. Commented May 31 at 10:54
  • 3
    @ChristianStieber the snippet came from a book, and the book is saying something poster thinks is wrong, so poster asks the q. Commented May 31 at 14:47
  • 2
    @pptaszni It isn't "considering static_cast<int*>" .... usually a cast doesn't itself generate UB, it's the dereference. So it's the *q that must be considered. And yea, the point of implicit lifetime types was to give something within C++'s object model that looks a bit like C's weaker model: if it looks like an int, there must have been an int there. See the original paper: open-std.org/jtc1/sc22/wg21/docs/papers/2020/p0593r6.html Commented May 31 at 14:52

3 Answers

12

This is not UB.

Presumably this is not controversial:

 float f = 1.5f;
 void *p = malloc(sizeof f);
 assert(p);

This line becomes interesting:

int *q = (int*)std::memcpy(p, &f, sizeof f);

I've inserted the missing cast from void* to int*.

std::memcpy implicitly creates objects in the destination region of storage before the copy and returns the new object (see [cstring.syn]).

What objects does it create? Any objects of implicit-lifetime types which give the program defined behavior (see [intro.object]). If there are many choices, it is unspecified which you get.

Then consider:

 int value = *q; // Obviously NOT UB

This is defined so long as the object created was an int, so it must have been.
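To make the sequence concrete, here is a minimal sketch of the same steps wrapped in a function. The name read_bits_through_malloc is hypothetical (it is not from the book or the question), and the sketch assumes sizeof(int) == sizeof(float). Whether the read is defined rests on the reading of [intro.object] argued in this answer.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

static_assert(sizeof(int) == sizeof(float), "example assumes equal sizes");

// Hypothetical helper: copies the bytes of f into malloc'ed storage and
// reads them back through an int*.
int read_bits_through_malloc(float f) {
    void *p = std::malloc(sizeof f);
    assert(p);
    // memcpy implicitly creates objects in the destination before copying.
    // Under the reading argued in this answer, the object created is an int,
    // because that is what gives the read below defined behavior.
    int *q = static_cast<int*>(std::memcpy(p, &f, sizeof f));
    int value = *q;
    std::free(p);
    return value;
}
```

The resulting value is whatever int shares the float's object representation. On a platform with IEEE 754 binary32 floats, read_bits_through_malloc(1.5f) gives 0x3FC00000; the C++ standard itself does not fix that number.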

2 Comments

Why is malloc not qualified? That alone is UB.
"This is defined" - what is the value of value?
0

Let me start with a different point of failure in the OP:

 void *p = malloc(sizeof f);

Since it's being compiled as C++ code, this snippet produces a very common type of UB. Standard C functions may or may not be present in the global namespace when standard C++ headers are included. So, the program may or may not compile. Even if it compiles, there's no guarantee that the behavior matches that of either the C or the C++ specification. The preprocessor may act in unpredictable ways. Standard identifiers shared between C and C++ must always be qualified with std::, even if a using directive or declaration has already been introduced:

auto p = std::malloc(sizeof f);
assert(p);

Having solved this not-so-obvious problem, let's break down the focal point of the question:

int *q = std::memcpy(p, &f, sizeof f);

AFAIK, the above line always fails at compile time, due to the simple fact that void* is not implicitly convertible to any type other than its own cv-qualified forms. So an explicit cast is required, and the simplest working one is static_cast:

auto q = static_cast<int*>
( std::memcpy(p, &f, sizeof f) );

My preference is to deduce the type of the result automatically after an explicit cast, to avoid repetition and the accidental invocation of an extra implicit cast. But that's not a strict rule, because of other possibilities for error.
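The conversion rule itself can be checked at compile time. A small sketch, assuming C++17 for the _v trait variables; copy_as_int is a hypothetical name used only for illustration:

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Any object pointer converts implicitly to void*, but not the other way.
static_assert(std::is_convertible_v<int*, void*>);
static_assert(!std::is_convertible_v<void*, int*>);

// Hypothetical helper: the cast is spelled once and auto deduces int*.
// The cast only converts the pointer; it says nothing about which object
// lives at dst.
inline int *copy_as_int(void *dst, const void *src, std::size_t n) {
    assert(dst && src);
    auto q = static_cast<int*>(std::memcpy(dst, src, n));
    static_assert(std::is_same_v<decltype(q), int*>);
    return q;
}
```

Copying from one int into another through this helper is uncontroversial, since both the source and the destination already hold an int.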

Now I further replace the single-line code above with a roughly equivalent snippet:

std::memcpy(p, &f, sizeof f);
auto q = static_cast<int*>(p);

Because of the earlier line:

 float f = 1.5f;

We know that p points to a valid instance of float, which is not a cv-qualified form of int and, being a fundamental type, has no inheritance/refinement relationship with int. That means the cast violates the strict aliasing rule and hence invokes UB.

std::memcpy can only legally circumvent strict aliasing for its parameters, but it has a restriction of std::is_trivially_copyable_v<S> on the source, and of std::is_trivially_copyable_v<D> and std::is_trivially_default_constructible_v<D> on its destination; for fundamental types, these constraints are satisfied by default. The return value is only a copy of the destination pointer.
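The constraints named here can be written down with type traits. The sketch below encodes them exactly as this answer states them; the variable template memcpy_compatible_v is hypothetical, and the equal-size check is an extra assumption that only matters for punning.

```cpp
#include <string>
#include <type_traits>

// Hypothetical trait encoding the constraints as stated in this answer:
// S trivially copyable; D trivially copyable and trivially default
// constructible. The sizeof comparison is an added assumption.
template <class D, class S>
inline constexpr bool memcpy_compatible_v =
    std::is_trivially_copyable_v<S> &&
    std::is_trivially_copyable_v<D> &&
    std::is_trivially_default_constructible_v<D> &&
    sizeof(D) == sizeof(S);

// Fundamental types pass; std::string does not, because it is not
// trivially copyable.
static_assert(memcpy_compatible_v<float, float>);
static_assert(!memcpy_compatible_v<std::string, std::string>);
```

Whether int and float pass together depends on the platform, because memcpy_compatible_v<int, float> is true only when the two types have the same size.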

In order to use the returned pointer as a different implicit-lifetime type, you need a lifetime cast (std::start_lifetime_as, available since C++23):

auto q = std::start_lifetime_as<int>
( std::memcpy(p, &f, sizeof f) );

Now q points to an int object, but as long as q is active, p should not be used on the same object.

A simpler approach would be to use a static_cast on std::malloc, before std::memcpy:

auto p = static_cast<int*>
( std::malloc(sizeof f) );
assert(p);
std::memcpy(p, &f, sizeof f);
auto value = *p;

Now it's possible to apply the static_cast to the output of std::memcpy. But what's the point when p already has the correct type? The above snippet is correct, because std::malloc is specified by the standard to start the lifetime of implicit-lifetime types, which cover all fundamental types.

But if all you want is type punning, it's much easier in C++20:

#include <bit>
auto constexpr value = std::bit_cast<int>(1.5f);

The requirements are similar to what was mentioned for the source and destination of std::memcpy: both must be trivially copyable types with the same size in memory.

As you can see, std::bit_cast can be evaluated at compile time too. So you don't need an external code generator to compute magic numbers as the bit representation of instances of the source type. It's simple, easy to use, and readable. Explicit use of the std::mem* functions is not encouraged anymore. These low-level utilities are meant to be buried behind well-tested, expertly written libraries. All their use cases are already covered by other C++ features.
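As a hedged illustration of the compile-time point, the sketch below uses std::bit_cast when the library provides it (detected through the __cpp_lib_bit_cast feature-test macro) and otherwise falls back to std::memcpy into an already existing object. The function name float_bits is hypothetical, and the value checks assume IEEE 754 binary32 floats.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#if defined(__has_include)
#  if __has_include(<bit>)
#    include <bit>
#  endif
#endif

static_assert(sizeof(std::uint32_t) == sizeof(float), "example assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559, "example assumes IEEE 754");

#if defined(__cpp_lib_bit_cast)
// C++20 path: usable in constant expressions.
constexpr std::uint32_t float_bits(float f) {
    return std::bit_cast<std::uint32_t>(f);
}
static_assert(float_bits(1.5f) == 0x3FC00000u);
#else
// Pre-C++20 fallback: copy the object representation into an existing
// std::uint32_t. No lifetime question arises, because bits is already alive.
inline std::uint32_t float_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
#endif
```

Both paths return the same bits; only the std::bit_cast path can be used in a static_assert or another constant expression.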

7 Comments

"Even if it compiles, there's no guarantee that the behavior matches that of either of C or C++ specifications." why? the point that <cstring> may not define ::malloc is true, in which case the program is ill-formed. In either case it matches the spec, so I don't know what is being said here. "Preprocessor may act in unpredicted ways" what ways?
"Standard identifiers shared between C and C++, must always be qualified as std::" that is most definitely a should, not a must. It is possible to write correct code violating that, and it is sometimes imperative that that be violated (when code should compile with C or C++). "We know that p points to a valid instance of float" that is not true (it points at an int), if you try to justify that you'll find the flaw "hence invokes UB" - which negates this.
"only legally circumvent strict aliasing for its parameters" - memcpy cannot circumvent strict aliasing, it allows one to create new objects and copy object representations. "you need a lifetime cast" - the point of implicit lifetime types was that you do not need this. "Now it's possible to apply the static_cast to the output of std::memcpy" - this doesn't change anything, the cast doesn't make it suddenly implicitly start an int's lifetime, it was capable of doing that already.
@JeffGarrett you don't have to complain about the whole answer if you don't like it. I don't want to repeat that long answer here. But one note for short: never use standard C functions unqualified in C++ code, you'll shoot yourself in weird places. It's different with 3rd party libraries, if you know what you are doing. But standard functions need caution. Try printing abs(0ull-1) with and without the std:: prefix.
My apologies. Consider them as offered in the spirit of suggestions not complaints.
0

I think the book is correct. std::malloc doesn't implicitly create an int object here, because it returns a void * that is assigned to a void *. An explicit cast (int *) of the pointer returned by std::malloc, assigned to an int *, would typically be required in order to implicitly create an int object. Similarly, std::memcpy would implicitly create a float object at the location pointed to by p, since p is a void * and the source (f) is a float. Remy Lebeau's comment under the question also discusses this aspect. This behavior of malloc and memcpy seems much more plausible than the object that p points to being implicitly created as an int by memcpy in order to avoid UB when dereferencing q, which is a pointer to int that points to the same memory region.

The assignment of the void * returned by std::memcpy to an int * is indeed a syntax error. We can treat it as a minor syntax error made by the book and not a conceptual error. Thus, let us assume that the statement is of the form int *q = (int *)std::memcpy(p, &f, sizeof f);.

Now, even though this casting and assignment is allowed, dereferencing q is undefined. This is because of type-accessibility restrictions. As long as the underlying object is of type float, it is UB to access it via a pointer to an integer.
