I'm working on legacy code that uses mmap to load data from a file:

int fd;
std::size_t fs;
fd = open(filename, O_RDONLY);  // error management omitted for the example
fs = get_size_from_fd(fd);  // legacy function using fstat
void *buff = mmap(NULL,fs,PROT_READ,MAP_SHARED,fd,0);

(NB: large parts are still using a C API, but I'm compiling in C++ and trying to update as much as possible, fixing issues and UB first).

Later in the code, I'm finding:

unsigned char *ptr = (unsigned char*)buff; // legacy code, first change would be to make it a reinterpret_cast
// then loops on bytes from ptr.

AFAIK, there is no implicit array of bytes at buff (see https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2025/p1839r7.html), so instead I want to emulate std::start_lifetime_as:

unsigned char *get_object_representation(void *buffer, std::size_t N)
{
    std::memmove(buffer, buffer, N); // implicitly creates an array of unsigned char
    return std::launder(reinterpret_cast<unsigned char *>(buffer));
}

And then I could do:

unsigned char *ptr = get_object_representation(buff,fs);
// then loops on bytes from ptr

I believe this requires C++20 for implicit object creation.

But is it correct? My own objection concerns std::memmove: it requires that an object of size at least N already exists at buffer, yet I don't think mmap formalizes such a guarantee.

https://man7.org/linux/man-pages/man2/mmap.2.html says only:

On success, mmap() returns a pointer to the mapped area

If not, is there a well-defined way to access the bytes of a region returned by mmap (possibly in older C++ standards as well)?

  • Since you use std::launder and expect std::memmove to start a lifetime I assume you use C++20 right? Might be worth adding as a tag. But the function itself should be ok (from C++20 onward). But maybe you should call your function "start_lifetime_as" too (in your own namespace) Commented Oct 21 at 13:28
  • Just a nit, I think you should use a static_cast here rather than a reinterpret_cast. Shouldn't make a difference in functionality. The static_cast is just narrower in what it can do. Commented Oct 21 at 13:43
  • Related (but not duplicate AFAICT): Using std::memmove to work around strict aliasing? Commented Oct 21 at 14:05
  • Isn't there a problem with std::memmove(buffer, buffer, N) when buffer is mapped read-only? If the memmove isn't optimized out, then this will segfault. Commented Nov 9 at 17:33
  • In practice, it does crash with x86-64 clang with optimizations off. godbolt.org/z/aEPTYMdnd Commented Nov 13 at 16:08

1 Answer

mmap's DESCRIPTION specifically tells us:

The contents of a file mapping ..., are initialized using length bytes starting at offset offset in the file (or other object) referred to by the file descriptor fd.

So we know there are valid bytes there; they are just not specified to be valid C++ objects. std::memmove fixes this problem, which is the general one of turning bytes from any source into C++ objects (mmap is just one example where you need this), because the implicit object creation at dest happens first. std::memmove is documented as performing its operations in order, the first of which is:

  1. Implicitly creates objects at dest.

Because dest is src, this means that the length bytes mapped from the file are now legal C++ objects, before any data is logically moved. The move itself is superfluous (and ideally optimized away, though I've seen indications that at least some versions of Microsoft's C++ compiler do not do so), the only thing you need from it is:

  1. That it creates C++ objects for length bytes beginning at an address of our choosing (dest)
  2. That it otherwise does nothing to the data
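
The two requirements above can be sketched as a small helper. This is only an illustration under stated assumptions: the name bytes_in_place is hypothetical, and the storage is assumed to be writable, because a self-memmove that is not optimized away may really store to dest.

```cpp
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical helper name; this is not a standard facility.
// Assumes `buffer` points to at least `n` bytes of readable AND writable
// storage: an unoptimized self-memmove may really store to the destination.
inline unsigned char *bytes_in_place(void *buffer, std::size_t n)
{
    // Requirement 1: objects are implicitly created at dest before any copying.
    // Requirement 2: dest == src, so the stored values are left unchanged.
    std::memmove(buffer, buffer, n);

    // std::launder (C++17) is used, as in the question, to obtain a pointer
    // to the object now living at that address.
    return std::launder(static_cast<unsigned char *>(buffer));
}
```

A caller would then loop over the returned pointer for n bytes, exactly as the legacy code loops over ptr.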

If std::memmove had the ability to receive NULL as the src argument, and guaranteed it would perform no moves in that circumstance, we'd still get all we want; it's not the reading from src or the writing to dest that matters, it's that dest is converted to objects before we even get to that point, which is enough to convert the raw bytes to C++ objects.

To be clear, this isn't specific to C++20 and higher. The change that added implicit object creation to the various APIs inherited from C was a defect report that retroactively changed C++ standards all the way back to C++98, so std::memmove has this implicit object creation ability in any C++ standard you care to use.
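
As the comments under the question point out, a self-memmove on a PROT_READ mapping can fault if the compiler does not elide it. Here is a minimal sketch of one way around that, assuming a private, writable mapping is acceptable for the use case. The function name sum_file_bytes and the -1 error convention are made up for the example.

```cpp
#include <cstddef>
#include <cstring>
#include <new>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical example function: sums the bytes of a file through a mapping.
// Returns -1 on any failure (including an empty file, which mmap rejects).
long sum_file_bytes(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) { close(fd); return -1; }
    const std::size_t size = static_cast<std::size_t>(st.st_size);

    // Assumption: MAP_PRIVATE + PROT_WRITE is acceptable here. Stores go to
    // copy-on-write pages and never reach the file, so the self-memmove
    // below cannot fault the way it can on a PROT_READ mapping.
    void *buff = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd); // the mapping stays valid after the descriptor is closed
    if (buff == MAP_FAILED) return -1;

    // Implicitly creates objects in the mapped region; dest == src, so the
    // byte values are unchanged.
    std::memmove(buff, buff, size);
    unsigned char *bytes = std::launder(static_cast<unsigned char *>(buff));

    long sum = 0;
    for (std::size_t i = 0; i < size; ++i) sum += bytes[i];

    munmap(buff, size);
    return sum;
}
```

The trade-off is that, if the memmove is really executed, every page of the private mapping may be copied on write, so this costs memory on large files. Note also that std::launder itself needs C++17.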

5 Comments

Though cppreference mentions an "order of operation", I see a limitation: object existence may still be a prerequisite. The order of operations (implicit creation, then "copy") is stated at timsong-cpp.github.io/cppwp/n4861/cstring.syn#3, but timsong-cpp.github.io/cppwp/n4861/cstring.syn#1 defers to the C standard (for instance open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf §7.26.2.3), which states that the source is the address of an object (albeit in the C sense), and my reading is that this object must exist prior to the call.
The C definition of object is much more lax and much less important than in C++; if you're trying to claim C object rules apply in C++, you may as well just discard most C APIs incorporated by reference without modification to the C++ standard, because their spec clearly wasn't written with C++ in mind. If the C++-specific rules say it creates objects first, it creates objects first.
Besides, I've also just found this closely related question: stackoverflow.com/questions/53340727/… The accepted answer says that using mmap, or any memory mapping, is UB in C++, though it resulted in a heated debate.
That question is from 2018, and predates the defect report that backported the concept of implicit object creation to all C++ standards (with it being directly incorporated into C++20).
Please see Nate Eldredge's comment under the question.
