P.S. I am aware that memcpy should not be used when the source and destination overlap, and that memmove should be used instead.

Based on my understanding of memcpy, the function basically copies each byte from the source address to the destination address sequentially for a specified number of bytes. So in theory, given the code:

#include <string.h>

int main(void){
    int arr[] = {1, 2, 3, 0, 0};
    memcpy(arr + 1, arr, 3 * sizeof(arr[0]));  /* source and destination overlap */
    return 0;
}

Shouldn't the result be arr = {1, 1, 1, 1, 0}, since arr[0] is copied to the next location, which is then copied again, and so on, leading to all 1s?

The actual output, however, is arr = {1, 1, 2, 3, 0}, where the elements seem to be copied correctly. Why does this happen? My best guess is that memcpy uses some sort of buffer to hold a copy of the elements it is copying, and copies each byte from the buffer instead of from the updated array, to avoid corrupting subsequent values.

I also tried copying longer portions of the array to see whether something like an 8-byte buffer exists:

int arr[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 0, 0, 0, 0, 0, 0, 0, 0};
memcpy(arr + 8, arr, 16 * sizeof(arr[0]));

And the output is still consistent with the previous example: arr = {1, 2, 3, 4, 5, 6, 7, 8, 1, 2....15, 16};

  • "My best guess is that memcpy uses some sort of buffer to hold a copy of elements it is copying, and copies each byte in the buffer instead of the updated array to avoid corrupting subsequent values." No, memcpy does not protect you from corruption: a simple implementation copies bytes from low to high addresses (i.e. the copy is left to right). memmove starts at the end (highest) addresses (i.e. ptr + len - 1) and then decrements the pointers (i.e. the copy is right to left). It "does the right thing" and does not need an extra buffer. Commented Dec 9, 2024 at 2:53
  • memmove will always work, but it may be slower than memcpy, depending on the implementation. So, if the caller is sure there is no overlap, memcpy is preferred (for speed, and for better cache performance, because cache hardware doing anticipatory/speculative prefetch assumes incrementing addresses). Because memmove works backwards, it may confuse or defeat the prefetch. Commented Dec 9, 2024 at 3:00
  • @CraigEstey In case the destination is at a lower address than the source (e.g. clearing lines in a Tetris clone), memmove may well work forwards instead. Commented Dec 9, 2024 at 5:35
  • If we ignore undefined behavior for the sake of discussion, library versions of memcpy often use word-based copying. So on a 64-bit machine it might copy 8 bytes at a time, provided the data is aligned. Commented Dec 9, 2024 at 7:39

4 Answers

The behavior of applying memcpy to overlapping addresses is undefined. The compiler and the C standard library are allowed to do whatever they want when you give them such addresses, as long as the operating system allows it. This includes replacing such calls with memmove calls, producing stranger output than you might expect with highly optimized memcpy implementations, and crashing your program. Furthermore, the compiler is allowed to do different things with undefined behavior depending on the options you give it, like -O0 or -O2.

1 Comment

A minor clarification: it's allowed to do whatever it wants with undefined behavior no matter the optimization level. It's just a bit more likely to do something truly weird at the higher levels. (Otherwise, spot on.)

It's unspecified exactly how it does the copy (and undefined if they overlap), so the compiler can produce whatever assembly is fastest. That could be front to back or back to front. It might involve pulling the destination byte into a register before clobbering it, or even using the exact same code for memcpy and memmove to minimize the instruction cache. There are also specific vector operations on some processors that operate across a range of bytes in parallel. (I'd be somewhat surprised if it didn't use those when available.)

If you're curious about how your particular compiler is doing it for this particular code (nothing says it can't compile it differently somewhere else), some compilers have ways to see the generated assembly. For gcc, it's gcc -S foo.c.

2 Comments

Compiling to assembly only shows this if the memcpy code is inlined. Otherwise it will not be included in the output generated from foo.c.
@Gerhardh True. I was suspecting that it might be, but if it isn't, you'd need to track down the libc source and compile that (or disassemble the appropriate part of libc.so). But the main reason I suggested it was to see whether it's compiling the call either to a couple of inline vector ops or to a call to memmove, so if that's not the case, it's probably not worth bothering with.

Based on my understanding of memcpy, the function basically copies each byte from the source address to the destination address sequentially for a specified number of bytes.

WHAT void *memcpy(void * restrict s1, const void * restrict s2, size_t n) does is specified by the C standard:

The memcpy function copies n characters from the object pointed to by s2 into the object pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.

HOW it does it depends on the implementation.

This is the signature of the function:

void *memcpy(void * restrict dest, const void * restrict src, size_t n);

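Note the restrict keyword. It is a promise made by the caller, not something the compiler checks: here it means that no byte written through dest is also accessed through src, i.e. the two regions do not overlap. That promise lets the implementation optimize more aggressively. If the regions overlap anyway, the behavior is undefined.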
