0

I've been experimenting with memcpy() implementations and optimizations in C; I recently wrote a version which was supposed to use successive powers-of-two to speed up the copy:

  • first, large 8-byte blocks are copied as uint64_t
  • the same is successively performed for uint32_t and uint16_t
  • finally, any remainder is copied bytewise

The example implementation and test main I wrote functions successfully, although at least one test case (string length = 72) appears to add two additional spurious bytes to the end of the string.

#include <stdio.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static inline size_t detalign(size_t size)
{
    if (size == 0)     return 0;
    if (!(size & 0x7)) return 8;
    if (!(size & 0x3)) return 4;
    if (!(size & 0x1)) return 2;
    else               return 1;
}

size_t strlen(const char *str)
{
    size_t k = 0;
    while (str[k] != '\0')
        k++;
    return k;
}

void *memcpy(void *restrict dest, const void *restrict src, size_t count) {
    if (count < 2048) {
        int sizes[4] = { 8, 4, 2, 1 };
        int pos = 0, rem = count;
        while (rem > 0) {
            // find the greatest power-of-two divisor which fits the remaining source count
            int divisor = 0;
            for (int i = 0; i < 4; i++) {
                if (rem / sizes[i] && sizes[i] > divisor)
                    divisor = sizes[i];
            }

            // switch over the sizes, because each size must be handled with its own type.
            int iter = 0;
            int reps = rem / divisor;
            switch (divisor) {
            case 8:
                uint64_t *restrict qdest = (uint64_t *) (dest + pos), *restrict qsrc = (uint64_t *) (src + pos);
                for (int j = 0; j < reps; j++)
                    qdest[j] = qsrc[j];
                printf("Iteration %d: start position %d, end position %d, size %d.\n", iter, pos, pos + (reps * divisor), divisor);
                break;
            case 4:
                uint32_t *restrict ddest = (uint32_t *) (dest + pos), *restrict dsrc = (uint32_t *) (src + pos);
                for (int j = 0; j < reps; j++)
                    ddest[j] = dsrc[j];
                printf("Iteration %d: start position %d, end position %d, size %d.\n", iter, pos, pos + (reps * divisor), divisor);
                break;
            case 2:
                uint16_t *restrict hdest = (uint16_t *) (dest + pos), *restrict hsrc = (uint16_t *) (src + pos);
                for (int j = 0; j < reps; j++)
                    hdest[j] = hsrc[j];
                printf("Iteration %d: start position %d, end position %d, size %d.\n", iter, pos, pos + (reps * divisor), divisor);
                break;
            case 1:
                uint8_t *restrict bdest = (uint8_t *) (dest + pos), *restrict bsrc = (uint8_t *) (src + pos);
                for (int j = 0; j < reps; j++)
                    bdest[j] = bsrc[j];
                printf("Iteration %d: start position %d, end position %d, size %d.\n", iter, pos, pos + (reps * divisor), divisor);
                break;
            }
            pos += reps * divisor;
            rem -= reps * divisor;
            iter++;
        }
    }
}

int main() {
    char *string1 = "This is an example string. Please note I have not precomputed the length.";
    size_t length = strlen((const char *) string1);
    
    char *string2 = (char *) malloc(length * sizeof(char));
    memcpy(string2, string1, length);
    printf("\n");
    printf("STRING 1 (%d bytes): \"%s\"\n", length * sizeof(char), string1);
    printf("STRING 2 (%d bytes): \"%s\"\n", strlen(string2) * sizeof(char), string2);
}

As one can see, the initial test cases work:

Iteration 0: start position 0, end position 72, size 8.
Iteration 0: start position 72, end position 73, size 1.

STRING 1 (73 bytes): "This is an example string. Please note I have not precomputed the length."
STRING 2 (73 bytes): "This is an example string. Please note I have not precomputed the length."

and

Iteration 0: start position 0, end position 24, size 8.
Iteration 0: start position 24, end position 28, size 4.
Iteration 0: start position 28, end position 29, size 1.

STRING 1 (29 bytes): "BLAH BLAH BLAH BLAH BLAH BLAH"
STRING 2 (29 bytes): "BLAH BLAH BLAH BLAH BLAH BLAH"

However, delete the period off the end of the first test case (length = 72) and now the copied string possesses 74 bytes rather than 72 in the input string. Similarly, the second test case displays the same issue: delete the last "BLAH" and now the 24-byte string results in a 26-byte copy.

How do I go about fixing this? I attempted to place iteration counts in the inside of the while() structure to see if it would point out a failure in one of the conditions, but couldn't find anything readily sticking out.

5
  • your string2 buffer is not \0 terminated, so it's no "string"... Commented Oct 6, 2023 at 17:43
  • Is your void* return type declaration for memcpy a typo for void? Otherwise, that function should return something. And you should tell us what compiler you are using: Standard C doesn't allow arithmetic on void* pointers (like in dest + pos) but (IIRC) GCC does. Commented Oct 6, 2023 at 17:47
  • @AdrianMole That's a typo from the (frequent) modifications I've been doing to this function - most have a return dest at function-level scope at the end. I'm using GCC (13.2). Commented Oct 6, 2023 at 18:24
  • OT: Your iter does not count, because you reset it in each iteration. Commented Oct 6, 2023 at 20:01
  • to make my first comment more explicit: strlen(string2) will count something arbitrary as string2 is not null terminated Commented Oct 6, 2023 at 20:52

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.