4

I'm working on a Cortex-M3 board with a bare-metal toolchain and no libc.

I implemented a memcpy that copies data byte by byte, but it's too slow. The GCC manual says it provides __builtin_memcpy, so I decided to use it. Here is the implementation with __builtin_memcpy.

#include <stddef.h>

void *memcpy(void *dest, const void *src, size_t n)
{
    return __builtin_memcpy(dest,src,n);
}

When I compile this code, it becomes a recursive function which never ends.

$ arm-none-eabi-gcc -march=armv7-m -mcpu=cortex-m3 -mtune=cortex-m3 \
  -O2 -ffreestanding -c memcpy.c -o memcpy.o
$ arm-none-eabi-objdump -d memcpy.o

memcpy.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <memcpy>:
   0:   f7ff bffe       b.w     0 <memcpy>

What am I doing wrong? How can I use the compiler-generated version of memcpy?

2 Answers

5

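A builtin function is not supposed to be used to implement itself :)

Builtin functions are meant to be used in application code. The compiler may then generate a special instruction sequence, or it may emit a call to the underlying real function. In your case the size is not known at compile time, so the builtin falls back to a call to memcpy, which is the very function you are defining.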

Compare:

int a [10], b [20];

void
foo ()
{
  __builtin_memcpy (a, b, 10 * sizeof (int));
}

This results in:

foo:
    stmfd   sp!, {r4, r5}
    ldr     r4, .L2
    ldr     r5, .L2+4
    ldmia   r4!, {r0, r1, r2, r3}
    mov     ip, r5
    stmia   ip!, {r0, r1, r2, r3}
    ldmia   r4!, {r0, r1, r2, r3}
    stmia   ip!, {r0, r1, r2, r3}
    ldmia   r4, {r0, r1}
    stmia   ip, {r0, r1}
    ldmfd   sp!, {r4, r5}
    bx      lr

But:

void
bar (int n)
{
  __builtin_memcpy (a, b, n * sizeof (int));
}

results in a call to the memcpy function:

bar:
    mov     r2, r0, asl #2
    stmfd   sp!, {r3, lr}
    ldr     r1, .L5
    ldr     r0, .L5+4
    bl      memcpy
    ldmfd   sp!, {r3, lr}
    bx      lr

1

In theory, the library is not part of the C compiler or of the toolchain. So if you write memcpy(&a,&b,sizeof(a)) in a freestanding build (-ffreestanding implies -fno-builtin), the compiler MUST generate a subroutine call.

The idea of __builtin is to tell the compiler that the function is the standard one and can be optimized. So if you write __builtin_memcpy(&a,&b,sizeof(a)), the compiler MAY generate a subroutine call, but in most cases it will not. For example, if the size is known to be 4 at compile time, only a single load and store will be generated instead of a call. (Another advantage: even when a subroutine call is emitted, the compiler knows exactly what the library function does and can optimize around the call.)
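As an illustration of the constant-size case, here is a minimal sketch. The helper names load_u32 and store_u32 are hypothetical and not part of GCC or any library. Because the size is the compile-time constant sizeof(uint32_t), GCC will typically expand each builtin inline rather than call memcpy, although the exact instructions depend on the target and optimization level.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit value from a possibly unaligned
 * byte pointer. The size is a compile-time constant (4), so the
 * compiler is free to expand the copy inline. */
static inline uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    __builtin_memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: write a 32-bit value to a possibly unaligned
 * byte pointer, again with a constant size. */
static inline void store_u32(unsigned char *p, uint32_t v)
{
    __builtin_memcpy(p, &v, sizeof v);
}
```

Compiling a caller of these helpers with -O2 and inspecting the output with objdump -d, as in the question, is a quick way to check whether a call to memcpy was emitted.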

So it's ALWAYS better to use __builtin_memcpy instead of memcpy. Some modern libraries do this with #define memcpy __builtin_memcpy right in string.h.

But you still need to implement memcpy somewhere, because a call will still be generated in more complicated cases. For string functions on ARM, a word-wise (4-byte) implementation is strongly recommended.
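A minimal sketch of such a 4-byte implementation follows. It is not taken from any particular library. The name my_memcpy is a placeholder so the sketch can be tried on a hosted system (on the bare-metal target it would be named memcpy), and word_t is a hypothetical typedef that uses GCC's may_alias attribute so that word accesses through it do not break the aliasing rules.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical typedef: a 32-bit word that may alias any other type
 * (GCC extension). */
typedef uint32_t __attribute__((__may_alias__)) word_t;

/* Placeholder name; on the bare-metal target this would be memcpy. */
void *my_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Word copies are only used when both pointers have the same
     * offset within a 4-byte word. */
    if ((((uintptr_t)d ^ (uintptr_t)s) & 3u) == 0) {
        /* Copy leading bytes until both pointers are word aligned. */
        while (n > 0 && ((uintptr_t)d & 3u) != 0) {
            *d++ = *s++;
            n--;
        }

        /* Copy whole 32-bit words. */
        word_t *dw = (word_t *)d;
        const word_t *sw = (const word_t *)s;
        while (n >= 4) {
            *dw++ = *sw++;
            n -= 4;
        }
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }

    /* Copy the remaining bytes (or everything, if the alignments differ). */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dest;
}
```

One caveat, stated as an assumption about your GCC version and flags: the optimizer can recognize copy loops like these and replace them with a call to memcpy, which would bring back the self-recursion from the question. GCC documents -fno-tree-loop-distribute-patterns for turning that transformation off, so it is worth checking the disassembly of this file after compiling.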

1 Comment

This question was asked two years ago and already has an answer. Please try not to bring back these types of questions.
