Using GCC's builtin functions in arm

Question

I'm working on a cortex-m3 board with a bare-metal toolchain without libc.

I implemented memcpy so that it copies data byte by byte, but it's too slow. The GCC manual says it provides __builtin_memcpy, so I decided to use that. Here is the implementation with __builtin_memcpy.

#include <stddef.h>

void *memcpy(void *dest, const void *src, size_t n)
{
    return __builtin_memcpy(dest,src,n);
}

When I compile this code, it becomes a recursive function which never ends.

$ arm-none-eabi-gcc -march=armv7-m -mcpu=cortex-m3 -mtune=cortex-m3 \
  -O2 -ffreestanding -c memcpy.c -o memcpy.o
$ arm-none-eabi-objdump -d memcpy.o

memcpy.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <memcpy>:
   0:   f7ff bffe       b.w     0 <memcpy>

Am I doing something wrong? How can I use the compiler-generated version of memcpy?

chill · Accepted Answer · 2011-12-14 11:06:41Z

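Builtin functions are not supposed to be used to implement themselves :)

Builtin functions are supposed to be used in application code. The compiler then may or may not generate a special instruction sequence, or it may emit a call to the underlying real function. In your case it emits the call, and since that call sits inside memcpy itself, the function ends up branching to itself.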

Compare:

int a [10], b [20];

void
foo ()
{
  __builtin_memcpy (a, b, 10 * sizeof (int));
}

This results in:

foo:
    stmfd   sp!, {r4, r5}
    ldr     r4, .L2
    ldr     r5, .L2+4
    ldmia   r4!, {r0, r1, r2, r3}
    mov     ip, r5
    stmia   ip!, {r0, r1, r2, r3}
    ldmia   r4!, {r0, r1, r2, r3}
    stmia   ip!, {r0, r1, r2, r3}
    ldmia   r4, {r0, r1}
    stmia   ip, {r0, r1}
    ldmfd   sp!, {r4, r5}
    bx      lr

But:

void
bar (int n)
{
  __builtin_memcpy (a, b, n * sizeof (int));
}

results in a call to the memcpy function:

bar:
    mov     r2, r0, asl #2
    stmfd   sp!, {r3, lr}
    ldr     r1, .L5
    ldr     r0, .L5+4
    bl      memcpy
    ldmfd   sp!, {r3, lr}
    bx      lr

sergiy · Answer · 2014-08-06 22:32:02Z


In theory, the library is not part of the C compiler or of the toolchain. So if you write memcpy(&a,&b,sizeof(a)), the compiler MUST generate a subroutine call.

The idea of __builtin is to tell the compiler that the function is the standard one and can be optimized. So if you write __builtin_memcpy(&a,&b,sizeof(a)), the compiler MAY generate a subroutine call, but in most cases it will not. For example, if the size is known to be 4 at compile time, the copy can be reduced to a single load and store instead of a call. (Another advantage: even when a subroutine call is generated, the compiler knows that the library function has no side effects beyond writing the destination.)
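As a minimal sketch of the fixed-size case described above: the helper names load_u32 and store_u32 are hypothetical and not from the question. Because the size passed to __builtin_memcpy is a compile-time constant, GCC is typically able to expand the copy inline rather than call memcpy, though the exact instructions depend on the target and options.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Read a 32-bit value from a possibly unaligned address. */
static inline uint32_t load_u32(const void *p)
{
    uint32_t v;
    /* Constant size: the compiler can usually expand this inline. */
    __builtin_memcpy(&v, p, sizeof v);
    return v;
}

/* Write a 32-bit value to a possibly unaligned address. */
static inline void store_u32(void *p, uint32_t v)
{
    __builtin_memcpy(p, &v, sizeof v);
}
```

Checking the output with objdump, as in the question, is the reliable way to see whether a call to memcpy was emitted for a given build.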

So, it's ALWAYS better to use __builtin_memcpy instead of memcpy. Some modern libraries do this with #define memcpy __builtin_memcpy right in string.h.

But you still need to implement memcpy somewhere, because a call will still be generated in more complicated cases. For string functions on ARM, an implementation that works 4 bytes (one word) at a time is strongly recommended.
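A word-at-a-time copy along these lines could look like the sketch below. It is an illustration, not a tuned implementation: the name memcpy_words and the typedef word_alias_t are made up here, and the function would have to be named memcpy in a real bare-metal build. It assumes GCC, since it relies on the may_alias attribute to keep the word accesses valid under strict aliasing.

```c
#include <stddef.h>
#include <stdint.h>

/* may_alias: these word accesses may touch objects of any type. */
typedef uint32_t __attribute__((__may_alias__)) word_alias_t;

/* Hypothetical name; rename to memcpy in a freestanding build. */
void *memcpy_words(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    /* Use word copies only when both pointers are 4-byte aligned. */
    if ((((uintptr_t)d | (uintptr_t)s) & 3u) == 0) {
        while (n >= 4) {
            *(word_alias_t *)d = *(const word_alias_t *)s;
            d += 4;
            s += 4;
            n -= 4;
        }
    }

    /* Copy the remaining bytes (or everything, if unaligned). */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }

    return dest;
}
```

One caveat when the function really is named memcpy: at higher optimization levels GCC may recognize the copy loops and replace them with a call to memcpy, which brings back the self-call from the question. Building that file with -fno-tree-loop-distribute-patterns is a common way to avoid this; inspect the disassembly to confirm.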

1 Comment

Mic1780 Over a year ago

This question was asked two years ago and already has an answer. Please try not to bring back these types of questions.
