Using C arrays in inline GCC assembly

Question

I'd like to use two arrays passed into a C function, as below, in inline assembly with GCC (Xcode on Mac). It has been many years since I've written assembly, so I'm sure this is an easy fix.

The first line here is fine. The second line fails. I'm trying to do the following, A[0] += x[0]*x[0], and I want to do this for many elements in the array with different indices. I'm only showing one here. How do I use a read/write array in the assembly block?

And if there is a better approach to do this, I'm all ears.

inline void ArrayOperation(float A[36], const float x[8])
{
    float tmp;

    __asm__ ( "fld %1; fld %2; fmul; fstp %0;" : "=r" (tmp) : "r" (x[0]), "r" (x[0]) );
    __asm__ ( "fld %1; fld %2; fadd; fstp %0;" : "=r" (A[0]) : "r" (A[0]), "r" (tmp) );

    // ...
}

The compiler says that this line is invalid. My guess is that A[0] cannot be addressed this way for output?
– paul, commented May 17, 2011 at 16:22

Norbert P. · Accepted Answer · 2011-05-17 17:31:19Z

The code fails not because of the arrays, but because of how the fld and fst instructions work. This is the code you want:

float tmp;

__asm__ ( "flds %1; fld %%st(0); fmulp; " : "=t" (tmp) : "m" (x[0]) );
__asm__ ( "flds %1; fadds %2;" : "=t" (A[0]) : "m" (A[0]), "m" (tmp) );

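The fld and fst instructions cannot take a general-purpose register, which is what the "r" constraint hands them; they need a memory operand (or an x87 stack register such as %st(0)). Also, with a memory operand you need to specify whether you want to load a float (flds), a double (fldl) or a long double (fldt). As for the output operands, I just use the constraint =t, which simply tells the compiler that the result is on the top of the register stack, i.e. ST(0).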

Arithmetic operations have either no operands (fmulp), or a single memory operand (but then you have to specify the size again, fmuls, fadds etc.).
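
To make the size suffixes concrete, here is a minimal sketch that squares a float and a double using the same constraints as above. The helper names square_f32 and square_f64 are made up for illustration, and the plain C fallback for non-x86 targets is an addition of this sketch, not part of the original code.

```c
/* Hypothetical helpers showing the AT&T operand-size suffixes.
   On non-x86 targets they fall back to plain C (an assumption of this sketch). */

static inline float square_f32(float v)
{
#if defined(__i386__) || defined(__x86_64__)
    float r;
    /* flds loads a 32-bit float onto the x87 stack;
       fmuls multiplies ST(0) by a 32-bit float in memory. */
    __asm__ ("flds %1; fmuls %1" : "=t" (r) : "m" (v));
    return r;
#else
    return v * v;
#endif
}

static inline double square_f64(double v)
{
#if defined(__i386__) || defined(__x86_64__)
    double r;
    /* fldl and fmull are the 64-bit (double) forms of the same instructions. */
    __asm__ ("fldl %1; fmull %1" : "=t" (r) : "m" (v));
    return r;
#else
    return v * v;
#endif
}
```

In both functions the "m" constraint passes the value as a memory operand, and "=t" tells the compiler to pick up the result from ST(0).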

You can read more about inline assembler and the GNU Assembler in general, and see the Intel® 64 and IA-32 Architectures Software Developer’s Manual.

Of course, it is best to get rid of the temporary variable:

   __asm__ ( "flds %1; fld %%st(0); fmulp; fadds %2;" : "=t" (A[0]) : "m" (x[0]), "m" (A[0]));

Though if a performance improvement is what you're after, you don't need to use assembler. GCC is completely capable of producing this code. But you might consider using vector SSE instructions and other simple optimization techniques, such as breaking the dependency chains in the calculations; see Agner Fog's optimization manuals.
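
As a rough sketch of the non-assembler route, the function below does the update in plain C and, where SSE is available, processes four floats at a time with intrinsics. The name add_squares and the one-to-one index pattern (a[i] += x[i]*x[i]) are assumptions made for illustration; the question only shows element 0 and does not give the real index mapping.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: a[i] += x[i] * x[i] for i in [0, n).
   The index pattern is assumed; adapt it to the real mapping. */
static void add_squares(float *a, const float *x, size_t n)
{
    size_t i = 0;
#if defined(__SSE__)
    /* Four independent lanes per iteration; unaligned loads and stores
       are used so the arrays need no special alignment. */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 va = _mm_loadu_ps(a + i);
        va = _mm_add_ps(va, _mm_mul_ps(vx, vx));
        _mm_storeu_ps(a + i, va);
    }
#endif
    /* Scalar loop: handles the tail, or everything when SSE is unavailable. */
    for (; i < n; ++i)
        a[i] += x[i] * x[i];
}
```

With optimization enabled the compiler may well vectorize the scalar loop on its own, so it is worth comparing the generated assembly before keeping the intrinsics.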

1 Comment

paul Over a year ago

Thanks. Yeah, I'm trying to see how to improve the speed better than GCC so I thought I'd try learning some ARM assembly.
