What does `+&` mean in gcc inline assembly?

Question

I'm aware that when using gcc inline assembly, unless you specify otherwise, the compiler assumes that you consume all your inputs before you write any output operand. If you actually want to write to an output operand before consuming all inputs, you must mark it as early-clobber so the compiler doesn't reuse that register for an input.
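To make that ordinary output-only case concrete, here is a minimal sketch, assuming x86-64 and GCC or Clang. `sum_pair` is a hypothetical helper invented for illustration, not something from the GCC documentation. The output is written by the first instruction while the pointer input is still needed by the second, so the output is marked `=&r`:

```c
#include <stdint.h>

/* Hypothetical helper: returns p[0] + p[1]. */
static inline uint64_t sum_pair(const uint64_t *p)
{
#if defined(__x86_64__)
    uint64_t result;
    __asm__ ("mov (%1), %0\n\t"   /* result is written here ...              */
             "add 8(%1), %0"      /* ... but p (%1) is read again afterwards */
             : "=&r"(result)      /* early-clobber: must not share p's register */
             : "r"(p),
               "m"(*(const uint64_t (*)[2]) p) /* tells the compiler the asm reads p[0..1] */
             : "cc");
    return result;
#else
    /* Assumption: on other targets just fall back to plain C. */
    return p[0] + p[1];
#endif
}
```

With plain `=r` the compiler would be allowed to put `result` and `p` in the same register, in which case the `mov` would overwrite the pointer before the `add` dereferences it.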

My question arose when I saw this example from the authoritative reference:

void
dscal (size_t n, double *x, double alpha)
{
  asm ("/* lots of asm here */"
       : "+m" (*(double (*)[n]) x), "+&r" (n), "+b" (x) // <-- There's the "+&r" (n)
       : "d" (alpha), "b" (32), "b" (48), "b" (64),
         "b" (80), "b" (96), "b" (112)
       : "cr0",
         "vs32","vs33","vs34","vs35","vs36","vs37","vs38","vs39",
         "vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47");
}

What? Why does it earlyclobber an output-input register? Isn't it the same register anyway?

There is no explanation of the matter in that page.

Digging further I found this, which states:

An operand which is read by the instruction can be tied to an earlyclobber operand if its only use as an input occurs before the early result is written. Adding alternatives of this form often allows GCC to produce better code when only some of the read operands can be affected by the earlyclobber. See, for example, the ‘mulsi3’ insn of the ARM.

Furthermore, if the earlyclobber operand is also a read/write operand, then that operand is written only after it’s used.

That last one speaks about the +&r case but I honestly don't get what it says. I don't know what "used" means.

A quick `grep -r '+&'` on the Linux kernel yielded very few results, and only one file where it is used in x86 code (the only architecture I'm somewhat familiar with, and not very): arch/x86/crypto/curve25519-x86_64.c

/* Computes the addition of four-element f1 with value in f2
 * and returns the carry (if any) */
static inline u64 add_scalar(u64 *out, const u64 *f1, u64 f2)
{
    u64 carry_r;

    asm volatile(
        /* Clear registers to propagate the carry bit */
        "  xor %%r8d, %%r8d;"
        "  xor %%r9d, %%r9d;"
        "  xor %%r10d, %%r10d;"
        "  xor %%r11d, %%r11d;"
        "  xor %k1, %k1;"

        /* Begin addition chain */
        "  addq 0(%3), %0;"
        "  movq %0, 0(%2);"
        "  adcxq 8(%3), %%r8;"
        "  movq %%r8, 8(%2);"
        "  adcxq 16(%3), %%r9;"
        "  movq %%r9, 16(%2);"
        "  adcxq 24(%3), %%r10;"
        "  movq %%r10, 24(%2);"

        /* Return the carry bit in a register */
        "  adcx %%r11, %1;"
        : "+&r"(f2), "=&r"(carry_r)
        : "r"(out), "r"(f1)
        : "%r8", "%r9", "%r10", "%r11", "memory", "cc");

    return carry_r;
}

I really don't get why using +r wouldn't be enough.

What if, on entry to the asm, both f2 and f1 are known by the compiler to contain the same value? Can it use the same register for both? That might work (thus saving a register) if f1 is only used before f2 gets written. But if that can't be guaranteed, earlyclobber ensures they use separate registers.
– David Wohlferd, Commented May 30, 2024 at 2:19
@DavidWohlferd That's it! I appreciate your time. I wrote some contrived examples to force such a situation and using +& made a difference. These sorts of details seem fuzzy and not well known by many. For anyone interested I found this thread concerning my question. By the way, why not make that an answer? It answered my question perfectly!
– ChristmasTree, Commented May 30, 2024 at 19:10

David Wohlferd · Accepted Answer · 2024-05-30 20:04:02Z


Since my comment turned out to be useful, I'm proposing it as an answer:

What if, on entry to the asm, both f2 and f1 are known by the compiler to contain the same value? Can it use the same register for both? That might work (thus saving a register) if f1 is only used before f2 gets written. But if that can't be guaranteed, earlyclobber ensures they use separate registers.
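Here is a minimal sketch of that situation, assuming x86-64 and GCC or Clang. `add_twice` is a hypothetical function made up for illustration and is not taken from the kernel or the GCC docs. The read/write operand `acc` is written by the first `add`, but the plain input `x` is still needed by the second one:

```c
#include <stdint.h>

/* Hypothetical helper: returns acc + x + x. */
static inline uint64_t add_twice(uint64_t acc, uint64_t x)
{
#if defined(__x86_64__)
    __asm__ ("add %1, %0\n\t"   /* acc (%0) is written here ...          */
             "add %1, %0"       /* ... while x (%1) is read again here   */
             : "+&r"(acc)       /* early-clobber: keep acc and x apart   */
             : "r"(x)
             : "cc");
    return acc;
#else
    /* Assumption: on other targets just fall back to plain C. */
    return acc + x + x;
#endif
}
```

The interesting call is `add_twice(v, v)`. With `+&r` the two operands are kept in separate registers and the result is `3 * v`. With plain `+r` the compiler may notice that both operands hold the same value and give them one register, so each `add` would double that register and the result could come out as `4 * v`. Whether that actually happens depends on the compiler version and optimization level.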

There's a (performance) incentive for the compiler to minimize register usage when invoking asm. The more registers it uses, the more registers have to be spilled/restored.

I'll also add that as a general rule, you should avoid using inline asm. While it's cool and powerful and interesting, it's really hard to get right and painful to support.


3 Comments

ChristmasTree Over a year ago

It's definitely very cool 😎. Also, do you think that for OS development one should use inline asm or just use separate files?

Nate Eldredge Over a year ago

Here is an example if you'd like to add it to your answer: godbolt.org/z/e9Goe8Eve

David Wohlferd Over a year ago

OS development is tricky. You're dealing with extreme timing issues down to ticks, and with instructions that no other type of C program is ever going to use. While I'd recommend minimizing inline asm, avoiding it altogether may not be practical.
