Is there a C snippet that computes overflow-safe addition efficiently without using compiler builtins?

Question

Here is a C function that adds an int to another, failing if overflow would happen:

int safe_add(int *value, int delta) {
        if (*value >= 0) {
                if (delta > INT_MAX - *value) {
                        return -1;
                }
        } else {
                if (delta < INT_MIN - *value) {
                        return -1;
                }
        }

        *value += delta;
        return 0;
}

Unfortunately it is not optimized well by GCC or Clang:

safe_add(int*, int):
        movl    (%rdi), %eax
        testl   %eax, %eax
        js      .L2
        movl    $2147483647, %edx
        subl    %eax, %edx
        cmpl    %esi, %edx
        jl      .L6
.L4:
        addl    %esi, %eax
        movl    %eax, (%rdi)
        xorl    %eax, %eax
        ret
.L2:
        movl    $-2147483648, %edx
        subl    %eax, %edx
        cmpl    %esi, %edx
        jle     .L4
.L6:
        movl    $-1, %eax
        ret

This version with __builtin_add_overflow()

int safe_add(int *value, int delta) {
        int result;
        if (__builtin_add_overflow(*value, delta, &result)) {
                return -1;
        } else {
                *value = result;
                return 0;
        }
}

is optimized better:

safe_add(int*, int):
        xorl    %eax, %eax
        addl    (%rdi), %esi
        seto    %al
        jo      .L5
        movl    %esi, (%rdi)
        ret
.L5:
        movl    $-1, %eax
        ret

but I'm curious if there's a way without using builtins that will get pattern-matched by GCC or Clang.

I see there is gcc.gnu.org/bugzilla/show_bug.cgi?id=48580 in the context of multiplication. But addition should be much easier to pattern-match. I'll report it. — Tavian Barnes
– Tavian Barnes, Commented Feb 14, 2020 at 15:47

Jens Gustedt · Accepted Answer · 2020-02-14 16:34:22Z

5

Tthe best one I came up with, if you don't have access to the overflow flag of the architecture, is to do things in unsigned. Just think of all bit arithmetic here in that we are only interested in the highest bit, which is the sign bit when interpreted as signed values.

(All that modulo sign errors, I didn't check this thouroughly, but I hope the idea is clear)

#include <stdbool.h>

bool overadd(int a[static 1], int b) {
  unsigned A = a[0];
  unsigned B = b;
  // This computation will be done anyhow
  unsigned AB = A + B;
  // See if the sign bits are equal
  unsigned AeB = ~(A^B);
  unsigned AuAB = (A^AB);
  // The function result according to these should be:
  //
  // AeB \ AuAB | false | true
  //------------+-------+------
  // false      | false | false
  // true       | false | true
  //
  // So the expression to compute from the sign bits is (AeB & AuAB)

  // This is INT_MAX
  unsigned M = -1U/2;
  bool ret = (AeB & AuAB) > M;

  if (!ret) a[0] += b;
  return ret;
}

If you find a version of the addition that is free of UB, such as an atomic one, the assembler is even without branch (but with a lock prefix)

#include <stdbool.h>
#include <stdatomic.h>
bool overadd(_Atomic(int) a[static 1], int b) {
  unsigned A = a[0];
  atomic_fetch_add_explicit(a, b, memory_order_relaxed);
  unsigned B = b;
  // This computation will be done anyhow
  unsigned AB = A + B;
  // See if the sign bits are equal
  unsigned AeB = ~(A^B);
  unsigned AuAB = (A^AB);
  // The function result according to these should be:
  //
  // AeB \ AuAB | false | true
  //------------+-------+------
  // false      | false | false
  // true       | false | true
  //
  // So the expression to compute from the sign bits is (AeB & AuAB)

  // This is INT_MAX
  unsigned M = -1U/2;
  bool ret = (AeB & AuAB) > M;
  return ret;
}

So if we had such an operation, but even more "relaxed" this could improve the situation even further.

Take3: If we use a special "cast" from the unsigned result to the signed one, this now is branch free:

#include <stdbool.h>
#include <stdatomic.h>

bool overadd(int a[static 1], int b) {
  unsigned A = a[0];
  //atomic_fetch_add_explicit(a, b, memory_order_relaxed);
  unsigned B = b;
  // This computation will be done anyhow
  unsigned AB = A + B;
  // See if the sign bits are equal
  unsigned AeB = ~(A^B);
  unsigned AuAB = (A^AB);
  // The function result according to these should be:
  //
  // AeB \ AuAB | false | true
  //------------+-------+------
  // false      | false | false
  // true       | false | true
  //
  // So the expression to compute from the sign bits is (AeB & AuAB)

  // This is INT_MAX
  unsigned M = -1U/2;
  unsigned res = (AeB & AuAB);
  signed N = M-1;
  N = -N - 1;
  a[0] =  ((AB > M) ? -(int)(-AB) : ((AB != M) ? (int)AB : N));
  return res > M;
}

edited Feb 14, 2020 at 16:34

answered Feb 14, 2020 at 8:40

Jens Gustedt

79.4k9 gold badges110 silver badges184 bronze badges

Sign up to request clarification or add additional context in comments.

15 Comments

Bob__ Over a year ago

Not the DV, but I believe that the second XOR shouldn't be negated. See e.g. this attempt to test all the proposals.

R.. GitHub STOP HELPING ICE Over a year ago

I tried something like this but couldn't get it to work. Looks promising but I wish GCC optimized the idiomatic code.

Jens Gustedt Over a year ago

@PSkocik, no this does not depend on the sign representation, the computation is entirely done as unsigned. But it depends on the fact that the unsigned type has not just the sign bit masked out. (Both are now guaranteed in C2x, that is, hold for all archs that we could find). Then, you can't cast the unsigned result back if it is greater than INT_MAX, that would be implementation defined and may raise a signal.

Jens Gustedt Over a year ago

@PSkocik, no unfortunately not, that seemed to revolutianary to the committee. But here is a "Take3" that actual comes out without branches on my machine.

Bob__ Over a year ago

Sorry to bother you again, but I think you should change Take3 into something like this to obtain correct results. It seems promising, though.

|

Alexander Cherepanov · Accepted Answer · 2020-02-18 23:09:10Z

The situation with signed operations is much worse than with unsigned ones, and I see only one pattern for signed addition, only for clang and only when a wider type is available:

int safe_add(int *value, int delta)
{
    long long result = (long long)*value + delta;

    if (result > INT_MAX || result < INT_MIN) {
        return -1;
    } else {
        *value = result;
        return 0;
    }
}

clang gives exactly the same asm as with __builtin_add_overflow:

safe_add:                               # @safe_add
        addl    (%rdi), %esi
        movl    $-1, %eax
        jo      .LBB1_2
        movl    %esi, (%rdi)
        xorl    %eax, %eax
.LBB1_2:
        retq

Otherwise, the simplest solution I can think of is this (with the interface as Jens used):

_Bool overadd(int a[static 1], int b)
{
    // compute the unsigned sum
    unsigned u = (unsigned)a[0] + b;

    // convert it to signed
    int sum = u <= -1u / 2 ? (int)u : -1 - (int)(-1 - u);

    // see if it overflowed or not
    _Bool overflowed = (b > 0) != (sum > a[0]);

    // return the results
    a[0] = sum;
    return overflowed;
}

gcc and clang generate very similar asm. gcc gives this:

overadd:
        movl    (%rdi), %ecx
        testl   %esi, %esi
        setg    %al
        leal    (%rcx,%rsi), %edx
        cmpl    %edx, %ecx
        movl    %edx, (%rdi)
        setl    %dl
        xorl    %edx, %eax
        ret

We want to compute the sum in unsigned, so unsigned have to be able to represent all values of int without any of them sticking together. To easily convert the result from unsigned to int, the opposite is useful too. Overall, two's complement is assumed.

On all popular platforms I think we can convert from unsigned to int by a simple assignment like int sum = u; but, as Jens mentioned, even the latest variant of the C2x standard allows it to raise a signal. The next most natural way is to do something like that: *(unsigned *)&sum = u; but non-trap variants of padding apparently could differ for signed and unsigned types. So the example above goes the hard way. Fortunately, both gcc and clang optimize this tricky conversion away.

P.S. The two variants above could not be compared directly as they have different behavior. The first one follows the original question and doesn't clobber the *value in case of overflow. The second one follows the answer from Jens and always clobbers the variable pointed to by the first parameter but it's branchless.

Replaced equality by xor in the overflow check to get better asm with gcc. Added asm.

Iłya Bursov · Accepted Answer · 2020-02-13 17:55:29Z

1

the best version I can come up with is:

int safe_add(int *value, int delta) {
    long long t = *value + (long long)delta;
    if (t != ((int)t))
        return -1;
    *value = (int) t;
    return 0;
}

which produces:

safe_add(int*, int):
    movslq  %esi, %rax
    movslq  (%rdi), %rsi
    addq    %rax, %rsi
    movslq  %esi, %rax
    cmpq    %rsi, %rax
    jne     .L3
    movl    %eax, (%rdi)
    xorl    %eax, %eax
    ret
.L3:
    movl    $-1, %eax
    ret

answered Feb 13, 2020 at 17:55

Iłya Bursov

24.4k4 gold badges36 silver badges68 bronze badges

7 Comments

Tavian Barnes Over a year ago

I'm surprised even that doesn't use the overflow flag. Still much better than the explicit range checks, but it doesn't generalize to adding long longs.

Iłya Bursov Over a year ago

@TavianBarnes you're right, unfortunately there is no good way to use overflow flags in c (except compiler specific builtins)

emacs drives me nuts Over a year ago

This code suffers from signed overflow, which is Undefined Behaviour.

Jens Gustedt Over a year ago

@emacsdrivesmenuts, you are right, the cast in the comparisson can overflow.

Tavian Barnes Over a year ago

@emacsdrivesmenuts The cast is not undefined. When out of the range of int, a cast from a wider type will either produce an implementation-defined value or raise a signal. All implementations I care about define it to preserve the bit pattern which does the right thing.

|

Konrad Rudolph · Accepted Answer · 2020-02-14 15:36:20Z

0

I could get the compiler to use the sign flag by assuming (and asserting) a two’s complement representation without padding bytes. Such implementations should yield the required behaviour in the line annotated by a comment, although I can’t find a positive formal confirmation of this requirement in the standard (and there probably isn’t any).

Note that the following code only handles positive integer addition, but can be extended.

int safe_add(int* lhs, int rhs) {
    _Static_assert(-1 == ~0, "integers are not two's complement");
    _Static_assert(
        1u << (sizeof(int) * CHAR_BIT - 1) == (unsigned) INT_MIN,
        "integers have padding bytes"
    );
    unsigned value = *lhs;
    value += rhs;
    if ((int) value < 0) return -1; // impl. def., 6.3.1.3/3
    *lhs = value;
    return 0;
}

This yields on both clang and GCC:

safe_add:
        add     esi, DWORD PTR [rdi]
        js      .L3
        mov     DWORD PTR [rdi], esi
        xor     eax, eax
        ret
.L3:
        mov     eax, -1
        ret

edited Feb 14, 2020 at 15:36

answered Feb 14, 2020 at 14:18

Konrad Rudolph

549k142 gold badges968 silver badges1.3k bronze badges

4 Comments

Jens Gustedt Over a year ago

I think that the cast in the comparision is undefined. But you could get away with this as I do in my answer. But then also, all the fun is in being able to cover all the cases. Your _Static_assert serves not much of a purpose, because this is trivially true on any current architecture, and will even be imposed for C2x.

Konrad Rudolph Over a year ago

@Jens Actually it seems the cast is implementation-defined, not undefined, if I am reading (ISO/IEC 9899:2011) 6.3.1.3/3 correctly. Can you double-check that? (However, extending this to negative arguments makes the whole thing rather convoluted and ultimately similar to your solution.)

Jens Gustedt Over a year ago

You are right, it is iplementation defined, but may also raise a signal :(

Konrad Rudolph Over a year ago

@Jens Yeah, technically I guess a two’s complement implementation might still contain padding bytes. Maybe the code should test for this by comparing the theoretical range to INT_MAX. I’ll edit the post. But then again I don’t think this code should be used in practice anyway.

Collectives™ on Stack Overflow

Is there a C snippet that computes overflow-safe addition efficiently without using compiler builtins?

4 Answers 4

15 Comments

1 Comment

7 Comments

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

15 Comments

1 Comment

7 Comments

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related