0

Is the very simple code below susceptible to undefined behaviour as the integer overflows as a result of the operation?

static volatile LONG x = LONG_MAX;

InterlockedIncrement(&x);

According to the standard, signed integer overflow is undefined behaviour. However, here we are out of the standard, as we are calling compiler's intrinsic function, which inlines to some assembly. Also, the value of x is not used anywhere (the function is just used as a memory barrier).

An answer to a similar question suggests this is not UB.

10
  • 2
    I'd argue it doesn't matter when or where the overflow happens. If it happens then it's UB from the C++ standard point of view. The behavior may be defined from the point of view of the compiler and the code it generates though, but from a strict C++-standard perspective it's still UB. Commented Mar 1, 2019 at 7:50
  • The way I would look at it is: InterlockedIncrement still has to return a LONG. What value will it return in this case? Commented Mar 1, 2019 at 7:57
  • @P.W It could wrap and consequently return LONG_MIN, right? Just like .NET's Interlocked.Increment() does. Commented Mar 1, 2019 at 8:01
  • @MatthäusBrandl The problem is that the "wrapping" is not defined by the C++ specification. Commented Mar 1, 2019 at 8:02
  • 3
    It is not terribly undefined behavior these days. They don't build a lot of processors anymore that use one's-complement representation or trap on signed overflow. The logic error it produces is certainly the bigger issue. You'd of course favor MemoryBarrier() if that is all you need. Commented Mar 1, 2019 at 8:03

1 Answer 1

2

I claim there's no UB here, neither per the language standard (the standard doesn't cover this function/intrinsic) nor per the implementation, and there's a simple rollover.

Here's my reasoning...

InterlockedIncrement() is conceptually very simple and if it had a special case, it would be very hard to miss it and fail to document it. And the documentation hasn't mentioned any special case here for some 15+ years.

How would you implement it anyway?

If you're on the 80486 or better, the most natural implementation uses the XADD instruction with the LOCK prefix that atomically adds a value to a memory variable. The instruction by itself does not generate any overflow exception, however it does modify the EFLAGS register as does the regular addition instruction, ADD, so it's possible to detect an overflow and act on it. Specifically, you could throw in the INTO instruction to turn the overflow condition into an exception. Or you could use the conditional jump instruction, JO, to jump the overflow handler.

If you only have 80386 features, you have lock inc and lock add which can implement InterlockedIncrement only if the return value isn't used except to test it for zero/non-zero or negative/non-negative, or other condition you can derive from EFLAGS such as that signed-overflow happened (EFLAGS.OF). (In the general case where code does int tmp = InterlockedIncrement(&shared);, you'd need either lock xadd or a lock cmpxchg retry loop but those are both 486 or Pentium features. So you'd need a separate lock variable which all writers acquire and release, perhaps using the XCHG instruction (the LOCK is implicit with it) to take the lock. But this isn't compatible with the Interlocked... API which just exposes lock-free hardware operations, unlike std::atomic<int> which could have lock_free() be false when compiling for 386 and use a separate spinlock like it does for wide types. (In shared memory between processes, the spinlock would also need to be in shared memory so both processes could respect the same spinlock, so a library API can't just invent one for you, and it would force even operations like exchange to take/release the lock.)


Now, would you want to throw INTO or JO into all instances of InterlockedIncrement()? Probably not, definitely not by default. People like their atomic operations small and fast. And in some algorithms, there'd be no way to prevent the possibility of overflow with huge numbers of threads.

This is the "immediate" UB. What about the "creeping" UB? If you had C code like this:

  int a = INT_MAX;
  if (a + 1 < a)
    puts("Overflow!");

You'd likely get nothing printed nowadays. Modern compilers know that a + 1 can't legally(!) overflow and so the condition in the if statement can be taken as false irrespective of the value of a.

Can you have a similar optimization with InterlockedIncrement()?

Well, given that the variable is volatile and can indeed change in a different thread at any moment, the compiler may not assume unchanged a from two memory reads of it (you'd likely write a + 1 < a or similar as multiple statements and each a would need to be fetched if it's volatile).

It would also be an odd context to try to make the optimization in.

Sign up to request clarification or add additional context in comments.

11 Comments

From what I can tell, the Standard would allow an implementation to do anything it likes in response to InterlockedIncrement(anything) unless user code defines a function by that name (in which case that function should be called). Implementations would be free to specify that such a call will behave in some useful fashion, but the Standard says nothing about what it would do.
According to posix.nl/linuxassembly/nasmdochtml/nasmdoca.html, the current opcode for cmpxchg was new with Pentium. NASM uses cmpxchg486 for the old 0F A7 opcode supported by some (but reportedly not all) 486 chips. Anyway, interesting point that you could implement a fairly safe atomic inc with an xchg (not cmpxchg) retry loop, if the only write access was other increment threads. You'd detect races (where cmpxchg would have failed) and calculate a new increment to attempt to add. (e.g. if you actually decreased the counter by 2, you then try to add 3 next time).
But I don't think there's a way to implement lock xadd with just xchg, only lock inc / lock add 1 for the right total count, without the right "old value". It doesn't preserve the sequence of operations (some threads get a bogus old value). Or another way to look at it, if you want the old value from the first xchg, instead of the successful xchg, is that other threads see effects of you mucking around with the counter multiple times, not one atomic modification. If there are pure readers as well as the writers, they can see non-monotonic changes to the value.
Anyway, probably also worth pointing out that C++ atomic<int> is defined by ISO C++11 to be 2's complement with overflow defined as wrap-around. Same for C11 _Atomic int. (except that in C++, ++a and a += 1 are defined by the standard (and maybe in some real headers) in terms of fetch_add(1) + 1, possibly making that part done with plain int. twitter.com/jfbastien/status/958026267420327936.). InterlockedIncrement is basically an intrinsic for lock inc/add or lock xadd, (hopefully the former if the result is unused), not subject to C signed integer rules.
@PeterCordes If you can use XCHG to implement a spinlock, you can effectively implement any atomic operation.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.