I claim there's no UB here, neither per the language standard (the standard doesn't cover this function/intrinsic) nor per the implementation, and there's a simple rollover.
Here's my reasoning...
`InterlockedIncrement()` is conceptually very simple, and if it had a special case, it would be very hard to miss it and fail to document it. And the documentation hasn't mentioned any special case here for some 15+ years.
How would you implement it anyway?
If you're on the 80486 or better, the most natural implementation uses the XADD instruction with the LOCK prefix, which atomically adds a value to a memory variable. The instruction by itself does not generate any overflow exception; however, it does modify the EFLAGS register just like the regular addition instruction, ADD, so it's possible to detect an overflow and act on it. Specifically, you could throw in the INTO instruction to turn the overflow condition into an exception. Or you could use the conditional jump instruction, JO, to jump to an overflow handler.
If you only have 80386 features, you have `lock inc` and `lock add`, which can implement InterlockedIncrement() only if the return value isn't used except to test a condition derivable from EFLAGS, such as zero/non-zero, negative/non-negative, or signed overflow (EFLAGS.OF). In the general case, where code does `int tmp = InterlockedIncrement(&shared);`, you'd need either `lock xadd` or a `lock cmpxchg` retry loop, but those are 486-or-later features. So you'd need a separate lock variable that all writers acquire and release, perhaps taking the lock with the XCHG instruction (with which LOCK is implicit). But that isn't compatible with the Interlocked... API, which just exposes lock-free hardware operations, unlike `std::atomic<int>`, which could report `is_lock_free()` as false when compiling for the 386 and fall back to a separate spinlock, like it does for wide types. (In shared memory between processes, the spinlock would also need to be in shared memory so both processes could respect the same spinlock, so a library API can't just invent one for you, and it would force even operations like exchange to take and release the lock.)
Now, would you want to throw INTO or JO into all instances of InterlockedIncrement()? Probably not, definitely not by default. People like their atomic operations small and fast. And in some algorithms, there'd be no way to prevent the possibility of overflow with huge numbers of threads.
This is the "immediate" UB. What about the "creeping" UB?
If you had C code like this:
```c
int a = INT_MAX;
if (a + 1 < a)
    puts("Overflow!");
```
You'd likely get nothing printed nowadays.
Modern compilers know that `a + 1` can't legally(!) overflow, so the condition in the `if` statement can be taken as false irrespective of the value of `a`.
Can you have a similar optimization with InterlockedIncrement()?
Well, given that the variable is volatile and can indeed change in a different thread at any moment, the compiler may not assume `a` is unchanged between two reads of it (you'd likely write `a + 1 < a` or similar as multiple statements, and each read of `a` would have to be performed if it's volatile).
It would also be an odd context to try to make the optimization in.
`InterlockedIncrement` still has to return a `LONG`. What value will it return in this case? `LONG_MIN`, right? Just like .NET's `Interlocked.Increment()` does.