3

In a recent CppCon talk by Chandler Carruth (link), around 39:16, he explains how leaving signed integer overflow undefined allows compilers to generate optimized assembly. This kind of optimization can also be found in this blog post by Krister Walfridsson, here.
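
To make that concrete, here is a small sketch of the sort of simplification the talk and blog post describe (the function names are mine, purely illustrative; whether a given compiler performs the fold is of course not guaranteed):

```cpp
// With a signed int, the compiler may fold this whole function to `return true;`,
// because x + 1 overflowing would be undefined behaviour and can be assumed away.
bool always_greater(int x) {
    return x + 1 > x;
}

// With an unsigned value, x + 1 can legitimately wrap to 0, so the comparison
// must actually be performed: the function returns false when x == UINT_MAX.
bool maybe_greater(unsigned x) {
    return x + 1 > x;
}
```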

Having previously been bitten by bugs involving sizes overflowing INT_MAX, I tend to be pedantic about the types I use in my code, but at the same time I don't want to lose out on fairly straightforward performance gains.

While I have limited knowledge of assembly, this left me wondering what it would entail to create an unsigned integer type with undefined overflow. This seems to be a recurring issue, but I didn't find any proposal to introduce one (and eventually update std::size_t). Has something like this ever been discussed?

10
  • I'm not sure if there are any real possibilities for performance optimizations from making unsigned overflow undefined. Commented Jun 9, 2017 at 13:33
  • I did some tests on a typical tight loop on my VS2017 setup, and I did see a performance improvement. Also, I've found this blog post explaining better what these optimizations rely on: kristerw.blogspot.fr/2016/02/… Commented Jun 9, 2017 at 13:42
  • 2
    I don't understand the question, and I don't have time to watch the video (actually I already did watch it a few months ago, but I can't recall which part you are referring to). Could you add some example of what you have in mind? Generally, on platforms like x86_32/64 you simply have 32 or 64 bits for integer numbers; when you overflow them by some operation, the CPU usually sets the "carry flag" (or others in combination), so you can test for that. But that would hurt performance; usually it's better to fully understand your data and formulas, and validate data just once ahead of calculation. Commented Jun 9, 2017 at 13:44
  • Then again, the C++ standard is defined in its own terms and doesn't map 1:1 to the underlying CPU, so having this universally available is another burden; on 32-bit platforms there's usually long long to emulate 64-bit integers, which depending on the CPU features may be as fast as 2x32b, or considerably slower. But generally int overflow is not a problem in pure assembly: you either ignore it or handle it, as you wish. (When I talk about integers, I mean both signed and unsigned; on x86-like platforms there's no real difference in assembly, only mul/div has specialized signed variants.) Commented Jun 9, 2017 at 13:47
  • 2
    @Ped7g: The OP wants to define a (primitive) unsigned type for which the compiler is allowed to assume no overflow, allowing this kind of optimization: x + 42 < y + 84 <=> x < y + 42 (see the sketch just below). Commented Jun 9, 2017 at 14:04
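
To illustrate the transformation mentioned in the last comment, a small sketch (the function names are mine and purely illustrative):

```cpp
// Signed operands: neither addition may overflow (that would be UB), so the
// compiler is free to simplify the comparison to the equivalent `x < y + 42`.
bool cmp_signed(int x, int y) {
    return x + 42 < y + 84;
}

// Unsigned operands: either sum may wrap modulo 2^N, so the simplification
// would change the result (e.g. x == UINT_MAX makes x + 42 a small value),
// and both additions have to be kept.
bool cmp_unsigned(unsigned x, unsigned y) {
    return x + 42 < y + 84;
}
```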

2 Answers

3

This question is completely backward. There isn't some magic panacea by which a behaviour can be deemed undefined in order to give a compiler optimisation opportunities. There is always a trade-off.

For an operation to have undefined behaviour in some conditions, the C++ standard would need to impose no requirements on the resulting behaviour. That is the definition of undefined behaviour in the standard.

For the C++ standard (or any standard - undefined behaviour is a feature of standards, not just standards for programming languages) to do that, there would need to be more than one realistic way of implementing the operation, under a range of conditions, with different outcomes, advantages, and disadvantages. There would also need to be a realistic prospect of real-world implementation of more than one alternative. Lastly, there needs to be a realistic chance that each of those alternatives provides some value (e.g. desirable attributes of a system which uses it) - otherwise one approach can simply be specified, and there is no need for alternatives.

Overflow of signed integers has undefined behaviour because of a number of contributing factors. Firstly, there are different representations of a signed integer (e.g. ones' complement, two's complement, etc). Second, the representation of a signed integer (by definition) includes representation of a sign, e.g. a sign bit. Third, there is no particular representation of a signed integer that is inherently superior to another (choosing one or the other involves engineering trade-offs, for example in the design of circuitry within a processor to implement an addition operation). Fourth, there are real-world implementations that use different representations. Because of these factors, operations on a signed integer that overflow may "wrap" on one CPU, but result in a hardware signal that must be cleared on another CPU. Each of these types of behaviour - or others - may be "optimal", by some measure, for some applications but not others. The standard has to allow for all of these possibilities - and the means by which it does that is to deem the behaviour undefined.

The reason arithmetic on unsigned integers has well-defined behaviour is that there aren't as many realistic ways of representing them or the operations on them, and - when such representations and operations are implemented in CPU circuitry - the results all come out the same (i.e. modulo arithmetic). There is no "sign bit" to worry about in creating circuits to represent and operate on unsigned integral values. Even if the bits in an unsigned variable are physically laid out differently, the implementation of the operations (e.g. an adder circuit built from NAND gates) produces consistent behaviour on overflow for all basic math operations (addition, subtraction, multiplication, division). And, not surprisingly, all existing CPUs do it this way. There isn't one CPU that generates a hardware fault on unsigned overflow.
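
To make the contrast concrete, a small illustration of my own (not part of the original answer): the unsigned result below is fully specified by the standard, while the commented-out signed increment would be undefined.

```cpp
#include <climits>
#include <cstdio>

int main() {
    unsigned u = UINT_MAX;
    u = u + 1;                  // well-defined: wraps to 0 (arithmetic modulo 2^N)
    std::printf("%u\n", u);     // prints 0 on every conforming implementation

    int i = INT_MAX;
    // i = i + 1;               // undefined behaviour: the standard imposes no
    //                          // requirements on the result
    (void)i;
}
```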

So, if you want overflow operations on unsigned values to have undefined behaviour, you would first need to find a way of representing an unsigned value that admits more than one feasible/useful result or behaviour, and make a case that your scheme is better in some way (e.g. performance, more easily fabricated CPU circuitry, etc). You would then need to convince some CPU designers to implement such a scheme, and convince system designers that the scheme gives a real-world advantage. At the same time, you would need to leave some CPU designers and system designers with the belief that some other scheme has an advantage over yours for their purposes. In other words, real-world usage of your approach must involve real trade-offs, rather than your approach having a consistent advantage over all possible alternatives. Then, once you have multiple realisations in hardware of different approaches - which result in different behaviours on overflow on different platforms - you would need to convince the C++ standardisation committee that there is advantage in supporting your scheme in the standard (i.e. language or library features that exploit your approach), and that all of the possible behaviours on overflow need to be permitted. Only then will overflow on unsigned integers (or your variation thereof) have undefined behaviour.

I've described the above in terms of starting at the hardware level (i.e. having native support for your unsigned integer type in hardware). The same goes if you do it in software, but you would need to convince developers of libraries or operating systems instead.

Only then will you have introduced an unsigned integral type which has undefined behaviour if operations overflow.

More generally, as said at the start, this question is backward. It is true that compilers exploit undefined behaviour (sometimes in highly devious ways) to improve performance. But, for the standard to deem that something has undefined behaviour, there needs to be more than one way of doing the relevant operations, and implementations (compilers, etc) need to be able to analyse the benefits and trade-offs of the alternatives, and then - according to some criteria - pick one. Which means there will always be a benefit (e.g. performance) and an unwanted consequence (e.g. unexpected results in some edge cases).

3

There is no such thing as an unsigned integer with undefined overflow. C++ is very specific that unsigned types do not overflow; they obey modulo arithmetic.

Could a future version of the language add an arithmetic type that does not obey modulo arithmetic, but also does not support signedness (and thus may use the whole range of its bits)? Maybe. But the alleged performance gains are not what they would be with a signed value (which would otherwise have to consider correct handling of the sign bit, whereas an unsigned value has no "special" bits mandated), so I wouldn't hold your breath. In fact, although I'm no assembly expert, I can't imagine that this would be useful in any way.
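
That said, something in this spirit can be approximated in user code today, without a new language type, by promising the optimiser that wraparound never happens. A hypothetical sketch (the type name is mine; __builtin_unreachable() is a GCC/Clang extension, so this is not portable and only illustrates the idea):

```cpp
#include <cstdint>

// Hypothetical wrapper: the addition itself still uses well-defined unsigned
// wraparound, but the unreachable hint lets the optimiser assume the wrap
// never occurs - roughly what "undefined overflow" would grant.
struct nowrap_u32 {
    std::uint32_t v;
};

inline nowrap_u32 operator+(nowrap_u32 a, nowrap_u32 b) {
    std::uint32_t r = a.v + b.v;
    if (r < a.v)                  // a wrap occurred
        __builtin_unreachable();  // promise the compiler this never happens
    return {r};
}
```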

2 Comments

Could saturated arithmetic be an example of special unsigned integer "overflow"? It's easy to stumble on definitions: saturated unsigned types do not overflow either, but unlike normal unsigned types they don't obey modulo arithmetic - if C++ were going to adopt them, what would be undefined? The overflow or the arithmetic?
The basic rule of saturated arithmetic generally amounts to "if overflow occurs, use the maximum and if underflow occurs use the minimum". No room for something undefined in that.
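
For reference, the usual way to write a saturating unsigned add looks something like this (illustrative only; it is not a standard facility in the C++ of this era):

```cpp
#include <climits>

// Clamp at UINT_MAX instead of wrapping; every case here is fully defined.
unsigned sat_add(unsigned a, unsigned b) {
    unsigned r = a + b;              // wraps modulo 2^N if a + b overflows
    return r < a ? UINT_MAX : r;     // wrap detected: saturate to the maximum
}
```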
