
For example, in C#, C++, Java, or JavaScript, the effective int size is 32 bits. If we want to calculate with some larger number, say one of 70 bits, we have to use software features (arbitrary-precision arithmetic).

Python has a very tricky, unbounded internal integer representation, and I cannot figure out what the most efficient int size for integer arithmetic is.

In other words, is there some int size, say 64 bits, within which ints are used efficiently?

Or does it not matter whether it is 16, 32, 64, or some arbitrary number of bits, and Python works with all of these ints with the same efficiency?

In short, does Python always use arbitrary-precision arithmetic, or does it use hardware arithmetic for 32/64-bit values?

  • You can use numpy, if size matters. Here you can choose from e.g. uint32 and uint64. Commented Aug 16, 2020 at 12:43
  • I'm not sure what you are asking, Python integers are not sized, which you seem to already understand? Commented Aug 16, 2020 at 12:43
  • @Stefan or alternatively, you can use the array module for primitive, numeric arrays which sometimes suffices (if you are going to do any heavy-duty processing, then probably numpy is what you want) Commented Aug 16, 2020 at 12:44
  • As far as you are concerned, the bit length of any particular int object is an implementation detail. I don't understand what you are asking here "do we have some int size, say 64 bits, for effective ints usage?" Commented Aug 16, 2020 at 12:46
  • @juanpa.arrivillaga we have two kinds of arithmetic. Hardware arithmetic, for integers no larger than the CPU's word size. Software arithmetic, which can handle numbers larger than the CPU's word, but slowly, using extra code. Does Python always use slow software arithmetic? Or does it use hardware arithmetic for 32/64-bit values? Commented Aug 16, 2020 at 12:49

3 Answers


Short answer

Modern CPython is most efficient when operating on operands whose magnitude is below 30 bits, for most operations (a few specific operations may have a fast path for values up to 60 bits of magnitude). Everything else goes through generalized arbitrary-precision math. On older Python versions, or unusual custom builds of Python, especially those built for 32 bit systems, the limit is 15 bits in general, and 60 bits for certain specific operations.

That said, there's no reason to try to do multiple smaller operations to replace fewer large operations in general; Python level code is slower than the C-level code that implements int, so you're almost always better off just doing the math "naturally" and not worrying about operand magnitudes.

Long answer

CPython's int, in Python 3, is represented as a sign-magnitude array of values, where each element in the array represents 15 or 30 bits of the magnitude. The default "limb" size differs by Python version and the architecture's pointer size (you can check your own build with sys.int_info, as shown after the list):

  • Python 3.0 - Always 15 bit limbs, no other options
  • Python 3.1-3.10 - 30 bit limbs on 64 bit builds of Python (specifically, if sizeof(void*) >= 8), 15 bit limbs on 32 bit builds (can be tweaked with a configure switch at build time)
  • Python 3.11+ - 30 bit limbs on all builds by default (still tweakable at build time)
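
If you want to check the limb size your own interpreter was built with, CPython exposes it through sys.int_info; a quick check (output depends on the build):

import sys

# bits_per_digit is 30 on typical 64 bit builds, 15 on older or 32 bit ones;
# sizeof_digit is the C storage size of one limb in bytes (usually 4 or 2).
print(sys.int_info)
print(sys.int_info.bits_per_digit, sys.int_info.sizeof_digit)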

The long type in the Python 2 days followed the same rules (15 bits in 2.7.3 and earlier, changing to the 3.1-3.10 behavior in 2.7.4), but since there was a separate int type (defined in terms of a C long), the performance of long didn't matter as much.

All of this is an implementation detail (as you can see by it changing between minor releases on multiple occasions, and even between micro releases in the 2.7 timeframe), but a longstanding one (it was originally 15 all the time, but it was found to be an easy win to double the size and number of used bits per "digit" in the array when working on 64 bit systems). It has optimizations for ints that fit into a single (or sometimes two) such array values (it extracts the raw value from the array and performs a single CPU operation, skipping loops and algorithms that apply to the arbitrary length case), and on 64 bit builds of CPython, that currently means values with magnitude of 30 bits or less are generally optimized specially (with 60 bit magnitudes occasionally having fast paths).
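
As a rough illustration (absolute numbers vary by machine and Python version), here is a timeit microbenchmark comparing an addition whose operands fit in a single 30 bit digit against one whose operands need many digits:

import timeit

small = 2**29    # magnitude fits in one 30 bit digit on a 64 bit build
large = 2**300   # needs many digits, so the general algorithm runs

# Each call runs the statement one million times (timeit's default).
print(timeit.timeit("x + x", globals={"x": small}))
print(timeit.timeit("x + x", globals={"x": large}))

The gap is real but modest next to the interpreter overhead of the surrounding Python code, which is the point of the next paragraph.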

That said, there's rarely a reason to take this into account; CPython interpreter overhead is pretty high, and it's pretty hard to imagine a scenario where manually breaking down a larger operation into smaller ones (incurring more interpreter overhead to do many small operations at the Python layer) would outweigh the much smaller cost of Python doing the array-based operations (even without special fast paths) at the C layer. The exceptions to this rule would all rely on non-Python-int-solutions, using fixed size numpy arrays to vectorize the work, and largely following C rules at that point (since numpy arrays are wrappers around raw C arrays most of the time).
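
For completeness, a minimal sketch of that numpy route (assuming numpy is installed); note that fixed-width dtypes wrap around on overflow instead of growing the way Python's int does:

import numpy as np

a = np.arange(1_000_000, dtype=np.int64)  # fixed 64 bit integers, C semantics
b = a * 2 + 1                             # one vectorized pass at the C level
print(b[:5])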


7 Comments

I assume an example of the optimization is here, where it checks if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) { return PyLong_FromLong(MEDIUM_VALUE(a) + MEDIUM_VALUE(b)); }
@juanpa.arrivillaga: Yup. MEDIUM_VALUE is a macro that unpacks a single element from the underlying array (and fixes up its sign, since the array value isn't signed itself). So it's just unpacking both, adding them with a normal CPU instruction, then repacking them into a new int.
Does it mean that a 64-bit Python build works with integers efficiently only in the range -2^29 to 2^29 - 1?
@NoNameQA: The range is -(2³⁰ - 1) to 2³⁰ - 1; the array holds a 30 bit unsigned magnitude in each element, and the sign is encoded in the size of the array (negative numbers store a negative length for the array). The idea being that overflow can't occur even in multiplication when both operands are small enough to fit in a 32 bit range, and the work is being done with 64 bit operands. The 30 bit (instead of 31 bit) thing is due to the relative ease of converting the algorithms that worked with 15 out of 16 bits by simply doubling all the numbers.
You can see the effect with sys.getsizeof; on my 64 bit build sys.getsizeof(-(2**30 - 1)) is 28 (bytes), but sys.getsizeof(-(2**30)) is 32; the array had to expand when the high bit moved to the 31st bit.

You could also use the Python struct library: pack a simple number as a C int, then take the length of the packed bytes; the result is in bytes:

import struct

# The packed length equals the C compiler's sizeof(int)
size_int = len(struct.pack('i', 0))
print(size_int)

4

2 Comments

As per the documentation, you are here printing the value of sizeof(int) returned by the C compiler used to compile your Python interpreter, which is not the internal size used for integers by Python.
As @calandoa says, this is thoroughly useless for determining anything about the internal representation of Python ints, nor does it provide any help in determining what magnitude of integers will operate most efficiently.

You can use sys.getsizeof, which returns the total size of any object, including any garbage-collector overhead, though this is not the size of the internal piece of data the CPU would use for any hardware operation.

>>> import sys
>>> sys.getsizeof(1)
28
>>> sys.getsizeof(12345678)
28
>>> sys.getsizeof(1234567890)
32
>>> sys.getsizeof(1234567890123456789)
36
>>> sys.getsizeof(12345678901234567890123456789)
40

This may not answer your question, but I guess there is no point in trying to infer this size in Python if your plan is to make your code more efficient.

