
In the process of answering this question, I came across something I couldn't explain.

Given the following Python 3.5 code:

import time

def di(n):
    for i in range(10000000): n / 101

i = 10
while i < 1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000:
    start = time.clock()
    di(i)
    end = time.clock()
    print("On " + str(i) + " " + str(end-start))
    i *= 10000

The output is:

On 10 0.546889
On 100000 0.545004
On 1000000000 0.5454929999999998
On 10000000000000 0.5519709999999998
On 100000000000000000 1.330797
On 1000000000000000000000 1.31053
On 10000000000000000000000000 1.3393129999999998
On 100000000000000000000000000000 1.3524339999999997
On 1000000000000000000000000000000000 1.3817269999999997
On 10000000000000000000000000000000000000 1.3412670000000002
On 100000000000000000000000000000000000000000 1.3358929999999987
On 1000000000000000000000000000000000000000000000 1.3773859999999996
On 10000000000000000000000000000000000000000000000000 1.3326890000000002
On 100000000000000000000000000000000000000000000000000000 1.3704769999999993
On 1000000000000000000000000000000000000000000000000000000000 1.3235019999999995
On 10000000000000000000000000000000000000000000000000000000000000 1.357647
On 100000000000000000000000000000000000000000000000000000000000000000 1.3341190000000012
On 1000000000000000000000000000000000000000000000000000000000000000000000 1.326544000000002
On 10000000000000000000000000000000000000000000000000000000000000000000000000 1.3671139999999973
On 100000000000000000000000000000000000000000000000000000000000000000000000000000 1.3630120000000012
On 1000000000000000000000000000000000000000000000000000000000000000000000000000000000 1.3600200000000022
On 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000 1.3189189999999975
On 100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 1.3503469999999993

As you can see, the timings cluster into roughly two groups: one for smaller numbers and one for larger numbers.

The same result happens with Python 2.7 using the following function to preserve semantics:

def di(n):
    for i in xrange(10000000): n / 101.0

On the same machine, I get:

On 10 0.617427
On 100000 0.61805
On 1000000000 0.6366
On 10000000000000 0.620919
On 100000000000000000 0.616695
On 1000000000000000000000 0.927353
On 10000000000000000000000000 1.007156
On 100000000000000000000000000000 0.98597
On 1000000000000000000000000000000000 0.99258
On 10000000000000000000000000000000000000 0.966753
On 100000000000000000000000000000000000000000 0.992684
On 1000000000000000000000000000000000000000000000 0.991711
On 10000000000000000000000000000000000000000000000000 0.994703
On 100000000000000000000000000000000000000000000000000000 0.978877
On 1000000000000000000000000000000000000000000000000000000000 0.982035
On 10000000000000000000000000000000000000000000000000000000000000 0.973266
On 100000000000000000000000000000000000000000000000000000000000000000 0.977911
On 1000000000000000000000000000000000000000000000000000000000000000000000 0.996857
On 10000000000000000000000000000000000000000000000000000000000000000000000000 0.972555
On 100000000000000000000000000000000000000000000000000000000000000000000000000000 0.985676
On 1000000000000000000000000000000000000000000000000000000000000000000000000000000000 0.987412
On 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0.997207
On 100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 0.970129

Why is there this consistent difference between floating point division of smaller vs. larger numbers? Does it have to do with Python internally using floats for smaller numbers and doubles for larger ones?

  • Note that n / 101 in Python 3 and n / 101.0 in Python 2 do not have the same semantics: the former does a correctly-rounded division (using integer arithmetic internally for large n) of the two integer inputs. The latter converts n to a float (potentially losing accuracy in the process) and then does a floating-point division. For an exact equivalent of the Python 3 behaviour in Python 2, you'd want to do from __future__ import division, or use operator.truediv. For an example where this makes a difference, try n = 11633230408450025. Commented Feb 16, 2016 at 8:06
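The difference the comment describes can be demonstrated directly, using the value it suggests (a minimal Python 3 sketch):

```python
# n needs more than 53 bits, so float(n) cannot represent it exactly.
n = 11633230408450025

exact = n / 101           # Python 3 true division: correctly rounded,
                          # computed with integer arithmetic for large n
naive = float(n) / 101.0  # convert first, then divide (Python 2 style)

print(exact == naive)     # False: the two results differ for this n
```

The lost bit during the `float(n)` conversion is enough to push the naive quotient onto a different representable double.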

2 Answers


It has more to do with Python storing exact integers as Bignums.

In Python 2.7, the computation of integer a / float fb starts by converting the integer to a float. If the integer is stored as a Bignum [Note 1], this conversion takes longer. So it is not the division itself that has differential cost; it is the conversion of the integer (possibly a Bignum) to a double.
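The conversion cost can be observed in isolation with `timeit` (shown here in Python 3, where the same conversion happens; the exact numbers are machine-dependent):

```python
import timeit

small = 10**13  # fits in 53 bits: cheap int-to-float conversion
large = 10**80  # many Bignum digits: conversion must walk every digit

t_small = timeit.timeit('float(n)', globals={'n': small}, number=100_000)
t_large = timeit.timeit('float(n)', globals={'n': large}, number=100_000)

print(t_small, t_large)  # on a typical machine, t_large is noticeably larger
```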

Python 3 does the same computation for integer a / float fb, but for integer a / integer b it computes the closest representable result, which might differ slightly from the naive float(a) / float(b). (The naive version can suffer from the classic double-rounding problem.)

If both float(a) and float(b) are precise (that is, both a and b are no larger than 53 bits), then the naive solution works, and the result only requires the division of two double-precision floats.
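This matches the jump in the Python 3 timings above between 10**13 and 10**17: a quick check of where the 53-bit boundary falls among the tested values (the divisor 101 always fits, so only n matters here):

```python
# Values up to 10**13 take the float fast path; 10**17 and up do not.
for n in (10, 10**5, 10**9, 10**13, 10**17, 10**21):
    fits = n.bit_length() <= 53  # both operands <= 53 bits -> float division
    print(n, n.bit_length(), "fast path" if fits else "Bignum division")
```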

Otherwise, a multiprecision division is performed to generate the correct 53-bit mantissa (the exponent is computed separately), and the result is converted precisely to a floating point number. There are two possibilities for this division: a fast-track if b is small enough to fit in a single Bignum unit (which applies to the benchmark in the OP), and a slower, general Bignum division when b is larger.

In none of the above cases is the speed difference observed related to the speed with which the hardware performs floating point division. For the original Python 3.5 test, the difference relates to whether floating point or Bignum division is performed; for the Python 2.7 case, the difference relates to the necessity to convert a Bignum to a double.

Thanks to @MarkDickinson for the clarification, and the pointer to the source code (with a long and useful comment) which implements the algorithm.


Notes

  1. In Python 3, integers are always stored as Bignums. Python 2 has separate types: int (a machine-word-sized integer) and long (a Bignum). In practice, since Python 3 often uses optimized algorithms when the Bignum has only one "leg", the difference between "small" and "big" integers is still noticeable.
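The size of one Bignum "leg" (CPython calls it a digit) can be inspected at runtime; on typical 64-bit CPython builds it is 30 bits, so the divisor 101 always fits in a single digit:

```python
import sys

print(sys.int_info.bits_per_digit)  # 30 on most 64-bit CPython builds
print((101).bit_length())           # 7: fits comfortably in one digit
```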

4 Comments

This isn't quite the whole story: in Python 3, all integers are stored as BigNums, but when doing division there's some special-case handling in the source code for the case where both the dividend and divisor are representable in 53 bits or fewer: in that case both can be converted to double without loss of accuracy, and a floating-point division can be used to compute the result. In the remaining case, integer arithmetic is used to find the quotient.
For reference, here's the relevant source: github.com/python/cpython/blob/master/Objects/…. I'll write a proper answer later, after work.
@MarkDickinson: thanks. I hope I captured this information correctly.
Ah, that's beautiful. Now I don't have to write my own answer!

It's the larger integer format, as @rici said. I changed the initial 10 to 10.0 so the loop works with floats throughout; here is the result: no significant variation in timing across magnitudes.

On 10.0 1.12
On 100000.0 0.79
On 1000000000.0 0.79
On 1e+13 0.77
On 1e+17 0.78
On 1e+21 0.79
On 1e+25 0.77
On 1e+29 0.8
On 1e+33 0.77
On 1e+37 0.8
On 1e+41 0.78
On 1e+45 0.78
On 1e+49 0.78
On 1e+53 0.79
On 1e+57 0.77
On 1e+61 0.8
On 1e+65 0.77
On 1e+69 0.79
On 1e+73 0.77
On 1e+77 0.78
On 1e+81 0.78
On 1e+85 0.78
On 1e+89 0.77

