How does Python hash long numbers? I guess it takes O(1) time for 32-bit ints, but the way long integers work in Python makes me think the complexity is not O(1) for them. I've looked for answers in relevant questions, but have found none straightforward enough to make me confident. Thank you in advance.
1 Answer
The long_hash() function indeed loops and depends on the size of the integer, yes:
/* The following loop produces a C unsigned long x such that x is
congruent to the absolute value of v modulo ULONG_MAX. The
resulting x is nonzero if and only if v is. */
while (--i >= 0) {
/* Force a native long #-bits (32 or 64) circular shift */
x = (x >> (8*SIZEOF_LONG-PyLong_SHIFT)) | (x << PyLong_SHIFT);
x += v->ob_digit[i];
/* If the addition above overflowed we compensate by
incrementing. This preserves the value modulo
ULONG_MAX. */
if (x < v->ob_digit[i])
x++;
}
where i is the 'object size', e.g. the number of digits used to represent the number, where the size of a digit depends on your platform.
7 Comments
Eli Korvigo
As far as I know, in Python 3 all numbers are of long type, so is it safe to say that it's always O(i) to hash a number in Python 3?
Martijn Pieters
@GrayFall9: yes, and yes. Python 3
int is what Python 2 called long. Note that the digit size is usually either 15 or 30 bits, so even for large numbers the number of iterations is very small. You'd need numbers over 1152921504606846976 to even go to 3 iterations on most modern systems.jfs
@EliKorvigo: to be clear the time complexity is
O(log(n)) where n is the integer. For large integers that are less than available memory, you could use gmpy2, to improve time performanceEli Korvigo
@J.F.Sebastian well, that's strange. I just tried to hash a random 500-digit integer in Python 2 and it finished instantly on my machine.
Martijn Pieters
@EliKorvigo: That's because
math.log2(500) is just under 9. You'll need to use far bigger numbers to see an effect. What made you think that O(logN) meant that code is going to run slow for N=500? |