11

I understand that the average case of hash-table lookup is O(1), but does this include the time it takes to compute the hash value itself of the given input? I've tried looking for the answer on google, read all the docs needed but could not find the implementation of the internal hash() function in Python. Some websites state that computing the hash value takes a constant amount of time, some say it is O(k) where k is the length of the input. I would be happy if you could help me find the correct answer. Thanks in advance :)

1
  • 2
    If you can read C, take a look at the pyhash.c source code Commented Jul 14, 2018 at 11:52

2 Answers 2

5

It depends entirely on the type being hashed. Here are some simple tests in CPython 2.7.13, which is not the only option:

>>> pprint.pprint([(i, timeit.timeit('hash(n)', setup='n="a"*{}'.format(6400*i), number=1)) for i in range(16)])
[(0, 1.9073486328125e-06),
 (1, 1.6927719116210938e-05),
 (2, 3.314018249511719e-05),
 (3, 4.887580871582031e-05),
 (4, 6.4849853515625e-05),
 (5, 8.106231689453125e-05),
 (6, 9.679794311523438e-05),
 (7, 0.00011301040649414062),
 (8, 0.00012993812561035156),
 (9, 0.00014495849609375),
 (10, 0.00016188621520996094),
 (11, 0.0001780986785888672),
 (12, 0.00019288063049316406),
 (13, 0.0002090930938720703),
 (14, 0.000225067138671875),
 (15, 0.00024199485778808594)]
>>> pprint.pprint([(i, timeit.timeit('hash(n)', setup='n="a"*{}'.format(6400*i))) for i in range(16)])
[(0, 0.09920382499694824),
 (1, 0.09032988548278809),
 (2, 0.09069585800170898),
 (3, 0.09006309509277344),
 (4, 0.09059309959411621),
 (5, 0.09033513069152832),
 (6, 0.09037399291992188),
 (7, 0.09031510353088379),
 (8, 0.09110498428344727),
 (9, 0.09074902534484863),
 (10, 0.0909719467163086),
 (11, 0.09081602096557617),
 (12, 0.09112405776977539),
 (13, 0.09091711044311523),
 (14, 0.09103798866271973),
 (15, 0.09085893630981445)]

Note how hashing a freshly created string is O(n), but every string is caching its hash so it amortises to constant time when repeated (number=1000000 is the default for timeit).

>>> pprint.pprint([(i, timeit.timeit('hash(n)', setup='n=2**{}'.format(64*i))) for i in range(16)])
[(0, 0.09280180931091309),
 (1, 0.09100484848022461),
 (2, 0.09413909912109375),
 (3, 0.09609699249267578),
 (4, 0.10647201538085938),
 (5, 0.1146399974822998),
 (6, 0.12569880485534668),
 (7, 0.1291029453277588),
 (8, 0.13350296020507812),
 (9, 0.1369338035583496),
 (10, 0.14037799835205078),
 (11, 0.14420413970947266),
 (12, 0.1485278606414795),
 (13, 0.15162205696105957),
 (14, 0.15520405769348145),
 (15, 0.15993809700012207)]

long is also O(n), where n is the width of the number, thus logarithmic of magnitude. The granularity is that of digit, typically 2**30 specifically to be usable directly as a hash for smaller ints.

Other objects will have their own behaviour, for instance tuples will roughly sum the hash time of their contents.

Sign up to request clarification or add additional context in comments.

2 Comments

How did python know whether the hash of a string is cached without calculating hash firstly?
By storing a value indicating the hash hasn't been calculated; for instance, a hash of -1.
2

A small test I made to test the hypothesis. The results don't seem to depend on the length of the length of the input.

import datetime

x = ['a','aa','aaaaaaaaaaaaaaaaaaaaaaaaaaaa','aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa']
for i in range(len(x)):

    for j in range(len(x)):
        print "Checking for: " + x[i] + " " + x[j]
        a = datetime.datetime.now()

        h = hash((x[i],x[j])) 
        b = datetime.datetime.now()
        c = b - a
        print "Time taken : " + str(c.microseconds) 

Results

Checking for: a a
Time taken : 87
Checking for: a aa
Time taken : 10
Checking for: a aaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: a aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: a aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aa a
Time taken : 9
Checking for: aa aa
Time taken : 8
Checking for: aa aaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaa a
Time taken : 10
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaa aa
Time taken : 8
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 8
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa a
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 11
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa a
Time taken : 10
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 8
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9

6 Comments

That's a property of CPython str, which caches the hash. It's possible for other types to have a slower __hash__ method.
My idea was to convey that the hash() time complexity is not O(k) where k is the length of the input. I tried the same with int inputs and list inputs with the same kind of relative results.The absolute times are different sure, but the relative time should matter here ?
long (now int) depends on the size of the number (basically log base ULONG_MAX, but the step is more likely cache dependent), and list is mutable and therefore not hashable.
It's not easy to time things like this accurately. Eg, the time taken to build the tuple (and garbage collect it) may be large enough to swamp the time taken to perform the hash. And datetime.datetime.now() is designed to produce monotonic timestamps, not accurate interval measurements. A better function for performance testing is time.perf_counter. Also see the timeit module, which simplifies timing tests, but bear in mind what Yann Vernier said about caching.
Thanks for the tip @PM2Ring.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.