I understand that the average case of hash-table lookup is O(1), but does this include the time it takes to compute the hash value itself of the given input? I've tried looking for the answer on google, read all the docs needed but could not find the implementation of the internal hash() function in Python. Some websites state that computing the hash value takes a constant amount of time, some say it is O(k) where k is the length of the input. I would be happy if you could help me find the correct answer. Thanks in advance :)
-
2If you can read C, take a look at the pyhash.c source codePM 2Ring– PM 2Ring2018-07-14 11:52:50 +00:00Commented Jul 14, 2018 at 11:52
2 Answers
It depends entirely on the type being hashed. Here are some simple tests in CPython 2.7.13, which is not the only option:
>>> pprint.pprint([(i, timeit.timeit('hash(n)', setup='n="a"*{}'.format(6400*i), number=1)) for i in range(16)])
[(0, 1.9073486328125e-06),
(1, 1.6927719116210938e-05),
(2, 3.314018249511719e-05),
(3, 4.887580871582031e-05),
(4, 6.4849853515625e-05),
(5, 8.106231689453125e-05),
(6, 9.679794311523438e-05),
(7, 0.00011301040649414062),
(8, 0.00012993812561035156),
(9, 0.00014495849609375),
(10, 0.00016188621520996094),
(11, 0.0001780986785888672),
(12, 0.00019288063049316406),
(13, 0.0002090930938720703),
(14, 0.000225067138671875),
(15, 0.00024199485778808594)]
>>> pprint.pprint([(i, timeit.timeit('hash(n)', setup='n="a"*{}'.format(6400*i))) for i in range(16)])
[(0, 0.09920382499694824),
(1, 0.09032988548278809),
(2, 0.09069585800170898),
(3, 0.09006309509277344),
(4, 0.09059309959411621),
(5, 0.09033513069152832),
(6, 0.09037399291992188),
(7, 0.09031510353088379),
(8, 0.09110498428344727),
(9, 0.09074902534484863),
(10, 0.0909719467163086),
(11, 0.09081602096557617),
(12, 0.09112405776977539),
(13, 0.09091711044311523),
(14, 0.09103798866271973),
(15, 0.09085893630981445)]
Note how hashing a freshly created string is O(n), but every string is caching its hash so it amortises to constant time when repeated (number=1000000 is the default for timeit).
>>> pprint.pprint([(i, timeit.timeit('hash(n)', setup='n=2**{}'.format(64*i))) for i in range(16)])
[(0, 0.09280180931091309),
(1, 0.09100484848022461),
(2, 0.09413909912109375),
(3, 0.09609699249267578),
(4, 0.10647201538085938),
(5, 0.1146399974822998),
(6, 0.12569880485534668),
(7, 0.1291029453277588),
(8, 0.13350296020507812),
(9, 0.1369338035583496),
(10, 0.14037799835205078),
(11, 0.14420413970947266),
(12, 0.1485278606414795),
(13, 0.15162205696105957),
(14, 0.15520405769348145),
(15, 0.15993809700012207)]
long is also O(n), where n is the width of the number, thus logarithmic of magnitude. The granularity is that of digit, typically 2**30 specifically to be usable directly as a hash for smaller ints.
Other objects will have their own behaviour, for instance tuples will roughly sum the hash time of their contents.
2 Comments
A small test I made to test the hypothesis. The results don't seem to depend on the length of the length of the input.
import datetime
x = ['a','aa','aaaaaaaaaaaaaaaaaaaaaaaaaaaa','aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa','aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa']
for i in range(len(x)):
for j in range(len(x)):
print "Checking for: " + x[i] + " " + x[j]
a = datetime.datetime.now()
h = hash((x[i],x[j]))
b = datetime.datetime.now()
c = b - a
print "Time taken : " + str(c.microseconds)
Results
Checking for: a a
Time taken : 87
Checking for: a aa
Time taken : 10
Checking for: a aaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: a aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: a aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aa a
Time taken : 9
Checking for: aa aa
Time taken : 8
Checking for: aa aaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaa a
Time taken : 10
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaa aa
Time taken : 8
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 8
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa a
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 11
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa a
Time taken : 10
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 8
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
Checking for: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Time taken : 9
6 Comments
str, which caches the hash. It's possible for other types to have a slower __hash__ method.hash() time complexity is not O(k) where k is the length of the input. I tried the same with int inputs and list inputs with the same kind of relative results.The absolute times are different sure, but the relative time should matter here ?long (now int) depends on the size of the number (basically log base ULONG_MAX, but the step is more likely cache dependent), and list is mutable and therefore not hashable.datetime.datetime.now() is designed to produce monotonic timestamps, not accurate interval measurements. A better function for performance testing is time.perf_counter. Also see the timeit module, which simplifies timing tests, but bear in mind what Yann Vernier said about caching.