I read that interning strings can help with performance. Do I just store the return value of the sys.intern call as the dictionary key, and that is it?
import sys
t = {}
t[sys.intern('key')] = 'val'
Thanks
Yes, that's how you use it.
To be more specific about performance, the documentation for sys.intern states:
Interning strings is useful to gain a little performance on dictionary lookup – if the keys in a dictionary are interned, and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare.
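A minimal sketch of what the quote means in practice. The strings are built at runtime so that they start out as separate objects, and the address used here is only an illustrative value:

```python
import sys

# Build the same text twice at runtime, so we get two distinct string objects.
parts = ["192", "168", "0", "1"]
a = ".".join(parts)
b = ".".join(parts)

# sys.intern returns the single canonical object for that text.
ia = sys.intern(a)
ib = sys.intern(b)

print(a == b)    # equal contents
print(ia is ib)  # one shared object after interning

# Store with an interned key, and look up with an interned key too.
t = {ia: "val"}
print(t[ib])
```

Note that the benefit described in the documentation only applies when both the stored key and the lookup key are interned.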
There are two steps in a (classic) dict lookup: 1. hash the object into a number that gives the index in the array that stores the data; 2. scan the array cell at this index to find the (key, value) pair with the correct key.
Usually, the second step is reasonably fast because the hash function is chosen to produce very few collisions (different objects with the same hash). But the lookup still has to check the key you are looking for against every stored key that has the same hash. Step 2 is the one that gets faster: string identity is tested before the expensive, character-by-character test of string equality.
Step 1 is harder to speed up: the hash can be stored along with the interned string, but you have to compute the hash to find the interned string in the first place.
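The two steps can be sketched with a toy hash table. This is only an illustration of the "classic" scheme described above: ToyDict is a made-up class, and CPython's real dict is implemented differently (open addressing, in C), although it also tries identity before equality.

```python
import sys


class ToyDict:
    """Toy chained hash table, for illustration only."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        # Step 1: hash the key to find the array cell.
        return self.buckets[hash(key) % len(self.buckets)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (stored_key, _) in enumerate(bucket):
            if stored_key is key or stored_key == key:
                bucket[i] = (stored_key, value)
                return
        bucket.append((key, value))

    def __getitem__(self, key):
        # Step 2: scan the cell. The cheap identity test comes first,
        # the character-by-character comparison only if it fails.
        for stored_key, value in self._bucket(key):
            if stored_key is key or stored_key == key:
                return value
        raise KeyError(key)


t = ToyDict()
t[sys.intern("key")] = "val"
print(t[sys.intern("key")])
```

With interned keys on both sides, the `stored_key is key` test succeeds and the string comparison is skipped.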
This was theory! If you really need to improve performance, first do some benchmarks.
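One possible way to benchmark this with timeit, as a rough sketch. The key format, the table size and the repeat count are arbitrary assumptions; measure with your own data, since the difference may well be negligible.

```python
import sys
import timeit

N = 10_000

# Keys stored in the dict are interned.
stored = [sys.intern(f"10.0.{i // 256}.{i % 256}") for i in range(N)]
table = {key: i for i, key in enumerate(stored)}

# Equal strings built separately: same text, different objects.
fresh = [f"10.0.{i // 256}.{i % 256}" for i in range(N)]


def run(lookups):
    for key in lookups:
        table[key]


t_fresh = timeit.timeit(lambda: run(fresh), number=100)
t_interned = timeit.timeit(lambda: run(stored), number=100)

print(f"non-interned lookup keys: {t_fresh:.4f} s")
print(f"interned lookup keys:     {t_interned:.4f} s")
```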
Then think about the specifics of your domain. You are storing IPv4 addresses as keys. An IPv4 address is a number between 0 and 256^4 - 1. If you replace the human-friendly representation of an address with an integer, you'll get a faster hash (hashing small numbers in CPython is almost free: https://github.com/python/cpython/blob/master/Python/pyhash.c) and a faster lookup. The ipaddress module might be the best choice in your case.
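A short sketch of integer keys using the standard ipaddress module. The helper name ip_key is made up for this example:

```python
import ipaddress


def ip_key(text):
    """Convert dotted-quad text to the integer value of the address."""
    return int(ipaddress.IPv4Address(text))


t = {}
t[ip_key("172.16.0.1")] = "val"

print(ip_key("172.16.0.1"))
print(t[ip_key("172.16.0.1")])
```

Parsing the text has a cost of its own, so this is most likely to pay off when addresses are converted once and then kept as integers.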
If you are sure that the addresses fall within known bounds (e.g. 172.16.0.0 – 172.31.255.255), you can try using an array instead of a dict. It should be faster unless the array is huge (disk swap).
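A sketch of the array idea for the example range, using a plain list indexed by the offset from the first address. The names BASE, SIZE and slot are made up here, and whether this beats a dict in practice should be measured.

```python
import ipaddress

BASE = int(ipaddress.IPv4Address("172.16.0.0"))
LAST = int(ipaddress.IPv4Address("172.31.255.255"))
SIZE = LAST - BASE + 1  # number of addresses in the range

# One slot per address; None marks "no value stored".
table = [None] * SIZE


def slot(text):
    """Map an address inside the range to its list index."""
    offset = int(ipaddress.IPv4Address(text)) - BASE
    if not 0 <= offset < SIZE:
        raise KeyError(text)
    return offset


table[slot("172.16.0.1")] = "val"
print(table[slot("172.16.0.1")])
```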
Finally, if this is not fast enough, be ready to use a faster language.