3

According to the Python documentation, set is a mutable unordered collection.

Usually, it's implemented as a hash table that stores references to objects as its keys. Comparing to dict (which is also usually a hash table), a dictionary's order of elements is guaranteed to be insertion order since Python 3.7. For set, it is not.

There is beautiful in-depth explanation about Python dicts and sets implementation behaviour in Why is the order in dictionaries and sets arbitrary?

But it remains unclear: can it be that a set will be internally rehashed or reallocated by Python interpreter during the execution of some code that doesn't modify that set explicitly? Or is it guaranteed to be stable within this run of a program? Does it depend on set size?

s = {1, 4, 3, 5}
iter_result = [i for i in s]

# some other code not modifying set

assert [i for i in s] == iter_result  # always true within this run???
3
  • You might like set_table_resize() from the reference Commented Sep 4 at 13:05
  • 1
    shorter: iter_result = list(s) Commented Sep 4 at 13:20
  • Duplicate? Is python's "set" stable? Commented Sep 4 at 20:24

2 Answers 2

2

Iteration order of a set is not guaranteed by the language.

In CPython 3.7+, the order is usually stable until the set is mutated, but any change (add/remove/resize) can reshuffle it, and order also differs across runs due to hash randomization.

s = {1, 4, 3, 5}
list(s) == list(s)   # True until you mutate s

If you need a defined order, use sorted(s) or a list/dict.

Sign up to request clarification or add additional context in comments.

Comments

1

can it be that a set will be internally rehashed or reallocated by Python interpreter during the execution of some code that doesn't modify that set explicitly?

No.

Hashable from python glossary:

An object is hashable if it has a hash value which never changes during its lifetime

Set docs

The elements of a set must be hashable.

Sets will not be rehashed during their lifetime. Lists preserve the insertion order so the order of a list built from a set is the order given from the set and that will not change since hashes don't change.

As mentioned, hashes are randomized across executions as noted by this Bash one-liner that builds:

  • A list based on object's hash.
  • A set from that list. Order changes due to hash collisions.
  • A list from the set. Same order of the set.
for i in 1 2 3 4; do python3.12 -c 'import string;l=[f"ba{c}" for c in string.ascii_lowercase if hash(f"ba{c}") % 8 == 2 ];print(f"{l}\n{set(l)}\n{list(set(l))}\n")';done

Result

['baa', 'bah', 'bai', 'bam', 'bap']
{'baa', 'bap', 'bah', 'bai', 'bam'}
['baa', 'bap', 'bah', 'bai', 'bam']

['bag', 'bai', 'bal', 'bax']
{'bai', 'bag', 'bal', 'bax'}
['bai', 'bag', 'bal', 'bax']

['bae', 'bai', 'baq', 'bax']
{'baq', 'bae', 'bax', 'bai'}
['baq', 'bae', 'bax', 'bai']

['bae', 'bap', 'bax']
{'bae', 'bap', 'bax'}
['bae', 'bap', 'bax']

But, we can turn off hash randomization by setting PYTHONHASHSEED to a constant value

for i in 1 2 3 4; do PYTHONHASHSEED=1 python3.12 -c 'import string;l=[f"ba{c}" for c in string.ascii_lowercase if hash(f"ba{c}") % 8 == 2 ];print(f"{l}\n{set(l)}\n{list(set(l))}\n")';done

Result

['bac', 'bah', 'bak', 'baw']
{'bac', 'bak', 'bah', 'baw'}
['bac', 'bak', 'bah', 'baw']

['bac', 'bah', 'bak', 'baw']
{'bac', 'bak', 'bah', 'baw'}
['bac', 'bak', 'bah', 'baw']

['bac', 'bah', 'bak', 'baw']
{'bac', 'bak', 'bah', 'baw'}
['bac', 'bak', 'bah', 'baw']

['bac', 'bah', 'bak', 'baw']
{'bac', 'bak', 'bah', 'baw'}
['bac', 'bak', 'bah', 'baw']

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.