
In Python, when I convert my set to a list, what is the algorithmic complexity of that operation? Is it merely type-casting the collection, or does it need to copy the items into a different data structure? What's happening?

I'd love to learn that the complexity was constant, like so many things in Python.

4 Answers


You can easily see this with a simple benchmark:

import matplotlib.pyplot as plt

# Run this in IPython/Jupyter: %timeit is an IPython magic, not plain Python.
x = list(range(10, 20000, 20))
y = []
for n in x:
    s = set(range(n))
    res = %timeit -r2 -n2 -q -o list(s)  # -o returns the TimeitResult object
    y.append(res.best)

plt.plot(x, y)

[plot: best time of list(s) against set size]

This clearly shows a linear relationship, modulo some noise.

(EDITED as the first version was benchmarking something different).


6 Comments

So what's happening at 1000 and 6000? I understand that the overall shape is indicating O(N), I'm just curious what implementation detail is being shown by those steps.
I think the jumps are because the list implementation allocates extra space for growth, but has to reallocate the list when it reaches the limit.
This isn't a way to measure the time complexity of an operation; time complexity is theoretical, and it only applies for "sufficiently large" n, where 20,000 may well not be "sufficiently large". This graph gives some evidence about what the time complexity might be, but you cannot measure the time complexity of an algorithm by actually running it and measuring with a timer.
@kaya3 Of course. I still find this heuristic approach helpful for understanding what kind of behavior to expect in real-world applications. Besides, the question is rather practical: containers like set() and list() can be implemented in multiple ways that may change in future versions, and this simple heuristic gives insight without knowing the internals.
@AndrewJaffe It's the set's internal size, which quadruples at those points.

The time complexity in most cases will be O(n) where n is the size of the set, because:

  • The set is implemented as a hashtable whose underlying array size is bounded by a fixed multiple of the set's size. Iterating over the set is done by iterating over the underlying array, so it takes O(n) time.
  • Appending an item to a list takes O(1) amortized time, even if the list's underlying array is not originally allocated to be large enough for the whole set; so appending n items to an empty list takes O(n) time.
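Putting the two points together, list(s) behaves roughly like the sketch below. The function name set_to_list is made up for illustration; CPython actually does this in C, and can even preallocate the list using len(s) as a hint:

```python
def set_to_list(s):
    # Rough Python-level model of what list(s) does internally.
    result = []
    for item in s:           # O(n): walk the set's underlying hash table
        result.append(item)  # amortized O(1) per append
    return result

print(sorted(set_to_list({3, 1, 2})))  # [1, 2, 3]
```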

However, there is a caveat to this, which is that Python's sets have underlying array sizes based on the largest size the set object has had, not necessarily based on its current size; this is because the underlying array is not re-allocated to a smaller size when elements are removed from the set. If a set is small but used to be much larger, then iterating over it can be slower than O(n).
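The caveat is easy to demonstrate. The sketch below (variable names and the exact sizes are just for illustration) compares a set built small against one holding the same elements that once held a million, since CPython does not shrink a set's hash table when elements are discarded:

```python
import timeit

small = set(range(100))

# Same 100 elements, but this set once held 1_000_000 of them; its
# hash table stays sized for the larger population after the discards.
shrunk = set(range(1_000_000))
for i in range(100, 1_000_000):
    shrunk.discard(i)

t_small = timeit.timeit(lambda: list(small), number=100)
t_shrunk = timeit.timeit(lambda: list(shrunk), number=100)
print(t_small, t_shrunk)  # list(shrunk) is far slower despite equal len()
```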



The complexity is linear because all references are copied to the new container. But only the references are copied, not the objects themselves - this can matter for big objects.
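You can check that only references are copied with the is operator (the names below are just for illustration):

```python
t = ("a", "fairly", "large", "tuple")
s = {t}
lst = list(s)
# The list holds the very same object; only the reference was copied.
print(lst[0] is t)  # True
```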



Since a set is implemented with a hashtable, building one has a theoretical worst-case complexity of O(n^2): if the items all hash to the same slot, each insertion must probe past all the earlier entries to find a free slot. Note, however, that this cost is paid when the set is constructed; converting it to a list only iterates the underlying table and never calls hash(), so list(s) itself remains O(n). Though I'm not sure what kind of items it would take to realize this scenario in practice.
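The pathological case can be forced with a type whose instances all hash alike (BadHash is a made-up name for this sketch). The quadratic cost shows up while building the set; list(s) afterwards is still a plain iteration:

```python
class BadHash:
    """Every instance hashes to the same slot, forcing collisions."""
    def __hash__(self):
        return 0
    # No __eq__, so instances compare by identity and all 500 are kept.

items = [BadHash() for _ in range(500)]
s = set(items)  # each insert probes past earlier entries: O(n^2) total
lst = list(s)   # iteration never calls hash(): still O(n)
print(len(lst))  # 500
```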
