3

I'm iterating through a list of over three million items and am assigning them integer values. For organization, I have made a dictionary whose keys are the integers and values are the list of items with that score. A priori, I do not know how many items will have a certain score, so I'm using the + operator to append onto the list as follows:

for e in xs:
   myDict[val(e)] = myDict.get(val,[]) + [e]

My questions are:

  1. Is there a cleaner way of doing this?
  2. What's the time complexity of the + operation? Is it creating an entirely new list, copying the elements from the original list, and adding them in?
  3. What if I was adding an element to a set?
1
  • Re "Is it creating an entirely new list, copying the elements from the original list, and adding them in?": That would be an example of the Shlemiel the painter’s algorithm. Commented Feb 18, 2023 at 17:23

2 Answers 2

4

Use append:

for e in xs:
   myDict.setdefault(val(e), []).append(e)

This avoids building a new list every time. The operation list1 + list2 needs to build a new list in each iteration and hence allocate memory. append is more efficient because a list as pre-allocated memory at the end. For example, building a list with append starting from an empty list to a list with 10 million entries takes a bit more than 100 memory allocations.

The setdefault method of dictionaries returns the corresponding value if the key exist. If the key is not in th dictionary, it returns the default. In this case the default is a list. Due to the mutability of lists, we can append to the empty list in the first and to the partially filled list in each subsequent iteration.

An alternative to using setdefault() is collections.defaultdict. Do some profiling to find out which one is faster.

Sign up to request clarification or add additional context in comments.

2 Comments

Aha thank you. So does the setdefault method return the reference to the value of the key then? Otherwise the appending would not do anything.
Yes. If the key exists, it returns it, otherwise, it sets the key to be the second argument and then returns that.
2

Yes, append and use a collections.defaultdict:

from collections import defaultdict

d = defaultdict(list)
for e in xs:
   d[val(e)].append(e)

appending is an 0(1) operation, your approach is linear 0(n+k) as you create a new list each time.

If you were in a situation where you were adding multiple items, you would extend your list.

One thing to note is that my_list += some_list is equivalent to mylist.extend(some_list), but it is very different from my_list = my_list + some_list. The former adds to the original list, and the latter is doing what your code is doing, concatenating two lists to create a completely new list.

The complexity for extend or += is 0(k) where k is the length of some_list.

wiki.python has a list of the complexity of common operations in Python.

2 Comments

Is there absolutely no difference between += and extend? That is, can I do += [e] and .extend([e]) and see no differences even on the nanosecond level?
There will be nanosecond differences but they both do the same job, I imagine extend may be the very slightly faster operation

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.