Python: How to iterate lists of dictionaries Fast

Question

I'm using a following code to iterate through the lists of dictionaries to locate a corresponding key ['5'] and when found to compare the values. While it works fine I believe it could be improved to get a fast performance. What other ways could be used to achieve the same result?

listA = [{1:'One', 2:'Two', 3:'Three'}, {4:'Four', 5:'Five', 6:'Six'}]
listB = [{4:'Four', 5:'Five', 6:'Six'}, {7:'Seven', 8:'Eight', 9:'Nine'}]

result=[]
for dictA in listA:
    if not 5 in dictA.keys(): continue
    for dictB in listB:
        if 5 in dictB.keys() and dictB[5]==dictA[5]:
            result.append(dictB[5])

I was writing an answer, but I can't figure out what on earth you are trying to do here. That code is messed up. — anon582847382
– anon582847382, Commented Mar 30, 2014 at 17:04
Also this should probably be on codereview as it is about reviewing and optimising working code. — anon582847382
– anon582847382, Commented Mar 30, 2014 at 17:11
You need to give us some idea of the sizes involved here. If you're really dealing with both lists having 2 dictionaries with 3 entries each, stop wasting time optimizing this. Otherwise, solutions for "check list of 1 million dicts against list of 1 million dicts" and "check list of 1 million dicts against list of 2 dicts" are likely to be very different. — user9876
– user9876, Commented Mar 30, 2014 at 17:13
Also, I assume "if not 4 in dictA.keys()" should be "if not 5 in dictA.keys()" ? — user9876
– user9876, Commented Mar 30, 2014 at 17:14
Yeah, it should be 5 instead of 4. Already fixed. Sorry for the mess — alphanumeric
– alphanumeric, Commented Mar 30, 2014 at 17:18

Jarod · Accepted Answer · 2014-03-30 17:08:42Z

3

A quick check also shows that 4 in dictA is faster than 4 in dictA.keys().

answered Mar 30, 2014 at 17:08

Jarod

814 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

alphanumeric Over a year ago

I've learned that "if 5 in dictA.keys()" is more reliable than "if 5 in dictA". There were situations where the code was returning False even while a key was in a dict. I still don't know what caused it. .keys() method never fails. Any ideas why that would happen?

user9876 Over a year ago

If you're using custom classes that don't implement hash() properly. But those classes shouldn't be used as dict keys anyway

x3al Over a year ago

In threaded code .keys() (in python2) returns a new list, it's more thread-safe.

user9876 · Accepted Answer · 2014-03-30 17:22:11Z

2

First: You're not using most of listA; all you care about is the values from dictA[5]. So lets just extract the bits you care about, in a data structure that will allow fast access:

interesting_vals = frozenset([dictA[5] for dictA in listA if 5 in dictA])

Now we just need to check listB. Two approaches. The obvious one first:

result = [dictB[5] for dictB in listB
          if 5 in dictB and dictB[5] in interesting_vals]

or if you expect most dictBs to have a [5] element then this may be faster since it combines the access and the existance check (profile it with real data!):

NA = object()  # Will compare different to everything in interesting_vals
result = [dictB[5] for dictB in listB if dictB.get(5, NA) in interesting_vals]

This solution should be O(len(listA) + len(listB)), which is much better than your original O(len(listA) * len(listB)) if the lists are large.

Note that I am assuming that values of dictA[5] are hashable and have a hash that's consistent with equals - most built-in classes are, but some custom classes might not implement hash properly.

answered Mar 30, 2014 at 17:22

user9876

11.1k7 gold badges47 silver badges67 bronze badges

2 Comments

alphanumeric Over a year ago

That's interesting. Thanks!

Chris Wesseling Over a year ago

Inconsistent __hash__ functions are a great way to have fun.

StoryTeller - Unslander Monica · Accepted Answer · 2014-07-05 13:34:46Z

2

You'd have to profile the code to see if there is an improvement, but it's generally a step in the right direction to use built-ins for the filtering, rather than scripting it yourself, since it will skip the interpreting of your filter boilerplate code.

for dictA in filter(lambda x : 4 in x, listA):
    for dictB in filter(lambda x : 5 in x, listB):
        if dictB[5]==dictA[5]:
            result.append(dictB[5])

Also, it makes it shorter and a bit more readable, which is with the Zen of Python. You shuold become acquainted with how a Python program looks, since you quite clearly appear to be trying to write C/Java like code in Python.

edited Jul 5, 2014 at 13:34

answered Mar 30, 2014 at 17:06

StoryTeller - Unslander Monica

172k22 gold badges401 silver badges481 bronze badges

2 Comments

CT Zhu Over a year ago

filer? filter!, :p

StoryTeller - Unslander Monica Over a year ago

@CTZhu, what!? where!? :P

CT Zhu · Accepted Answer · 2014-03-30 17:19:14Z

1

Oneliner:

%timeit filter(None, {item.get(5) for item in listA}.intersection(item.get(5) for item in listB))
100000 loops, best of 3: 8.59 us per loop

%%timeit
    ...: listA = [{1:'One', 2:'Two', 3:'Three'}, {4:'Four', 5:'Five', 6:'Six'}]
    ...: listB = [{4:'Four', 5:'Five', 6:'Six'}, {7:'Seven', 8:'Eight', 9:'Nine'}]
    ...: 
    ...: result=[]
    ...: for dictA in listA:
    ...:     if not 4 in dictA.keys(): continue
    ...:     for dictB in listB:
    ...:         if 5 in dictB.keys() and dictB[5]==dictA[5]:
    ...:             result.append(dictB[5])
    ...:             
100000 loops, best of 3: 11.9 us per loop

answered Mar 30, 2014 at 17:19

CT Zhu

54.6k18 gold badges125 silver badges136 bronze badges

Comments

x3al · Accepted Answer · 2014-03-30 17:26:57Z

0

result = [
    y[5]
    for x in listA
    if 5 in x
    for y in listB
    if 5 in y and x[5] == y[5]
]

edited Mar 30, 2014 at 17:26

answered Mar 30, 2014 at 17:11

x3al

5861 gold badge8 silver badges26 bronze badges

2 Comments

StoryTeller - Unslander Monica Over a year ago

I hate it when people use list comprehensions to compete with the unreadability of perl.

alphanumeric Over a year ago

Agree. It looks ugly. But it good to know it's possible to put an entire page on a single line... and again, some say the expressions are fast.

BenMorel · Accepted Answer · 2014-06-27 14:26:25Z

0

You should always test speed, but generator expressions (or list comprehension), should be faster than iterating:

result= []
A = (a for a in listA if 5 in a) 
for a in A:
    result.extend(b[5] for b in listB if (5 in b and a[5] == b[5]))

Or try to go functional:

import functools
fives = partial(filter, lambda x: 5 in x)
for a in fives(listA):
    result.extend(b[5] for fives(listB) if a[5] == b[5])

edited Jun 27, 2014 at 14:26

BenMorel

37k52 gold badges208 silver badges339 bronze badges

answered Mar 30, 2014 at 17:17

m.wasowski

6,3761 gold badge25 silver badges30 bronze badges

Collectives™ on Stack Overflow

Python: How to iterate lists of dictionaries Fast

6 Answers 6

3 Comments

2 Comments

2 Comments

Comments

2 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

6 Answers 6

3 Comments

2 Comments

2 Comments

Comments

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related