2

I'm using a following code to iterate through the lists of dictionaries to locate a corresponding key ['5'] and when found to compare the values. While it works fine I believe it could be improved to get a fast performance. What other ways could be used to achieve the same result?

listA = [{1:'One', 2:'Two', 3:'Three'}, {4:'Four', 5:'Five', 6:'Six'}]
listB = [{4:'Four', 5:'Five', 6:'Six'}, {7:'Seven', 8:'Eight', 9:'Nine'}]

result=[]
for dictA in listA:
    if not 5 in dictA.keys(): continue
    for dictB in listB:
        if 5 in dictB.keys() and dictB[5]==dictA[5]:
            result.append(dictB[5])
9
  • 2
    I was writing an answer, but I can't figure out what on earth you are trying to do here. That code is messed up. Commented Mar 30, 2014 at 17:04
  • 1
    Also this should probably be on codereview as it is about reviewing and optimising working code. Commented Mar 30, 2014 at 17:11
  • 1
    You need to give us some idea of the sizes involved here. If you're really dealing with both lists having 2 dictionaries with 3 entries each, stop wasting time optimizing this. Otherwise, solutions for "check list of 1 million dicts against list of 1 million dicts" and "check list of 1 million dicts against list of 2 dicts" are likely to be very different. Commented Mar 30, 2014 at 17:13
  • Also, I assume "if not 4 in dictA.keys()" should be "if not 5 in dictA.keys()" ? Commented Mar 30, 2014 at 17:14
  • Yeah, it should be 5 instead of 4. Already fixed. Sorry for the mess Commented Mar 30, 2014 at 17:18

6 Answers 6

3

A quick check also shows that 4 in dictA is faster than 4 in dictA.keys().

Sign up to request clarification or add additional context in comments.

3 Comments

I've learned that "if 5 in dictA.keys()" is more reliable than "if 5 in dictA". There were situations where the code was returning False even while a key was in a dict. I still don't know what caused it. .keys() method never fails. Any ideas why that would happen?
If you're using custom classes that don't implement hash() properly. But those classes shouldn't be used as dict keys anyway
In threaded code .keys() (in python2) returns a new list, it's more thread-safe.
2

First: You're not using most of listA; all you care about is the values from dictA[5]. So lets just extract the bits you care about, in a data structure that will allow fast access:

interesting_vals = frozenset([dictA[5] for dictA in listA if 5 in dictA])

Now we just need to check listB. Two approaches. The obvious one first:

result = [dictB[5] for dictB in listB
          if 5 in dictB and dictB[5] in interesting_vals]

or if you expect most dictBs to have a [5] element then this may be faster since it combines the access and the existance check (profile it with real data!):

NA = object()  # Will compare different to everything in interesting_vals
result = [dictB[5] for dictB in listB if dictB.get(5, NA) in interesting_vals]

This solution should be O(len(listA) + len(listB)), which is much better than your original O(len(listA) * len(listB)) if the lists are large.

Note that I am assuming that values of dictA[5] are hashable and have a hash that's consistent with equals - most built-in classes are, but some custom classes might not implement hash properly.

2 Comments

That's interesting. Thanks!
Inconsistent __hash__ functions are a great way to have fun.
2

You'd have to profile the code to see if there is an improvement, but it's generally a step in the right direction to use built-ins for the filtering, rather than scripting it yourself, since it will skip the interpreting of your filter boilerplate code.

for dictA in filter(lambda x : 4 in x, listA):
    for dictB in filter(lambda x : 5 in x, listB):
        if dictB[5]==dictA[5]:
            result.append(dictB[5])

Also, it makes it shorter and a bit more readable, which is with the Zen of Python. You shuold become acquainted with how a Python program looks, since you quite clearly appear to be trying to write C/Java like code in Python.

2 Comments

filer? filter!, :p
@CTZhu, what!? where!? :P
1

Oneliner:

%timeit filter(None, {item.get(5) for item in listA}.intersection(item.get(5) for item in listB))
100000 loops, best of 3: 8.59 us per loop

%%timeit
    ...: listA = [{1:'One', 2:'Two', 3:'Three'}, {4:'Four', 5:'Five', 6:'Six'}]
    ...: listB = [{4:'Four', 5:'Five', 6:'Six'}, {7:'Seven', 8:'Eight', 9:'Nine'}]
    ...: 
    ...: result=[]
    ...: for dictA in listA:
    ...:     if not 4 in dictA.keys(): continue
    ...:     for dictB in listB:
    ...:         if 5 in dictB.keys() and dictB[5]==dictA[5]:
    ...:             result.append(dictB[5])
    ...:             
100000 loops, best of 3: 11.9 us per loop

Comments

0
result = [
    y[5]
    for x in listA
    if 5 in x
    for y in listB
    if 5 in y and x[5] == y[5]
]

2 Comments

I hate it when people use list comprehensions to compete with the unreadability of perl.
Agree. It looks ugly. But it good to know it's possible to put an entire page on a single line... and again, some say the expressions are fast.
0

You should always test speed, but generator expressions (or list comprehension), should be faster than iterating:

result= []
A = (a for a in listA if 5 in a) 
for a in A:
    result.extend(b[5] for b in listB if (5 in b and a[5] == b[5]))

Or try to go functional:

import functools
fives = partial(filter, lambda x: 5 in x)
for a in fives(listA):
    result.extend(b[5] for fives(listB) if a[5] == b[5])

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.