2

I have a number of lists of lists stored in a dictionary. I want to find the intersection of the sub-lists (i.e. intersection of dict[i][j]) for all keys of the dictionary. )

For example, if the dictionary stored sets of tuples instead, I could use the code:

set.intersection(*[index[key] for key in all_keys]) 

What is an efficient way to do this? One way I tried was to first convert each list of lists into a set of tuples and then taking the intersection of those, but this is rather clunky.

Example:

Suppose the dictionary of lists of lists is

dict = {} 
dict['A'] = [[1, 'charlie'], [2, 'frankie']] 
dict['B'] = [[1, 'charlie'], [2, 'chuck']]
dict['C'] = [[2, 'chuck'], [1, 'charlie']]

then I want to return

[1, 'charlie']

(maybe as a tuple, doesn't have to be list)

EDIT: I just found an okay way to do it, but it's not very 'pythonic'

def search(index, search_words): 
    rv = {tuple(t) for t in index[search_words[0]]}
    for word in search_words: 
        rv = rv.intersection({tuple(t) for t in index[word]})
    return rv 
4
  • Can you give a sample? Commented Jan 26, 2015 at 5:53
  • Yup! Just added one -- thanks Commented Jan 26, 2015 at 5:58
  • Where is a dictionary involved? Question title says dictionary Commented Jan 26, 2015 at 6:07
  • A, B, and C are stored in a dictionary. I'll edit to make clear Commented Jan 26, 2015 at 6:11

4 Answers 4

2

Let's call your dictionary of lists of lists d:

>>> d = {'A': [[1, 'charlie'], [2, 'frankie']], 'B': [[1, 'charlie'], [2, 'chuck']], 'C': [[2, 'chuck'], [1, 'charlie']]}

I am calling it d because dict is a builtin and we would prefer not to overwrite it.

Now, find the intersection:

>>> set.intersection( *[ set(tuple(x) for x in d[k]) for k in d ] )
set([(1, 'charlie')])

How it works

  • set(tuple(x) for x in d[k])

    For key k, this forms a set out of tuples of the elements in d[k]. Taking k='A' for example:

    >>> k='A'; set(tuple(x) for x in d[k])
    set([(2, 'frankie'), (1, 'charlie')])
    
  • [ set(tuple(x) for x in d[k]) for k in d ]

    This makes a list of the sets from the step above. Thus:

    >>> [ set(tuple(x) for x in d[k]) for k in d ]
    [set([(2, 'frankie'), (1, 'charlie')]),
     set([(2, 'chuck'), (1, 'charlie')]),
     set([(2, 'chuck'), (1, 'charlie')])]
    
  • set.intersection( *[ set(tuple(x) for x in d[k]) for k in d ] )

    This gathers the intersection of the three sets as above:

    >>>set.intersection( *[ set(tuple(x) for x in d[k]) for k in d ] )
    set([(1, 'charlie')])
    
Sign up to request clarification or add additional context in comments.

1 Comment

Brilliant, exactly what I needed. I tried doing something similar for awhile but my notation must have been off. Thanks!
1
[item for item in A if item in B+C]

4 Comments

There will be a varying number of list of lists, and they are stored in a dictionary.
Can each list only have the item once, or can there be redundancy in a single list?
That's a good question. I'm pretty sure that each list will only have the item once... but it's scraped data though, so I'm not sure I would put my trust into that assumption.
Cuz if that was the case, you could just add all three lists, and us collections.count to see which has three.
1

Your usecase demands the use of reduce function. But, quoting the BDFL's take on reduce,

So now reduce(). This is actually the one I've always hated most, because, apart from a few examples involving + or *, almost every time I see a reduce() call with a non-trivial function argument, I need to grab pen and paper to diagram what's actually being fed into that function before I understand what the reduce() is supposed to do. So in my mind, the applicability of reduce() is pretty much limited to associative operators, and in all other cases it's better to write out the accumulation loop explicitly.

So, what you have got is already 'pythonic'.

I would write the program like this

>>> d = {'A': [[1, 'charlie'], [2, 'frankie']],
... 'B': [[1, 'charlie'], [2, 'chuck']],
... 'C': [[2, 'chuck'], [1, 'charlie']]}
>>> values = (value for value in d.values())
>>> result = {tuple(item) for item in next(values)}
>>> for item in values:
...    result &= frozenset(tuple(items) for items in item)
>>> result
set([(1, 'charlie')])

Comments

1
from collections import defaultdict    
d = {'A': [[1, 'charlie'], [2, 'frankie']], 'B': [[1, 'charlie'], [2, 'chuck']], 'C': [[2, 'chuck'], [1, 'charlie']]}

frequency = defaultdict(set)
for key, items in d.iteritems():
    for pair in items:
        frequency[tuple(pair)].add(key)
output = [
    pair for pair, occurrances in frequency.iteritems() if len(d) == len(occurrances)
]
print output

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.