Intersection of subelements of multiple list of lists in dictionary

Question

I have a number of lists of lists stored in a dictionary. I want to find the intersection of the sub-lists (i.e. intersection of dict[i][j]) for all keys of the dictionary. )

For example, if the dictionary stored sets of tuples instead, I could use the code:

set.intersection(*[index[key] for key in all_keys])

What is an efficient way to do this? One way I tried was to first convert each list of lists into a set of tuples and then taking the intersection of those, but this is rather clunky.

Example:

Suppose the dictionary of lists of lists is

dict = {} 
dict['A'] = [[1, 'charlie'], [2, 'frankie']] 
dict['B'] = [[1, 'charlie'], [2, 'chuck']]
dict['C'] = [[2, 'chuck'], [1, 'charlie']]

then I want to return

[1, 'charlie']

(maybe as a tuple, doesn't have to be list)

EDIT: I just found an okay way to do it, but it's not very 'pythonic'

def search(index, search_words): 
    rv = {tuple(t) for t in index[search_words[0]]}
    for word in search_words: 
        rv = rv.intersection({tuple(t) for t in index[word]})
    return rv

Where is a dictionary involved? Question title says dictionary — Adam Hughes
– Adam Hughes, Commented Jan 26, 2015 at 6:07
A, B, and C are stored in a dictionary. I'll edit to make clear — Craig
– Craig, Commented Jan 26, 2015 at 6:11

John1024 · Accepted Answer · 2015-01-26 06:28:22Z

2

Let's call your dictionary of lists of lists d:

>>> d = {'A': [[1, 'charlie'], [2, 'frankie']], 'B': [[1, 'charlie'], [2, 'chuck']], 'C': [[2, 'chuck'], [1, 'charlie']]}

I am calling it d because dict is a builtin and we would prefer not to overwrite it.

Now, find the intersection:

>>> set.intersection( *[ set(tuple(x) for x in d[k]) for k in d ] )
set([(1, 'charlie')])

How it works

set(tuple(x) for x in d[k])

For key k, this forms a set out of tuples of the elements in d[k]. Taking k='A' for example:
```
>>> k='A'; set(tuple(x) for x in d[k])
set([(2, 'frankie'), (1, 'charlie')])
```

[ set(tuple(x) for x in d[k]) for k in d ]

This makes a list of the sets from the step above. Thus:

>>> [ set(tuple(x) for x in d[k]) for k in d ]
[set([(2, 'frankie'), (1, 'charlie')]),
 set([(2, 'chuck'), (1, 'charlie')]),
 set([(2, 'chuck'), (1, 'charlie')])]

set.intersection( *[ set(tuple(x) for x in d[k]) for k in d ] )

This gathers the intersection of the three sets as above:
```
>>>set.intersection( *[ set(tuple(x) for x in d[k]) for k in d ] )
set([(1, 'charlie')])
```

edited Jan 26, 2015 at 6:28

answered Jan 26, 2015 at 6:22

John1024

115k15 gold badges152 silver badges183 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Craig Over a year ago

Brilliant, exactly what I needed. I tried doing something similar for awhile but my notation must have been off. Thanks!

Adam Hughes · Accepted Answer · 2015-01-26 06:06:07Z

1

[item for item in A if item in B+C]

answered Jan 26, 2015 at 6:06

Adam Hughes

16.5k14 gold badges100 silver badges140 bronze badges

4 Comments

Craig Over a year ago

There will be a varying number of list of lists, and they are stored in a dictionary.

Adam Hughes Over a year ago

Can each list only have the item once, or can there be redundancy in a single list?

Craig Over a year ago

That's a good question. I'm pretty sure that each list will only have the item once... but it's scraped data though, so I'm not sure I would put my trust into that assumption.

Adam Hughes Over a year ago

Cuz if that was the case, you could just add all three lists, and us collections.count to see which has three.

thefourtheye · Accepted Answer · 2015-01-26 06:17:30Z

Your usecase demands the use of reduce function. But, quoting the BDFL's take on reduce,

So now reduce(). This is actually the one I've always hated most, because, apart from a few examples involving + or *, almost every time I see a reduce() call with a non-trivial function argument, I need to grab pen and paper to diagram what's actually being fed into that function before I understand what the reduce() is supposed to do. So in my mind, the applicability of reduce() is pretty much limited to associative operators, and in all other cases it's better to write out the accumulation loop explicitly.

So, what you have got is already 'pythonic'.

I would write the program like this

>>> d = {'A': [[1, 'charlie'], [2, 'frankie']],
... 'B': [[1, 'charlie'], [2, 'chuck']],
... 'C': [[2, 'chuck'], [1, 'charlie']]}
>>> values = (value for value in d.values())
>>> result = {tuple(item) for item in next(values)}
>>> for item in values:
...    result &= frozenset(tuple(items) for items in item)
>>> result
set([(1, 'charlie')])

Aravinda · Accepted Answer · 2015-01-26 06:37:40Z

1

from collections import defaultdict    
d = {'A': [[1, 'charlie'], [2, 'frankie']], 'B': [[1, 'charlie'], [2, 'chuck']], 'C': [[2, 'chuck'], [1, 'charlie']]}

frequency = defaultdict(set)
for key, items in d.iteritems():
    for pair in items:
        frequency[tuple(pair)].add(key)
output = [
    pair for pair, occurrances in frequency.iteritems() if len(d) == len(occurrances)
]
print output

edited Jan 26, 2015 at 6:37

answered Jan 26, 2015 at 6:31

Aravinda

1261 silver badge5 bronze badges

Collectives™ on Stack Overflow

Intersection of subelements of multiple list of lists in dictionary

4 Answers 4

How it works

1 Comment

4 Comments

Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

How it works

1 Comment

4 Comments

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related