How to check if a key/value is repeated elsewhere in a dictionary using Python

Question

I have a dictionary in Python like:

data = {'dog':['milo','otis','laurel','hardy'],
        'cat':['bob','joe'],
        'milo':['otis','laurel','hardy','dog'],
        'hardy':['dog'],'bob':['joe','cat']}

...and I want to identify whether a key exists elsewhere in the dictionary, i.e. in some other key's list of values. Other questions I found ask whether an item simply exists in the dictionary, but that is not my question. The same goes for the items in each list of values: I want to identify items that do not appear in any OTHER key or its associated values.

In the above example, the idea is that dogs and cats are not equal: the keys/values connected to dogs have nothing in common with those connected to cats. Ideally, a second dictionary would be created that collects all the names belonging to each separate cluster:

unique_dict = {'cluster1':['dog','milo','otis','laurel','hardy'],
               'cluster2':['cat','bob','joe']}

This is a follow-up to the question "In Python, count unique key/value pairs in a dictionary".

What would you want to do if you had another key, say 'puppy', and it had the value ['dog', 'cat', 'milo', 'hardy', 'bob', 'joe']? – Cody Piersall, Commented Jan 25, 2014 at 1:00

Ah, good question. In my data, such an example would not exist. Puppy could, but it would not be associated with cats and their filthy associates. – Vince, Commented Jan 25, 2014 at 1:04

Paul Draper · Accepted Answer · 2014-01-25 01:29:20Z

It appears that the relationship is symmetric, but your data is not (e.g. there is no key 'otis'). The first step makes the data symmetric, so it won't matter which name we start from.

(If your data actually is symmetric, then skip that part.)

Python 2.7

from collections import defaultdict

data = {'dog':['milo','otis','laurel','hardy'],'cat':['bob','joe'],'milo':['otis','laurel','hardy','dog'],'hardy':['dog'],'bob':['joe','cat']}

# create symmetric version of data
d = defaultdict(list)
for key, values in data.iteritems():
    for value in values:
        d[key].append(value)
        d[value].append(key)

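visited = set()
def connected(key):
    """Return every not-yet-visited name in key's group."""
    result = []
    def visit(key):
        if key not in visited:
            visited.add(key)
            result.append(key)
            # Plain loop instead of map() used only for its side effects
            for value in d[key]:
                visit(value)
    visit(key)
    return result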

print [connected(key) for key in d if key not in visited]

Python 3.3

from collections import defaultdict

data = {'dog':['milo','otis','laurel','hardy'],'cat':['bob','joe'],'milo':['otis','laurel','hardy','dog'],'hardy':['dog'],'bob':['joe','cat']}

# create symmetric version of data
d = defaultdict(list)
for key, values in data.items():
    for value in values:
        d[key].append(value)
        d[value].append(key)

visited = set()
def connected(key):
    visited.add(key)
    yield key
    for value in d[key]:
        if value not in visited:  # test the neighbour, not key (key is already visited)
            yield from connected(value)

print([list(connected(key)) for key in d if key not in visited])
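A possible variant, not part of the original answer: the recursive generator can hit Python's default recursion limit (1000 frames, see sys.getrecursionlimit()) when a group forms a long chain. The sketch below does the same depth-first traversal with an explicit stack; the function name `clusters` is hypothetical, and it stores neighbours in sets so repeated links collapse.

```python
from collections import defaultdict

data = {'dog': ['milo', 'otis', 'laurel', 'hardy'],
        'cat': ['bob', 'joe'],
        'milo': ['otis', 'laurel', 'hardy', 'dog'],
        'hardy': ['dog'],
        'bob': ['joe', 'cat']}

def clusters(data):
    """Return the connected groups of names, treating every key/value link as undirected."""
    # Symmetric adjacency map; sets drop duplicate links such as dog-milo / milo-dog.
    graph = defaultdict(set)
    for key, values in data.items():
        for value in values:
            graph[key].add(value)
            graph[value].add(key)

    visited = set()
    groups = []
    for start in graph:
        if start in visited:
            continue
        # Depth-first search with an explicit stack instead of recursion.
        stack = [start]
        visited.add(start)
        group = []
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in graph[node]:
                if neighbour not in visited:
                    visited.add(neighbour)
                    stack.append(neighbour)
        groups.append(group)
    return groups
```

Called as clusters(data), it should produce the same two groups as the recursive version, although the order of names within a group can differ because it iterates over sets.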

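Result (the order of the groups, and of the names within them, depends on dictionary iteration order, so it may differ between runs and Python versions):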

[['otis', 'milo', 'laurel', 'dog', 'hardy'], ['cat', 'bob', 'joe']]
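To get the cluster dictionary the question asks for, the list of groups can be numbered from 1. A minimal sketch, assuming `groups` holds the result shown above (the variable name `groups` is illustrative):

```python
# Grouping copied from the Result above; any list of groups works here.
groups = [['otis', 'milo', 'laurel', 'dog', 'hardy'], ['cat', 'bob', 'joe']]

# Number the clusters from 1, matching the keys in the question's unique_dict.
unique_dict = {'cluster{}'.format(i): group
               for i, group in enumerate(groups, start=1)}
print(unique_dict)
```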

Performance

O(n), where n is the total number of keys and values in data (in your case 18: 5 keys plus 13 values).
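A quick count of n for the sample data, adding the number of keys to the total length of all value lists:

```python
data = {'dog': ['milo', 'otis', 'laurel', 'hardy'],
        'cat': ['bob', 'joe'],
        'milo': ['otis', 'laurel', 'hardy', 'dog'],
        'hardy': ['dog'],
        'bob': ['joe', 'cat']}

n_keys = len(data)                                      # 5
n_values = sum(len(values) for values in data.values()) # 4 + 2 + 4 + 1 + 2 = 13
n = n_keys + n_values
print(n_keys, n_values, n)                              # 5 13 18
```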

abarnert · Accepted Answer · 2014-01-25 01:03:50Z

I'm taking "in some other list of values" literally, to mean that a key existing in its own set of values is OK. If not, that would make things slightly simpler, but you should be able to adjust the code yourself, so I won't write it both ways.

If you insist on using this data structure, you have to do it by brute force:

def does_key_exist_in_other_value(d, key):
    for k, v in d.items():
        # Skip the key's own list; only other keys' lists count.
        if k != key and key in v:
            return True
    return False

You could of course condense that into a one-liner with a generator expression and any():

def does_key_exist_in_other_value(d, key):
    return any(key in v for k, v in d.items() if k != key)
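A small usage check of the brute-force version on the sample data. 'puppy' is borrowed from the comment thread as a name that does not occur anywhere in the data.

```python
data = {'dog': ['milo', 'otis', 'laurel', 'hardy'],
        'cat': ['bob', 'joe'],
        'milo': ['otis', 'laurel', 'hardy', 'dog'],
        'hardy': ['dog'],
        'bob': ['joe', 'cat']}

def does_key_exist_in_other_value(d, key):
    # True if key appears in the value list of any key other than itself.
    return any(key in v for k, v in d.items() if k != key)

for key in data:
    # Every key in the sample is listed under some other key, so each prints True.
    print(key, does_key_exist_in_other_value(data, key))

print(does_key_exist_in_other_value(data, 'puppy'))  # False: not listed anywhere
```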

But a smarter thing to do would be to use a better data structure. At the very least, use sets instead of lists as your values. That wouldn't simplify your code, but it would make it a lot faster: if you have K keys and each value holds about V elements, each membership test drops from O(V) to O(1) on average, so the check runs in O(K) instead of O(KV).
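A minimal sketch of the set-based values: convert each list once (O(V) per list, paid a single time) and reuse the same check unchanged. The name `data_sets` is illustrative.

```python
data = {'dog': ['milo', 'otis', 'laurel', 'hardy'],
        'cat': ['bob', 'joe'],
        'milo': ['otis', 'laurel', 'hardy', 'dog'],
        'hardy': ['dog'],
        'bob': ['joe', 'cat']}

# Convert each value list to a set once, so each `in` test is O(1) on average.
data_sets = {key: set(values) for key, values in data.items()}

def does_key_exist_in_other_value(d, key):
    # Same check as before; it works unchanged on sets.
    return any(key in v for k, v in d.items() if k != key)

print(does_key_exist_in_other_value(data_sets, 'dog'))     # True
print(does_key_exist_in_other_value(data_sets, 'laurel'))  # True
```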

But really, if you want to look things up, build a dict to look things up in:

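from collections import defaultdict

# Inverted index: map each name to the set of keys whose value lists contain it.
inv_d = defaultdict(set)
for key, value in d.items():
    for v in value:
        inv_d[v].add(key)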

And now, your code is just:

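def does_key_exist_in_other_value(inv_d, key):
    # Keys whose lists contain `key`, ignoring `key`'s own list. Using .get()
    # avoids inserting an empty set for names that appear in no list at all.
    return bool(inv_d.get(key, set()) - {key})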
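A self-contained check of the inverted-index approach on the sample data. Note that with a defaultdict, writing inv_d[key] != {key} for a name that appears in no list (here the made-up 'puppy') would both insert an empty set and return True; the sketch uses .get() and set difference instead, so a key listed only in its own values still counts as "not elsewhere".

```python
from collections import defaultdict

data = {'dog': ['milo', 'otis', 'laurel', 'hardy'],
        'cat': ['bob', 'joe'],
        'milo': ['otis', 'laurel', 'hardy', 'dog'],
        'hardy': ['dog'],
        'bob': ['joe', 'cat']}

# Map each name to the set of keys whose lists contain it.
inv_d = defaultdict(set)
for key, values in data.items():
    for v in values:
        inv_d[v].add(key)

def does_key_exist_in_other_value(inv_d, key):
    # .get() leaves inv_d untouched for unknown names.
    return bool(inv_d.get(key, set()) - {key})

print(sorted(inv_d['dog']))                            # ['hardy', 'milo']
print(does_key_exist_in_other_value(inv_d, 'dog'))     # True
print(does_key_exist_in_other_value(inv_d, 'puppy'))   # False
```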
