I have a dictionary in Python like:

dict = {'dog':['milo','otis','laurel','hardy'],
        'cat':['bob','joe'],
        'milo':['otis','laurel','hardy','dog'],
        'hardy':['dog'],'bob':['joe','cat']}

...and I want to identify whether a key also appears elsewhere in the dictionary, that is, in some other key's list of values. Other questions I found ask whether an item simply exists in the dictionary, which is not my question. The same goes for the items in each list of values: I want to identify items that do not appear under any OTHER key in the dictionary.

In the above example, the idea is that dogs and cats are not equal: the keys/values that come from dogs have nothing in common with those that come from cats. Ideally, a second dictionary would be created that collects everything associated with each unique cluster:

unique_dict = {'cluster1':['dog','milo','otis','laurel','hardy'],
               'cluster2':['cat','bob','joe'] }

This is a follow-up question to "In Python, count unique key/value pairs in a dictionary".

  • What would you want to do if you had another key, say 'puppy', and it had the value ['dog', 'cat', 'milo', 'hardy', 'bob', 'joe']? Commented Jan 25, 2014 at 1:00
  • Ah, good question. In my data, such an example would not exist. Puppy could, but it would not be associated with cats and their filthy associates. Commented Jan 25, 2014 at 1:04

2 Answers

It appears that the relationship is symmetric, but your data is not (e.g. there is no key 'otis'). The first part involves making it symmetric, so it won't matter where we start.

(If your data actually is symmetric, then skip that part.)
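One way to find out whether that step can be skipped is to test the data for symmetry first. This is a minimal sketch, not part of the original answer; the helper name `is_symmetric` is made up for illustration, and it assumes "symmetric" means that whenever `b` is listed under `a`, `a` is also listed under `b`.

```python
def is_symmetric(data):
    """Return True if every value is also a key that lists the original key."""
    return all(
        key in data.get(value, ())  # a missing key counts as "not symmetric"
        for key, values in data.items()
        for value in values
    )

data = {'dog': ['milo', 'otis', 'laurel', 'hardy'],
        'cat': ['bob', 'joe'],
        'milo': ['otis', 'laurel', 'hardy', 'dog'],
        'hardy': ['dog'],
        'bob': ['joe', 'cat']}

print(is_symmetric(data))  # False, e.g. 'otis' is a value but never a key
```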

Python 2.7

from collections import defaultdict

data = {'dog':['milo','otis','laurel','hardy'],'cat':['bob','joe'],'milo':['otis','laurel','hardy','dog'],'hardy':['dog'],'bob':['joe','cat']}

# create symmetric version of data
d = defaultdict(list)
for key, values in data.iteritems():
    for value in values:
        d[key].append(value)
        d[value].append(key)

visited = set()
def connected(key):
    # collect every node reachable from key that has not been visited yet
    result = []
    def visit(node):
        if node not in visited:
            visited.add(node)
            result.append(node)
            for neighbour in d[node]:
                visit(neighbour)
    visit(key)
    return result

print [connected(key) for key in d if key not in visited]

Python 3.3

from collections import defaultdict

data = {'dog':['milo','otis','laurel','hardy'],'cat':['bob','joe'],'milo':['otis','laurel','hardy','dog'],'hardy':['dog'],'bob':['joe','cat']}

# create symmetric version of data
d = defaultdict(list)
for key, values in data.items():
    for value in values:
        d[key].append(value)
        d[value].append(key)

visited = set()
def connected(key):
    # yield key, then recursively yield every unvisited node reachable from it
    visited.add(key)
    yield key
    for value in d[key]:
        if value not in visited:
            yield from connected(value)

print([list(connected(key)) for key in d if key not in visited])

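Result (the order of the clusters and of their members depends on dictionary ordering and may differ on your Python version)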

[['otis', 'milo', 'laurel', 'dog', 'hardy'], ['cat', 'bob', 'joe']]
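To get the dictionary shape shown in the question, the list of clusters can be relabeled. This is a minimal sketch, assuming the `cluster1`, `cluster2`, ... labels from the question; the helper name `label_clusters` is made up for illustration, and members are sorted only to make the output deterministic.

```python
def label_clusters(clusters):
    """Map 'cluster1', 'cluster2', ... to the members of each cluster."""
    return {
        "cluster{}".format(i): sorted(members)
        for i, members in enumerate(clusters, start=1)
    }

clusters = [['otis', 'milo', 'laurel', 'dog', 'hardy'], ['cat', 'bob', 'joe']]
unique_dict = label_clusters(clusters)
print(unique_dict)
```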

Performance

O(n), where n is the total number of keys and values in data (in your case, 18: 5 keys plus 13 values).
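One practical caveat: a recursive traversal can hit Python's recursion limit when a single cluster is very large. The sketch below does the same linear-time traversal with an explicit queue instead. It is an alternative under the same assumptions, not the code above; the function name `clusters_iterative` is made up for illustration.

```python
from collections import defaultdict, deque

def clusters_iterative(data):
    # Build the symmetric adjacency map (sets avoid duplicate neighbours).
    graph = defaultdict(set)
    for key, values in data.items():
        graph[key]  # make sure keys with empty lists still get an entry
        for value in values:
            graph[key].add(value)
            graph[value].add(key)

    visited = set()
    result = []
    for start in list(graph):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        cluster = []
        while queue:
            node = queue.popleft()
            cluster.append(node)
            for neighbour in graph[node]:
                if neighbour not in visited:
                    visited.add(neighbour)
                    queue.append(neighbour)
        result.append(cluster)
    return result

data = {'dog': ['milo', 'otis', 'laurel', 'hardy'],
        'cat': ['bob', 'joe'],
        'milo': ['otis', 'laurel', 'hardy', 'dog'],
        'hardy': ['dog'],
        'bob': ['joe', 'cat']}

print(clusters_iterative(data))
```

Each name is enqueued at most once and each neighbour link is examined a constant number of times, so the running time stays linear in the size of the data.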

1

I'm taking "in some other list of values" literally, to mean that a key existing in its own set of values is OK. If not, that would make things slightly simpler, but you should be able to adjust the code yourself, so I won't write it both ways.

If you insist on using this data structure, you have to do it by brute force:

def does_key_exist_in_other_value(d, key):
    # scan every other key's list of values for key
    for k, v in d.items():
        if k != key and key in v:
            return True
    return False

You could of course condense that into a one-liner with a genexpr and any:

    return any(key in v for k, v in d.items() if k != key)

But a smarter thing to do would be to use a better data structure. At the very least, use sets instead of lists as your values. That wouldn't simplify your code, but it would make it a lot faster: if you have K keys and V total elements across your values, each check would run in O(K) instead of O(K + V).
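Converting the list values to sets is a small change. This is a minimal sketch using the question's data; the name `d_sets` is introduced here for illustration, and the check is the same `any` expression as above.

```python
d = {'dog': ['milo', 'otis', 'laurel', 'hardy'],
     'cat': ['bob', 'joe'],
     'milo': ['otis', 'laurel', 'hardy', 'dog'],
     'hardy': ['dog'],
     'bob': ['joe', 'cat']}

# Same keys, but each value is now a set with O(1) average membership tests.
d_sets = {key: set(values) for key, values in d.items()}

def does_key_exist_in_other_value(d, key):
    # One cheap set lookup per key instead of one list scan per key.
    return any(key in v for k, v in d.items() if k != key)

print(does_key_exist_in_other_value(d_sets, 'dog'))    # True (listed under 'milo' and 'hardy')
print(does_key_exist_in_other_value(d_sets, 'zebra'))  # False (not listed anywhere)
```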

But really, if you want to look things up, build a dict to look things up in:

from collections import defaultdict

# map each value to the set of keys whose lists contain it
inv_d = defaultdict(set)
for key, value in d.items():
    for v in value:
        inv_d[v].add(key)

And now, your code is just:

def does_key_exist_in_other_value(inv_d, key):
    # true if key is listed under at least one key other than itself
    return bool(inv_d.get(key, set()) - {key})
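To see what the inverted dictionary holds, here is a short end-to-end sketch on the question's data. The helper name `appears_under_other_key` is made up for illustration; it assumes the goal is to report whether any key other than `key` itself lists `key` among its values, and it uses `.get` so that looking up an unknown name does not add an empty entry to the `defaultdict`.

```python
from collections import defaultdict

d = {'dog': ['milo', 'otis', 'laurel', 'hardy'],
     'cat': ['bob', 'joe'],
     'milo': ['otis', 'laurel', 'hardy', 'dog'],
     'hardy': ['dog'],
     'bob': ['joe', 'cat']}

# Invert the mapping: value -> set of keys that list it.
inv_d = defaultdict(set)
for key, values in d.items():
    for v in values:
        inv_d[v].add(key)

def appears_under_other_key(inv_d, key):
    # Any "owner" other than key itself means key shows up elsewhere.
    return any(owner != key for owner in inv_d.get(key, ()))

print(sorted(inv_d['dog']))  # ['hardy', 'milo']
print(sorted(inv_d['joe']))  # ['bob', 'cat']
print(appears_under_other_key(inv_d, 'dog'))    # True
print(appears_under_other_key(inv_d, 'zebra'))  # False
```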
