1

I have a dictionary that resembles the following:

dict1 = {'key1':['1','2','3'],'key2':['3','4','5'],'key3':['6','7','8']}

I would like to merge all keys whose values share at least one common element. For example, the resulting dictionary should look like:

dict1 = {'key1':['1','2','3','4','5'],'key3':['6','7','8']}

Please note how key2 has been eliminated. Whether it is key1 or key2 that is eliminated does not matter. I have only gotten as far as being able to identify repeats, but not how to merge them in an iterative fashion. Thanks

  • What if key2 and key3 shared a value, e.g. '4'? Commented May 29, 2015 at 21:06
  • why key1 and not key2? dicts have no order so what key comes first is not guaranteed Commented May 29, 2015 at 21:07
  • @Padraic, because they have a common item in their values ('3'), all items in key3 are unique to key3 so it remains separate Commented May 29, 2015 at 21:08
  • yes but why does key1 remain and you remove key2? Commented May 29, 2015 at 21:09
  • either key1 or key2 should be eliminated. I don't care which Commented May 29, 2015 at 21:10

3 Answers

2

Would that work for you? Please note that since the order of elements in the dictionary is arbitrary, you cannot guarantee which keys will end up being inserted into the output dictionary.

dict_out = {}
processed = set()
for k1, v1 in dict_in.items():
    if k1 not in processed:
        processed.add(k1)
        vo = v1
        for k2, v2 in dict_in.items():
            if k2 not in processed and set(v1) & set(v2):
                vo = sorted(list(set(vo + v2)))
                processed.add(k2)
        dict_out[k1] = vo

For this input:

dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'], 'key3': ['6', '7', '8']}

it gives:

{'key1': ['1', '2', '3', '4', '5'], 'key3': ['6', '7', '8']}

And for:

dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'],
           'key3': ['6', '7', '8'], 'key4': ['7', '9']}

gives:

{'key1': ['1', '2', '3', '4', '5'], 'key3': ['6', '7', '8', '9']}

And finally, for:

dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'],
           'key3': ['6', '7', '8'], 'key4': ['5', '6', '7']}

it gives:

{'key1': ['1', '2', '3', '4', '5'], 'key3': ['5', '6', '7', '8']}

EDIT

OP requested that even outcomes of merges should be merged with each other. To achieve that, we can wrap the code above in a loop like this:

d = dict_in
processed = set([None])
while processed:
    dict_out = {}
    processed = set()
    for k1, v1 in d.items():
        if k1 not in processed:
            vo = v1
            for k2, v2 in d.items():
                if k1 != k2 and set(vo) & set(v2):
                    vo = sorted(list(set(vo + v2)))
                    processed.add(k2)
            dict_out[k1] = vo
    d = dict_out

Then, for:

dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'],
           'key3': ['6', '7', '8'], 'key4': ['5', '6', '7']}

we get:

{'key4': ['1', '2', '3', '4', '5', '6', '7', '8']}

and for:

dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'],
           'key3': ['4', '6', '7'], 'key4': ['8', '9']}

we get:

{'key1': ['1', '2', '3', '4', '5', '6', '7'], 'key4': ['8', '9']}
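
As an alternative to repeating passes until nothing changes, the same transitive merge can be sketched with a union-find (disjoint-set) structure. This is a different technique from the loop above, not a rewrite of it, and the names merge_overlapping, find and owner are made up for illustration. It assumes the list elements are hashable and that any key of a merged group may survive.

```python
def merge_overlapping(dict_in):
    """Merge keys whose value lists are connected by shared elements."""
    parent = {k: k for k in dict_in}

    def find(k):
        # Follow parent links to the representative key, compressing the path.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    owner = {}  # element -> first key seen that contains it
    for key, values in dict_in.items():
        for value in values:
            if value in owner:
                # Shared element: join this key's group with the owner's group.
                root_a, root_b = find(owner[value]), find(key)
                if root_a != root_b:
                    parent[root_b] = root_a
            else:
                owner[value] = key

    # Collect every key's values under its group's representative key.
    merged = {}
    for key, values in dict_in.items():
        merged.setdefault(find(key), set()).update(values)
    return {k: sorted(v) for k, v in merged.items()}


dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'],
           'key3': ['6', '7', '8'], 'key4': ['5', '6', '7']}
print(merge_overlapping(dict_in))
```

Each element is looked at once, so chains such as key1 -> key2 -> key4 -> key3 collapse into a single group without re-scanning the dictionary.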

4 Comments

try dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'], 'key3': ['6', '7', '8'],'key4': ["9","10"]}
Gives : {'key1': ['1', '2', '3', '4', '5'], 'key3': ['6', '7', '8'], 'key4': ['9', '10']}. Isn't that correct?
This is great for merging once, but a previously merged key/value pair cannot merge with others. For example: dict_in = {'key1':['1','2','3'],'key2':['3','4','5'],'key3':['4','6','7'],'key4':['8','9']} This gives: {'key3': set(['8', '3', '5', '4', '7']), 'key1': ['1', '2', '3'], 'key4': ['9', '10']} which is not right since key1 and key3 both have 3 in them. So close!
That behavior was intentional :) I thought that this is what you actually need. Small edit will fix it.
1

If you want to change the original dict you will need to copy:

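vals = {k: set(val) for k, val in dict1.items()}

# Iterate over snapshots, since both dicts are modified inside the loops
for key in list(dict1):
    if key not in vals:
        continue  # key was already merged into an earlier one
    for k, v in list(vals.items()):
        if k == key:
            continue
        if v & vals[key]:
            vals[key] |= v  # accumulate, so earlier merges are not overwritten
            dict1[key] = sorted(vals[key])
            del vals[k]
            del dict1[k]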

If you want to union all:

vals = {k: set(val) for k, val in dict1.items()}
unioned = set()
srt = sorted(dict1)
for key in srt:
    if key in unioned:
        continue
    merged = True
    while merged:  # repeat until no remaining key overlaps the growing set
        merged = False
        for k in srt:
            if k == key or k in unioned:
                continue
            if vals[k] & vals[key]:
                vals[key] |= vals[k]
                unioned.add(k)
                merged = True
    dict1[key] = sorted(vals[key])

# Drop the keys that were absorbed into another key
for k in unioned:
    del dict1[k]
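
A quick way to check a result, given the concern raised in the comments about merged values that still overlap, is to list the pairs of keys whose values still share an element. This is only a sketch: the helper name overlapping_pairs is hypothetical and not part of either answer.

```python
from itertools import combinations


def overlapping_pairs(d):
    """Return the pairs of keys whose value lists share at least one element."""
    return [(a, b) for a, b in combinations(d, 2) if set(d[a]) & set(d[b])]


# A fully merged result has no overlapping pairs left.
merged = {'key1': ['1', '2', '3', '4', '5', '6', '7'], 'key4': ['8', '9']}
print(overlapping_pairs(merged))   # []

# A partially merged result still reports the pair that shares '4'.
partial = {'key1': ['1', '2', '3', '4', '5'], 'key3': ['4', '6', '7']}
print(overlapping_pairs(partial))  # [('key1', 'key3')]
```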

2 Comments

I like this answer, works perfectly. Andrzej was first though... Thanks for your efforts and assistance.
@Vincem no worries, I was not totally sure of what should happen when you have updated values, I added another way that handles that
0

I have a more compact method.

I think it is more readable and easier to understand:

dict1 = {'key1':['1','2','3'],'key2':['3','4','5'],'key3':['6','7','8']}

# Fix an ordering of the keys up front, since dict1 is modified below
keys = sorted(dict1)

# Compare every pair of keys once
for i in range(len(keys)):
    for j in range(i + 1, len(keys)):
        i_key, j_key = keys[i], keys[j]
        # Skip keys that were already merged into an earlier key
        if i_key not in dict1 or j_key not in dict1:
            continue
        i_value, j_value = set(dict1[i_key]), set(dict1[j_key])
        # If the values share an element, merge j_key into i_key
        if i_value & j_value:
            dict1[i_key] = sorted(i_value | j_value)
            del dict1[j_key]

print(dict1)
# {'key1': ['1', '2', '3', '4', '5'], 'key3': ['6', '7', '8']}
