How to combine values composed of lists with common items in a dictionary using Python?

Question

I have a dictionary that resembles the following:

dict1 = {'key1':['1','2','3'],'key2':['3','4','5'],'key3':['6','7','8']}

I would like to merge all keys whose values have at least one common element. For example, the resulting dictionary should look like:

dict1 = {'key1':['1','2','3','4','5'],'key3':['6','7','8']}

Please note how key2 has been eliminated. Whether it is key1 or key2 that is eliminated does not matter. So far I have only been able to identify the repeats, but not to merge them in an iterative fashion. Thanks

Why key1 and not key2? Dicts have no order, so which key comes first is not guaranteed.
– Padraic Cunningham, Commented May 29, 2015 at 21:07
@Padraic, because they have a common item in their values ('3'). All items in key3 are unique to key3, so it remains separate.
– Vince, Commented May 29, 2015 at 21:08
Either key1 or key2 should be eliminated. I don't care which.
– Vince, Commented May 29, 2015 at 21:10

Andrzej Pronobis · Accepted Answer · 2015-05-29 23:22:18Z

2

Would that work for you? Please note that in Python versions before 3.7 the order of elements in a dictionary is arbitrary, so you cannot guarantee which keys will end up in the output dictionary.

dict_out = {}
processed = set()  # keys already placed in or merged into dict_out
for k1, v1 in dict_in.items():
    if k1 not in processed:
        processed.add(k1)
        vo = v1
        for k2, v2 in dict_in.items():
            # Merge k2 only if it overlaps k1's original values (single-pass merge)
            if k2 not in processed and set(v1) & set(v2):
                vo = sorted(set(vo + v2))
                processed.add(k2)
        dict_out[k1] = vo

For this input:

dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'], 'key3': ['6', '7', '8']}

gives:

{'key1': ['1', '2', '3', '4', '5'], 'key3': ['6', '7', '8']}

And for:

dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'],
           'key3': ['6', '7', '8'], 'key4': ['7', '9']}

gives:

{'key1': ['1', '2', '3', '4', '5'], 'key3': ['6', '7', '8', '9']}

And finally, for:

dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'],
           'key3': ['6', '7', '8'], 'key4': ['5', '6', '7']}

it gives:

{'key1': ['1', '2', '3', '4', '5'], 'key3': ['5', '6', '7', '8']}

EDIT

OP requested that even outcomes of merges should be merged with each other. To achieve that, we can wrap the code above in a loop like this:

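d = dict_in
processed = {None}  # non-empty sentinel so the loop runs at least once
while processed:  # repeat until a full pass merges nothing
    dict_out = {}
    processed = set()
    for k1, v1 in d.items():
        if k1 not in processed:
            vo = v1
            for k2, v2 in d.items():
                # Compare keys by value, not identity, and test against the merged list so far
                if k1 != k2 and set(vo) & set(v2):
                    vo = sorted(set(vo + v2))
                    processed.add(k2)
            dict_out[k1] = vo
    d = dict_out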

Then, for:

dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'],
           'key3': ['6', '7', '8'], 'key4': ['5', '6', '7']}

we get:

{'key4': ['1', '2', '3', '4', '5', '6', '7', '8']}

and for:

dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'],
           'key3': ['4', '6', '7'], 'key4': ['8', '9']}

we get:

{'key1': ['1', '2', '3', '4', '5', '6', '7'], 'key4': ['8', '9']}
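
The repeat-until-stable idea above can also be packaged as a function that leaves the input untouched. This is only a sketch, not the code from the answer: the name `merge_overlapping` is hypothetical, and it assumes Python 3.7+ so that the first key of each group in insertion order is the one that is kept.

```python
def merge_overlapping(d):
    """Merge entries whose value lists share an element, until none overlap."""
    # Work on sets; the input dictionary is not modified.
    groups = {k: set(v) for k, v in d.items()}
    changed = True
    while changed:
        changed = False
        merged = {}
        for key, values in groups.items():
            # Look for an already-kept group that overlaps this one.
            for kept_values in merged.values():
                if kept_values & values:
                    kept_values |= values  # absorb into the kept group
                    changed = True
                    break
            else:
                # No overlap found: this key starts a new group.
                merged[key] = set(values)
        groups = merged
    return {k: sorted(v) for k, v in groups.items()}


dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'],
           'key3': ['4', '6', '7'], 'key4': ['8', '9']}
print(merge_overlapping(dict_in))
```

Each pass merges a group into the first kept group it overlaps, and the outer loop repeats until a pass changes nothing, so groups that only become connected after an earlier merge are picked up on a later pass.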

edited May 29, 2015 at 23:22

answered May 29, 2015 at 21:22

Andrzej Pronobis


4 Comments

Padraic Cunningham Over a year ago

try dict_in = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'], 'key3': ['6', '7', '8'],'key4': ["9","10"]}

Andrzej Pronobis Over a year ago

Gives : {'key1': ['1', '2', '3', '4', '5'], 'key3': ['6', '7', '8'], 'key4': ['9', '10']}. Isn't that correct?

Vince Over a year ago

This is great for merging once, but a previously merged key/value pair cannot merge with others. For example: dict_in = {'key1':['1','2','3'],'key2':['3','4','5'],'key3':['4','6','7'],'key4':['8','9']} This gives: {'key3': set(['8', '3', '5', '4', '7']), 'key1': ['1', '2', '3'], 'key4': ['9', '10']} which is not right since key1 and key3 both have 3 in them. So close!

Andrzej Pronobis Over a year ago

That behavior was intentional :) I thought that this is what you actually need. Small edit will fix it.

Padraic Cunningham · Accepted Answer · 2015-05-29 22:07:52Z

1

If you want to change the original dict, you will need to iterate over a copy:

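# Set versions of the values, used for the overlap tests
vals = {k: set(val) for k, val in dict1.items()}

# Iterate over snapshots because dict1 and vals are modified inside the loops
for key in list(dict1):
    if key not in vals:
        # key was already merged into an earlier key
        continue
    for k, v in list(vals.items()):
        if k == key:
            continue
        if v & vals[key]:
            # Accumulate into vals[key] so earlier merges are not overwritten
            vals[key] |= v
            dict1[key] = list(vals[key])
            del vals[k]
            del dict1[k]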

If you want to union all:

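vals = {k: set(val) for k, val in dict1.items()}
unioned = set()  # keys that have been merged into another key
srt = sorted(dict1.keys())
srt2 = srt[:]
for key in srt:
    # Iterate over a copy: removing from srt2 while looping over it would skip entries
    for k in srt2[:]:
        if k == key:
            continue
        if vals[k].intersection(dict1[key]) and key not in unioned:
            unioned.add(k)
            dict1[key] = list(vals[k].union(dict1[key]))
            srt2.remove(k)

# Drop the keys whose values were merged elsewhere
for k in unioned:
    del dict1[k]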
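
Whichever variant is used, one quick way to check the result is to verify that no element appears under two different keys. The following is a minimal sketch of such a check; `values_are_disjoint` is a hypothetical helper name and is not part of the answer above.

```python
def values_are_disjoint(d):
    """Return True if no element appears in the values of two different keys."""
    seen = set()
    for values in d.values():
        unique = set(values)
        if seen & unique:
            # Some element was already seen under an earlier key.
            return False
        seen |= unique
    return True


print(values_are_disjoint({'key1': ['1', '2', '3', '4', '5'], 'key3': ['6', '7', '8']}))
print(values_are_disjoint({'key1': ['1', '2', '3'], 'key2': ['3', '4', '5']}))
```

The first call should report a fully merged dictionary, while the second still has '3' under both keys and so is not finished.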

edited May 29, 2015 at 22:07

answered May 29, 2015 at 21:34

Padraic Cunningham


2 Comments

Vince Over a year ago

I like this answer, works perfectly. Andrzej was first though... Thanks for your efforts and assistance.

Padraic Cunningham Over a year ago

@Vince, no worries. I was not totally sure what should happen when you have updated values, so I added another way that handles that.

Burger King · Accepted Answer · 2015-05-30 06:42:37Z

I have a more compact method.

I think it is more readable and easier to understand. See below:

dict1 = {'key1':['1','2','3'],'key2':['3','4','5'],'key3':['6','7','8']}

# Sorted snapshot of the keys, so deleting from dict1 is safe during the loops
keys = sorted(dict1)

# nested loop: compare every pair of keys once
for i in range(len(keys)):
    for j in range(i + 1, len(keys)):
        i_key, j_key = keys[i], keys[j]
        # skip keys that were already merged into another key
        if i_key not in dict1 or j_key not in dict1:
            continue
        i_value, j_value = set(dict1[i_key]), set(dict1[j_key])
        # if the values have a common element, store their union under i_key
        if i_value & j_value:
            dict1[i_key] = sorted(i_value | j_value)
            del dict1[j_key]

print(dict1)
# {'key1': ['1', '2', '3', '4', '5'], 'key3': ['6', '7', '8']}
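
An alternative that none of the answers here use is to treat the problem as finding connected components: two keys belong to the same group if their values are linked by a chain of shared elements. The sketch below uses a small union-find structure for that. The function name `merge_by_components` and the helper `find` are hypothetical, and which key survives for each group is simply whichever one ends up as the representative, in line with the question's remark that it does not matter.

```python
def merge_by_components(d):
    """Group keys whose values are linked by shared elements (union-find)."""
    parent = {k: k for k in d}

    def find(k):
        # Follow parent links to the representative key, compressing the path.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    owner = {}  # element -> first key seen that contains it
    for key, values in d.items():
        for item in values:
            if item in owner:
                a, b = find(owner[item]), find(key)
                if a != b:
                    parent[b] = a  # link the two groups
            else:
                owner[item] = key

    # Collect the values of every key under its group's representative.
    out = {}
    for key, values in d.items():
        out.setdefault(find(key), set()).update(values)
    return {k: sorted(v) for k, v in out.items()}


dict1 = {'key1': ['1', '2', '3'], 'key2': ['3', '4', '5'],
         'key3': ['4', '6', '7'], 'key4': ['8', '9']}
print(merge_by_components(dict1))
```

Because groups are linked element by element, chains such as key1 to key2 to key3 are merged in a single pass, without repeating the comparison loop.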
