
Possible Duplicate:
Python: simple list merging based on intersections

I have a list of lists:

 lists = [[1, 2, 3], [3, 5, 6], [8, 9, 10], [11, 12, 13]]

Is there a smart and fast way to merge all the sublists that share at least one element? For this example, the code should return:

 result = [[1, 2, 3, 5, 6], [8, 9, 10], [11, 12, 13]]
  • As the other question didn't have a good answer, I don't think this should be closed yet! Commented Feb 19, 2012 at 22:54
  • @katrielalex: On what basis are you saying that? "A better data structure could be of help", yes I agree on that, and you're free to add your solution too. Commented Feb 19, 2012 at 23:11
  • @Rik: looking through the comments, it seemed that all the answers were broken in one way or another. I only read them briefly though! Commented Feb 19, 2012 at 23:16
  • @katrielalex: That's because we all hit the same bug Sven Marnach seems to have, but we also fixed it -- that's why there are so many comments. Basically, the problem is that two or more lists may be "linked" through an element of a third list even though they share nothing directly. Commented Feb 19, 2012 at 23:21
  • @RikPoggi -- so you have. I can't retract the bounty; I'll pick which answer to award it to in a week! Commented Feb 19, 2012 at 23:27
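The pitfall raised in the comments is transitivity: two sublists with no common element must still end up in the same group if a third sublist bridges them. As a minimal sketch (merge_until_stable is an illustrative name, not code from this thread), a brute-force merge that repeats until no two groups intersect handles the chaining correctly:

```python
def merge_until_stable(lists):
    # Repeatedly merge any two groups that intersect until nothing changes.
    groups = [set(l) for l in lists if l]
    changed = True
    while changed:
        changed = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if groups[i] & groups[j]:
                    groups[i] |= groups.pop(j)
                    changed = True
                    break
            if changed:
                break
    return [sorted(g) for g in groups]

# [1, 2] and [3, 4] share nothing directly, but [2, 3] bridges them:
print(merge_until_stable([[1, 2], [3, 4], [2, 3]]))  # [[1, 2, 3, 4]]
```

This is quadratic in the number of groups per pass, so it is only meant to show the required behavior, not to be fast.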

3 Answers


This works, but maybe isn't very elegant:

def merge_lists(l):
    # Work on a list of sets; fold any later set that intersects s[i]
    # into s[i], rescanning from the same i until no merge happens.
    s = list(map(set, l))
    i, n = 0, len(s)
    while i < n - 1:
        for j in range(i + 1, n):
            if s[i].intersection(s[j]):
                s[i].update(s[j])
                del s[j]
                n -= 1
                break
        else:
            i += 1
    return [sorted(group) for group in s]

2 Comments

Test your code with this data: [[65, 17, 5, 30, 79, 56, 48, 62], [6, 97, 32, 93, 55, 14, 70, 32], [75, 37, 83, 34, 9, 19, 14, 64], [43, 71], [], [89, 49, 1, 30, 28, 3, 63], [35, 21, 68, 94, 57, 94, 9, 3], [16], [29, 9, 97, 43], [17, 63, 24]] and check it against one of the solution in the duplicate question. The result should not be the same. This problem was already encountered there when the merging is not trivial.
Ok fixed. Not that it matters since the thread was closed anyways ...
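The adversarial input from the comment above can be cross-checked with a small union-find (disjoint-set) pass, a standard technique for exactly this grouping problem. merge_with_union_find is my own illustrative sketch, not code from this thread:

```python
def merge_with_union_find(lists):
    # parent maps each element to its representative (union-find).
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for lst in lists:
        for x in lst:
            parent.setdefault(x, x)
        # Linking adjacent elements is enough to connect the whole sublist.
        for a, b in zip(lst, lst[1:]):
            union(a, b)

    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(g) for g in groups.values())

data = [[65, 17, 5, 30, 79, 56, 48, 62], [6, 97, 32, 93, 55, 14, 70, 32],
        [75, 37, 83, 34, 9, 19, 14, 64], [43, 71], [],
        [89, 49, 1, 30, 28, 3, 63], [35, 21, 68, 94, 57, 94, 9, 3],
        [16], [29, 9, 97, 43], [17, 63, 24]]
print(merge_with_union_find(data)[1])  # the isolated sublist: [16]
```

On this data every sublist except [16] ends up chained into one large group, which is what makes it a good test case.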
0

Nice question! It's much simpler if you think of it as a connected-components problem in a graph. The following code uses the excellent networkx graph library and the pairs function from this question.

def pairs(lst):
    # Yield each adjacent pair plus a wrap-around pair, so a
    # single-element list still yields one (self-loop) edge.
    i = iter(lst)
    try:
        first = prev = item = next(i)
    except StopIteration:
        return  # empty list: nothing to yield
    for item in i:
        yield prev, item
        prev = item
    yield item, first

lists = [[1, 2, 3], [3, 5, 6], [8, 9, 10], [11, 12, 13]]

import networkx
g = networkx.Graph()
for sub_list in lists:
    for edge in pairs(sub_list):
        g.add_edge(*edge)

# connected_components yields sets, so sort each one for display.
print([sorted(c) for c in networkx.connected_components(g)])
# [[1, 2, 3, 5, 6], [8, 9, 10], [11, 12, 13]]

Explanation

We create a new (empty) graph g. For each sub-list in lists, consider its elements as nodes of the graph and add an edge between them. (Since we only care about connectedness, we don't need to add all the edges -- only adjacent ones!) Note that add_edge takes two objects, treats them as nodes (and adds them if they aren't already there), and adds an edge between them.

Then, we just find the connected components of the graph -- a solved problem! -- and output them as our intersecting sets.
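If pulling in networkx is undesirable, the same connected-components idea fits in a few lines of plain Python with an adjacency map and an iterative DFS. connected_components_plain is an illustrative sketch, not part of the answer above:

```python
from collections import defaultdict

def connected_components_plain(lists):
    # Build an adjacency map; linking adjacent elements of each
    # sublist is enough to make the whole sublist connected.
    adj = defaultdict(set)
    for lst in lists:
        for a, b in zip(lst, lst[1:]):
            adj[a].add(b)
            adj[b].add(a)
        if len(lst) == 1:
            adj[lst[0]]  # touch the entry so lone elements are registered
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # iterative DFS from an unvisited node
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(sorted(comp))
    return components

print(connected_components_plain([[1, 2, 3], [3, 5, 6], [8, 9, 10], [11, 12, 13]]))
```

An iterative DFS avoids hitting Python's recursion limit on long chains, which a recursive version could.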

3 Comments

I would suggest moving your answer to the other question.
I tested your answer with this data: [[65, 17, 5, 30, 79, 56, 48, 62], [6, 97, 32, 93, 55, 14, 70, 32], [75, 37, 83, 34, 9, 19, 14, 64], [43, 71], [], [89, 49, 1, 30, 28, 3, 63], [35, 21, 68, 94, 57, 94, 9, 3], [16], [29, 9, 97, 43], [17, 63, 24]] and your result seems to be missing a 16.
@Rik: nice point -- there was a bug in the pairs function I copied from the linked question! =p Fixed.

You can do this with essentially a single pass through the data:

list_of_lists = [[1, 2, 3], [3, 5, 6], [8, 9, 10], [11, 12, 13]]
sets = {}  # maps each element to the set it currently belongs to
for lst in list_of_lists:
    s = set(lst)
    t = set()
    for x in s:
        if x in sets:
            t.update(sets[x])  # absorb the set this element was in
        else:
            sets[x] = s
    for y in t:
        sets[y] = s  # re-point every absorbed element at the new set
    s.update(t)
ids = set()
for s in sets.values():
    if id(s) not in ids:  # print each distinct set object only once
        ids.add(id(s))
        print(s)

This creates a dictionary sets mapping each element to the set it belongs to. If some element has been seen before, its set is subsumed into the current one, and all dictionary entries pointing at the former set are updated.

Finally we need to find all unique sets in the values of the dictionary sets. Note that while I implemented this as a dictionary of sets, the code also works with lists instead of sets.
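Wrapped as a function that returns the merged groups instead of printing them (merge_shared is my own name for this sketch, not from the answer), the same single-pass algorithm looks like:

```python
def merge_shared(list_of_lists):
    # Single pass: map each element to its current group set.
    sets = {}
    for lst in list_of_lists:
        s = set(lst)
        t = set()
        for x in s:
            if x in sets:
                t.update(sets[x])  # absorb the set this element was in
            else:
                sets[x] = s
        for y in t:
            sets[y] = s  # re-point absorbed elements at the new set
        s.update(t)
    # Deduplicate by object identity: many keys share the same set object.
    seen_ids, groups = set(), []
    for s in sets.values():
        if id(s) not in seen_ids:
            seen_ids.add(id(s))
            groups.append(sorted(s))
    return sorted(groups)

print(merge_shared([[1, 2, 3], [3, 5, 6], [8, 9, 10], [11, 12, 13]]))
# [[1, 2, 3, 5, 6], [8, 9, 10], [11, 12, 13]]
```

Note that the chained case also works: elements already seen pull their whole old group into t, and every member of t is re-pointed at the new set.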

18 Comments

I'm not entirely convinced by the ids hack -- are you sure it works in all cases? (I haven't quite grokked your algorithm yet.)
You can do the same thing (but slower, probably) with set(map(frozenset, sets.itervalues())).
Test your code with this data: [[65, 17, 5, 30, 79, 56, 48, 62], [6, 97, 32, 93, 55, 14, 70, 32], [75, 37, 83, 34, 9, 19, 14, 64], [43, 71], [], [89, 49, 1, 30, 28, 3, 63], [35, 21, 68, 94, 57, 94, 9, 3], [16], [29, 9, 97, 43], [17, 63, 24]] and check it against one of the solution in the duplicate question. The result should not be the same. This problem was already encountered there when the merging is not trivial.
@katrielalex: I don't see any problem with id() here, though I also thought about a different solution. Your suggestion would create a set of its own for every element, though, and compares them element-wise when inserting them into the set. Printing the result would be much slower than computing it...
@RikPoggi: I couldn't find any fundamental problem with my code. Your data contains a list which contains 94 twice -- that case was not covered by the answer, nor did it seem to be required by the question. (I made the trivial change to include that case.)
