
Possible Duplicate:
Python: simple list merging based on intersections

I have a list of lists:

 lists = [[1, 2, 3], [3, 5, 6], [8, 9, 10], [11, 12, 13]]

Is there a smart and fast way to merge all the sublists that share at least one element? For this example, the code should return:

 result = [[1, 2, 3, 5, 6], [8, 9, 10], [11, 12, 13]]
  • As the other question didn't have a good answer, I don't think this should be closed yet! Commented Feb 19, 2012 at 22:54
  • @katrielalex: On what basis are you saying that? "A better data structure could be of help", yes I agree on that, and you're free to add your solution too. Commented Feb 19, 2012 at 23:11
  • @Rik: looking through the comments, it seemed that all the answers were broken in one way or another. I only read them briefly though! Commented Feb 19, 2012 at 23:16
  • @katrielalex: That's because we all hit the same bug Sven Marnach seems to have, but we also fixed it -- that's why there are so many comments. Basically, the problem is that two or more lists may be "linked" through an element of a third list even though they share nothing directly. Commented Feb 19, 2012 at 23:21
  • @RikPoggi -- so you have. I can't retract the bounty; I'll pick which answer to award it to in a week! Commented Feb 19, 2012 at 23:27
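The pitfall raised in the comments is transitivity: two sublists with no common element must still end up in the same group if a third sublist bridges them. As a minimal sketch (merge_until_stable is an illustrative name, not code from this thread), a brute-force merge that repeats until no two groups intersect handles the chaining correctly:

```python
def merge_until_stable(lists):
    # Repeatedly merge any two groups that intersect until nothing changes.
    groups = [set(l) for l in lists if l]
    changed = True
    while changed:
        changed = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if groups[i] & groups[j]:
                    groups[i] |= groups.pop(j)
                    changed = True
                    break
            if changed:
                break
    return [sorted(g) for g in groups]

# [1, 2] and [3, 4] share nothing directly, but [2, 3] bridges them:
print(merge_until_stable([[1, 2], [3, 4], [2, 3]]))  # [[1, 2, 3, 4]]
```

This is quadratic in the number of groups per pass, so it is only meant to show the required behavior, not to be fast.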

3 Answers


This works, but maybe isn't very elegant:

def merge_lists(l):
    # Work on a list of sets; fold any later set that intersects s[i]
    # into s[i], rescanning from the same i until no merge happens.
    s = list(map(set, l))
    i, n = 0, len(s)
    while i < n - 1:
        for j in range(i + 1, n):
            if s[i].intersection(s[j]):
                s[i].update(s[j])
                del s[j]
                n -= 1
                break
        else:
            i += 1
    return [sorted(group) for group in s]

2 Comments

Test your code with this data: [[65, 17, 5, 30, 79, 56, 48, 62], [6, 97, 32, 93, 55, 14, 70, 32], [75, 37, 83, 34, 9, 19, 14, 64], [43, 71], [], [89, 49, 1, 30, 28, 3, 63], [35, 21, 68, 94, 57, 94, 9, 3], [16], [29, 9, 97, 43], [17, 63, 24]] and check it against one of the solution in the duplicate question. The result should not be the same. This problem was already encountered there when the merging is not trivial.
Ok fixed. Not that it matters since the thread was closed anyways ...
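The adversarial input from the comment above can be cross-checked with a small union-find (disjoint-set) pass, a standard technique for exactly this grouping problem. merge_with_union_find is my own illustrative sketch, not code from this thread:

```python
def merge_with_union_find(lists):
    # parent maps each element to its representative (union-find).
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for lst in lists:
        for x in lst:
            parent.setdefault(x, x)
        # Linking adjacent elements is enough to connect the whole sublist.
        for a, b in zip(lst, lst[1:]):
            union(a, b)

    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(g) for g in groups.values())

data = [[65, 17, 5, 30, 79, 56, 48, 62], [6, 97, 32, 93, 55, 14, 70, 32],
        [75, 37, 83, 34, 9, 19, 14, 64], [43, 71], [],
        [89, 49, 1, 30, 28, 3, 63], [35, 21, 68, 94, 57, 94, 9, 3],
        [16], [29, 9, 97, 43], [17, 63, 24]]
print(merge_with_union_find(data)[1])  # the isolated sublist: [16]
```

On this data every sublist except [16] ends up chained into one large group, which is what makes it a good test case.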
0

Nice question! It's much simpler if you think of it as a connected-components problem in a graph. The following code uses the excellent networkx graph library and the pairs function from this question.

def pairs(lst):
    # Yield each adjacent pair plus a wrap-around pair, so a
    # single-element list still yields one (self-loop) edge.
    i = iter(lst)
    try:
        first = prev = item = next(i)
    except StopIteration:
        return  # empty list: nothing to yield
    for item in i:
        yield prev, item
        prev = item
    yield item, first

lists = [[1, 2, 3], [3, 5, 6], [8, 9, 10], [11, 12, 13]]

import networkx
g = networkx.Graph()
for sub_list in lists:
    for edge in pairs(sub_list):
        g.add_edge(*edge)

# connected_components yields sets, so sort each one for display.
print([sorted(c) for c in networkx.connected_components(g)])
# [[1, 2, 3, 5, 6], [8, 9, 10], [11, 12, 13]]

Explanation

We create a new (empty) graph g. For each sub-list in lists, consider its elements as nodes of the graph and add an edge between them. (Since we only care about connectedness, we don't need to add all the edges -- only adjacent ones!) Note that add_edge takes two objects, treats them as nodes (and adds them if they aren't already there), and adds an edge between them.

Then, we just find the connected components of the graph -- a solved problem! -- and output them as our intersecting sets.
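If pulling in networkx is undesirable, the same connected-components idea fits in a few lines of plain Python with an adjacency map and an iterative DFS. connected_components_plain is an illustrative sketch, not part of the answer above:

```python
from collections import defaultdict

def connected_components_plain(lists):
    # Build an adjacency map; linking adjacent elements of each
    # sublist is enough to make the whole sublist connected.
    adj = defaultdict(set)
    for lst in lists:
        for a, b in zip(lst, lst[1:]):
            adj[a].add(b)
            adj[b].add(a)
        if len(lst) == 1:
            adj[lst[0]]  # touch the entry so lone elements are registered
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # iterative DFS from an unvisited node
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        components.append(sorted(comp))
    return components

print(connected_components_plain([[1, 2, 3], [3, 5, 6], [8, 9, 10], [11, 12, 13]]))
```

An iterative DFS avoids hitting Python's recursion limit on long chains, which a recursive version could.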

3 Comments

I would suggest moving your answer to the other question.
I tested your answer with this data: [[65, 17, 5, 30, 79, 56, 48, 62], [6, 97, 32, 93, 55, 14, 70, 32], [75, 37, 83, 34, 9, 19, 14, 64], [43, 71], [], [89, 49, 1, 30, 28, 3, 63], [35, 21, 68, 94, 57, 94, 9, 3], [16], [29, 9, 97, 43], [17, 63, 24]] and your result seems to be missing a 16.
@Rik: nice point -- there was a bug in the pairs function I copied from the linked question! =p Fixed.

You can do this with essentially a single pass through the data:

list_of_lists = [[1, 2, 3], [3, 5, 6], [8, 9, 10], [11, 12, 13]]
sets = {}  # maps each element to the set it currently belongs to
for lst in list_of_lists:
    s = set(lst)
    t = set()
    for x in s:
        if x in sets:
            t.update(sets[x])  # absorb the set this element was in
        else:
            sets[x] = s
    for y in t:
        sets[y] = s  # re-point every absorbed element at the new set
    s.update(t)
ids = set()
for s in sets.values():
    if id(s) not in ids:  # print each distinct set object only once
        ids.add(id(s))
        print(s)

This creates a dictionary sets mapping each element to the set it belongs to. If some element has been seen before, its set is subsumed into the current one, and all dictionary entries pointing at the former set are updated.

Finally we need to find all unique sets in the values of the dictionary sets. Note that while I implemented this as a dictionary of sets, the code also works with lists instead of sets.
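Wrapped as a function that returns the merged groups instead of printing them (merge_shared is my own name for this sketch, not from the answer), the same single-pass algorithm looks like:

```python
def merge_shared(list_of_lists):
    # Single pass: map each element to its current group set.
    sets = {}
    for lst in list_of_lists:
        s = set(lst)
        t = set()
        for x in s:
            if x in sets:
                t.update(sets[x])  # absorb the set this element was in
            else:
                sets[x] = s
        for y in t:
            sets[y] = s  # re-point absorbed elements at the new set
        s.update(t)
    # Deduplicate by object identity: many keys share the same set object.
    seen_ids, groups = set(), []
    for s in sets.values():
        if id(s) not in seen_ids:
            seen_ids.add(id(s))
            groups.append(sorted(s))
    return sorted(groups)

print(merge_shared([[1, 2, 3], [3, 5, 6], [8, 9, 10], [11, 12, 13]]))
# [[1, 2, 3, 5, 6], [8, 9, 10], [11, 12, 13]]
```

Note that the chained case also works: elements already seen pull their whole old group into t, and every member of t is re-pointed at the new set.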

18 Comments

I'm not entirely convinced by the ids hack -- are you sure it works in all cases? (I haven't quite grokked your algorithm yet.)
You can do the same thing (but slower, probably) with set(map(frozenset, sets.itervalues())).
Test your code with this data: [[65, 17, 5, 30, 79, 56, 48, 62], [6, 97, 32, 93, 55, 14, 70, 32], [75, 37, 83, 34, 9, 19, 14, 64], [43, 71], [], [89, 49, 1, 30, 28, 3, 63], [35, 21, 68, 94, 57, 94, 9, 3], [16], [29, 9, 97, 43], [17, 63, 24]] and check it against one of the solution in the duplicate question. The result should not be the same. This problem was already encountered there when the merging is not trivial.
@katrielalex: I don't see any problem with id() here, though I also thought about a different solution. Your suggestion would create a set of its own for every element, though, and compares them element-wise when inserting them into the set. Printing the result would be much slower than computing it...
@RikPoggi: I couldn't find any fundamental problem with my code. Your data contains a list which contains 94 twice -- that case was not covered by the answer, nor did it seem to be required by the question. (I made the trivial change to include that case.)
