1

As per title, given n sets like the following:

set1 = {"aa","ab","ba","cc","ca"},
set2 = {"fr","fa","po","pl","cc"},
set3 = {"fr","xz","hn","fa"},
set4 = {"jq","we","hn","ca","aa","fk"},
set5 = {"jp","wx","we","fr","ba"}

I want to get the exclusively intersections between them, conceptually like a Venn Diagram representation. So, for example:


The intersection between the set2 and set3, despite sharing {"fr","fa"}, will only be {"fa"}, as the string "fr" is also present in the intersection between set2,set3 and set5.


The brilliant solution proposed in this answer managed correctly all the permutations of given sets, but didn't manage to account for exclusivity, considering each overlap independently. So, in the previous example, it'll return {"fr","fa"}

Tryout

I tried to handle this situation in a very slow way, and want to know if there's a more proficient way to do the job. By creating a list of sets

set_list = [set1, set2, set3, set4, set5]

I start from finding the whole sets intersection

whole_intersect = set.intersections(*map(set,set_list))

Then I'm moving in every permutation of length n-1 using list indices

isect_1_2_3_4 = set.intersections(*map(set, map(set_list.__getitem__, [0,1,2,3])) - whole_intersect
isect_1_2_3_5 = set.intersections(*map(set, map(set_list.__getitem__, [0,1,2,5])) - whole_intersect
..

Then by n-2

isect_1_2_3 = set.intersections(*map(set, map(set_list.__getitem__, [0,1,2])) - whole_intersect - isect_1_2_3_4 - isect_1_2_3_5
..

Till filling intersections of 2 sets

isect_1_2 = set.intersections(*map(set, map(set_list.__getitem__, [0,1,2])) - whole_intersect - isect_1_2_3_4 - isect_1_2_3_5 - isect_1_2_3 - isect_1_2_4 - isect_1_2_5
..

As expectable, this approach is a true pain. How could I manage to do it in a less handcrafted and pythonic way?

1 Answer 1

1

I think I have done it by first making a dictionary of all of the intersections, of every combination, and then finding the unique intersection by taking the intersection and subtracting the union of every other intersection:

import itertools

set1 = {"aa","ab","ba","cc","ca"}
set2 = {"fr","fa","po","pl","cc"}
set3 = {"fr","xz","hn","fa"}
set4 = {"jq","we","hn","ca","aa","fk"}
set5 = {"jp","wx","we","fr","ba"}
sets = {
    1: set1,
    2: set2,
    3: set3,
    4: set4,
    5: set5
}

intersections = {}
for n_combinations in range(2, len(sets) + 1):
    tmp = list(map(dict, itertools.combinations(sets.items(), n_combinations)))
    tmp = {tuple(x.keys()):set.intersection(*list(x.values())) for x in tmp}
    intersections.update(tmp)

unique_in_intersection = {}
for n_combinations in range(2, len(sets)+1):
    for lookup_set in itertools.combinations(range(1, len(sets)+1), n_combinations):
        s1_intersection_s2 = intersections[lookup_set]
        union_other_intersections = set.union(*[v for k, v in intersections.items() if k != lookup_set and len(k) > len(lookup_set)])
        unique_in_intersection[lookup_set] = s1_intersection_s2 - union_other_intersections

result:

{
    (1, 2): {'cc'},
    (1, 3): set(),
    (1, 4): {'ca', 'aa'},
    (1, 5): {'ba'},
    (2, 3): {'fa'},
    (2, 4): set(),
    (2, 5): set(),
    (3, 4): {'hn'},
    (3, 5): set(),
    (4, 5): {'we'},
    (1, 2, 3): set(),
    (1, 2, 4): set(),
    (1, 2, 5): set(),
    (1, 3, 4): set(),
    (1, 3, 5): set(),
    (1, 4, 5): set(),
    (2, 3, 4): set(),
    (2, 3, 5): {'fr'},
    (2, 4, 5): set(),
    (3, 4, 5): set(),
    (1, 2, 3, 4): set(),
    (1, 2, 3, 5): set(),
    (1, 2, 4, 5): set(),
    (1, 3, 4, 5): set(),
    (2, 3, 4, 5): set(),
    (1, 2, 3, 4, 5): set()
}
Sign up to request clarification or add additional context in comments.

4 Comments

Fine, thanks for the provided example. I was doing it manually to keep an hierarchy, where a string will be present only in the largest (in term of intersected sets) intersection, to preserve the Venn Diagram logic (see in detail the Tryout section).
@Shred I am not sure what you mean by your comment. Hopefully, my answer generalizes to any number of sets, however.
Sorry, the question isn't clear at all. From the toy example, the intersection 2_3_5 must contain the string "fr", as none of the intersections of higher length containing all these three sets (so 1_2_3_4_5, 2_3_4_5, 1_2_3_5) has the same string inside. So, given an intersection of n sets, each string has to be checked for exclusivity only against intersections of j sets, where j > n, and where each set of the n-long intersection is inside the j-long one. Hope it sounds clearer now.
@Shed I modified my answer, and now gets the correct result.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.