Let's say we have a list of N lists. For example:

L = [['A','B','C','D','E'], ['A','B','C'],['B','C','D'],['C','D'],['A','C','D']]

I want to find the longest common subsets that occur in this list and the corresponding counts. In this case:

ans = {'A,B,C':2, 'A,C,D':2, 'B,C,D':2}

I think this question is similar to mine, but I am having a hard time understanding the C# code.

    If you have a very large number of lists with many elements, you might want to look at this related question, which talks about parallelization and other optimizations for this problem. Commented Feb 17, 2022 at 17:54

1 Answer

I assume that a "common subset" is a set that is a subset of at least two lists in the array.

With that in mind, here's one solution.

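from itertools import combinations
from collections import Counter

L = [['A','B','C','D','E'], ['A','B','C'], ['B','C','D'], ['C','D'], ['A','C','D']]

# Convert each list to a frozenset: hashable, so usable as a Counter key.
L = [*map(frozenset, L)]
# Intersect every pair of lists to get the candidate common subsets.
sets = [l1 & l2 for l1, l2 in combinations(L, 2)]
maxlen = max(len(s) for s in sets)
# Keep only the largest candidates; a set removes duplicates so counts are not inflated.
sets = {s for s in sets if len(s) == maxlen}
# Count how many of the original lists contain each candidate.
count = Counter(s for s in sets for l in L if s <= l)
# sorted() makes the element order inside each key deterministic.
dic = {','.join(sorted(s)): k for s, k in count.items()}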

Resulting dictionary dic:

{'A,B,C': 2, 'B,C,D': 2, 'A,C,D': 2}
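The conversion to frozenset in the code above is what lets the intersections serve as Counter keys. A minimal sketch of the difference, where the names plain, frozen, set_is_hashable and counts are only illustrative and not part of the answer:

```python
from collections import Counter

plain = {'A', 'B'}
frozen = frozenset(plain)

# A mutable set is unhashable, so Counter (a dict subclass) rejects it as a key.
try:
    Counter([plain])
    set_is_hashable = True
except TypeError:
    set_is_hashable = False

# Equal frozensets hash alike, so they are tallied under a single key.
counts = Counter([frozen, frozenset({'B', 'A'})])
```

Here set_is_hashable ends up False, and counts holds one key with a tally of 2.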

9 Comments

Thank you! Would this scale if I have ~200K lists, each of length 100? Making all the combinations explicitly might pose an issue, right? Also, do you mind explaining why you used frozenset?
Making all the pairwise intersections is O(n^2) in the number of lists, which is roughly 2×10^10 pairs for ~200K lists. I would say just try it and see if it's taking too long. One optimization in that initial step is to keep track of the max-length subset so that we can skip combinations involving elements like ['C','D'] that are too short to consider.
Regarding frozensets: sets aren't hashable, so you can't use them as keys for dictionary objects like a Counter.
PS: Regarding that "optimization", keeping track of the max-length really just amounts to filtering out any elements of L smaller than the second-to-largest element.
You should precompute frozenset(l1) for each l1. Right now, that step takes O(m*L^2) time, where L is the number of lists and m is the max size of a list.
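The "keep track of the max-length subset" idea from the comments could be sketched roughly as follows. This is one possible reading of that suggestion, not the answerer's code: the function name longest_common_subsets is hypothetical, and the comma-joined keys assume the list elements are strings.

```python
def longest_common_subsets(lists):
    """Map each largest pairwise intersection to the number of lists containing it."""
    # Precompute the frozensets once and sort largest-first so pruning can stop early.
    sets = sorted((frozenset(l) for l in lists), key=len, reverse=True)
    best_len = 0
    best = set()  # distinct intersections of size best_len found so far
    for i, a in enumerate(sets):
        if len(a) < best_len:
            break  # all remaining lists are at least as short: nothing can reach best_len
        for b in sets[i + 1:]:
            if len(b) < best_len:
                break  # b and everything after it is too short to matter
            common = a & b
            if len(common) > best_len:
                best_len, best = len(common), {common}
            elif len(common) == best_len:
                best.add(common)
    # Count the lists that contain each surviving candidate.
    return {','.join(sorted(s)): sum(s <= l for l in sets) for s in best}


L = [['A','B','C','D','E'], ['A','B','C'], ['B','C','D'], ['C','D'], ['A','C','D']]
result = longest_common_subsets(L)
```

For the example list, result equals {'A,B,C': 2, 'A,C,D': 2, 'B,C,D': 2}. The pruning only helps when a large intersection is found early; in the worst case the loop still visits every pair, so it remains O(n^2) in the number of lists.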