Fastest way to remove subsets of lists from a list in Python

Question

Suppose I have a list of lists like the one below (the actual list is much longer):

fruits = [['apple', 'pear'],
          ['apple', 'pear', 'banana'],
          ['banana', 'pear'],
          ['pear', 'pineapple'],
          ['apple', 'pear', 'banana', 'watermelon']]

In this case, all the items in the lists ['banana', 'pear'], ['apple', 'pear'] and ['apple', 'pear', 'banana'] are contained in the list ['apple', 'pear', 'banana', 'watermelon'] (the order of items does not matter), so I would like to remove ['banana', 'pear'], ['apple', 'pear'], and ['apple', 'pear', 'banana'] as they are subsets of ['apple', 'pear', 'banana', 'watermelon'].

My current solution is shown below. I first use ifilter and imap to create a generator for the supersets that each list might have. Then for those cases that do have supersets, I use compress and imap to drop them.

from itertools import imap, ifilter, compress

supersets = imap(lambda a: list(ifilter(lambda x: len(a) < len(x) and set(a).issubset(x), fruits)), fruits)


new_list = list(compress(fruits, imap(lambda x: 0 if x else 1, supersets)))
new_list
#[['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]

I wonder if there are more efficient ways to do this?

Possible duplicate of Python - verifying if one list is a subset of the other
– Brent Washburne, Commented Feb 4, 2016 at 18:47
You can start by dropping imap and ifilter and using generator expressions or list comprehensions instead. They work the same way but produce readable code...
– JBernardo, Commented Feb 4, 2016 at 18:59
@BrentWashburne It's not exactly a duplicate. As you can see, my current solution does in fact use issubset() as suggested by the linked post. My question is more about how to remove lists that are subsets of other lists in a big list.
– Alex, Commented Feb 4, 2016 at 19:01
@dawg: Sorry, I forgot to change the code. foo was supposed to be supersets. I have updated it.
– Alex, Commented Feb 4, 2016 at 19:32

lukaszzenko · Accepted Answer · 2016-02-04 20:02:56Z

7

filter(lambda f: not any(set(f) < set(g) for g in fruits), fruits)
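
The one-liner above relies on the < operator between two sets, which tests for a proper subset (a subset that is not equal), so exact duplicates are not removed. It also rebuilds set(g) for every pair of lists. The following is a minimal sketch, assuming Python 3, that converts each list to a set only once; the function name remove_subsets is hypothetical and not part of the original answer.

```python
fruits = [['apple', 'pear'],
          ['apple', 'pear', 'banana'],
          ['banana', 'pear'],
          ['pear', 'pineapple'],
          ['apple', 'pear', 'banana', 'watermelon']]


def remove_subsets(lists):
    """Return the lists that are not a proper subset of any other list.

    Hypothetical helper: same test as the one-liner, but each list is
    converted to a set only once instead of once per comparison.
    """
    sets = [set(item) for item in lists]
    return [item
            for item, item_set in zip(lists, sets)
            # `<` on sets is the proper-subset test, so a list is never
            # removed because of itself or an exact duplicate.
            if not any(item_set < other for other in sets)]


print(remove_subsets(fruits))
```

The original order of the surviving lists is preserved because the comprehension iterates over the input in order.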

edited Feb 4, 2016 at 20:02

answered Feb 4, 2016 at 19:56


4 Comments

Alex Over a year ago

When I tried your code, I got

[['apple', 'pear'], ['apple', 'pear', 'banana'], ['banana', 'pear'], ['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]

lukaszzenko Over a year ago

There was a mistake in my code. I think the current version should work.

Kyle Pittman Over a year ago

It worked for me either way you had it - and either way, this is such a Pythonic answer. It makes me happy.

Alex Over a year ago

Weird. In Canopy editor interface, I kept getting an empty list. But when I tried it in the command-line interface, I got the right results! Thanks for this

dawg · Accepted Answer · 2016-02-04 21:16:55Z

I don't know if it is faster but this is easier to read (to me anyway):

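sets = {frozenset(e) for e in fruits}  # frozensets are hashable; duplicates collapse
us = set()  # sets kept so far (none is contained in another)
while sets:
    e = sets.pop()
    # Skip e if a remaining set or an already kept set contains it.
    if any(e.issubset(s) for s in sets) or any(e.issubset(s) for s in us):
        continue
    else:
        us.add(e)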
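
Note that the loop above leaves the result as a set of frozensets, so the original list order and any duplicate lists are lost. The following is a minimal sketch, assuming Python 3, of the same loop wrapped in a function with the result converted back to sorted lists for a repeatable output; the name maximal_sets is hypothetical.

```python
fruits = [['apple', 'pear'],
          ['apple', 'pear', 'banana'],
          ['banana', 'pear'],
          ['pear', 'pineapple'],
          ['apple', 'pear', 'banana', 'watermelon']]


def maximal_sets(lists):
    """Hypothetical wrapper around the pop-based loop shown above."""
    remaining = {frozenset(item) for item in lists}
    kept = set()
    while remaining:
        candidate = remaining.pop()
        # `<=` is the operator form of issubset(). Equal sets cannot occur
        # here because `remaining` is itself a set of frozensets.
        if any(candidate <= other for other in remaining) or \
                any(candidate <= other for other in kept):
            continue
        kept.add(candidate)
    return kept


# Set iteration order is arbitrary, so sort for a stable result.
result = sorted(sorted(s) for s in maximal_sets(fruits))
print(result)
```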

Update

It is fast. Faster still is to use a for loop. Check timings:

fruits = [['apple', 'pear'],
        ['apple', 'pear', 'banana'],
        ['banana', 'pear'],
        ['pear', 'pineapple'],
        ['apple', 'pear', 'banana', 'watermelon']]

from itertools import imap, ifilter, compress    

def f1():              
    sets={frozenset(e) for e in fruits}  
    us=[]
    while sets:
        e=sets.pop()
        if any(e.issubset(s) for s in sets) or any(e.issubset(s) for s in us):
            continue
        else:
            us.append(list(e))   
    return us           

def f2():
    supersets = imap(lambda a: list(ifilter(lambda x: len(a) < len(x) and set(a).issubset(x), fruits)), fruits)
    new_list = list(compress(fruits, imap(lambda x: 0 if x else 1, supersets)))
    return new_list

def f3():
    return filter(lambda f: not any(set(f) < set(g) for g in fruits), fruits)

def f4():              
    sets={frozenset(e) for e in fruits}  
    us=[]
    for e in sets:
        if any(e < s for s in sets):
            continue
        else:
            us.append(list(e))   
    return us              

if __name__=='__main__':
    import timeit     
    for f in (f1, f2, f3, f4):
        print f.__name__, timeit.timeit("f()", setup="from __main__ import f, fruits"), f()

On my machine on Python 2.7:

f1 8.09958791733 [['watermelon', 'pear', 'apple', 'banana'], ['pear', 'pineapple']]
f2 15.5085151196 [['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]
f3 11.9473619461 [['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]
f4 5.87942910194 [['watermelon', 'pear', 'apple', 'banana'], ['pear', 'pineapple']]
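
The benchmark above targets Python 2.7: itertools.imap, itertools.ifilter, and the print statement are not available in Python 3. The following is a rough sketch of how f3 and f4 might be ported to Python 3 and timed. The names f3_py3, f4_py3, and canonical are hypothetical, the iteration count is an arbitrary choice, and no timings are claimed here because they depend on the machine and interpreter.

```python
import timeit

fruits = [['apple', 'pear'],
          ['apple', 'pear', 'banana'],
          ['banana', 'pear'],
          ['pear', 'pineapple'],
          ['apple', 'pear', 'banana', 'watermelon']]


def f3_py3():
    # Port of f3: filter() is lazy in Python 3, so materialize it.
    return list(filter(lambda f: not any(set(f) < set(g) for g in fruits), fruits))


def f4_py3():
    # Port of f4: build the frozensets once, keep those with no proper superset.
    sets = {frozenset(e) for e in fruits}
    return [list(e) for e in sets if not any(e < s for s in sets)]


def canonical(result):
    # Sort items and lists so results can be compared regardless of order.
    return sorted(sorted(item) for item in result)


if __name__ == '__main__':
    for f in (f3_py3, f4_py3):
        # timeit.timeit accepts a callable directly; number is arbitrary here.
        elapsed = timeit.timeit(f, number=10000)
        print(f.__name__, elapsed, canonical(f()))
```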

silicon23 · Accepted Answer · 2022-08-09 23:41:18Z

0

The answer posted by @lukaszzenko is correct and works in Python 2.

In Python 3, filter returns a lazy filter object rather than a list, so wrap the call in list():

list(filter(lambda f: not any(set(f) < set(g) for g in fruits), fruits))

Related post on Stack Overflow: Python list filtering: remove subsets from list of lists

Other approaches are discussed in: Remove sublists that are present in another sublist

edited Aug 9, 2022 at 23:41

answered Aug 9, 2022 at 23:34
