Suppose I have a list of lists like the one below (the actual list is much longer):
fruits = [['apple', 'pear'],
          ['apple', 'pear', 'banana'],
          ['banana', 'pear'],
          ['pear', 'pineapple'],
          ['apple', 'pear', 'banana', 'watermelon']]
In this case, every item in ['banana', 'pear'], ['apple', 'pear'], and ['apple', 'pear', 'banana'] is also contained in ['apple', 'pear', 'banana', 'watermelon'] (the order of items does not matter), so I would like to remove those three lists because they are subsets of ['apple', 'pear', 'banana', 'watermelon'].
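To make the criterion concrete, the containment test itself can be expressed with set.issubset, which ignores order; this is just a quick check on the example data:

set(['banana', 'pear']).issubset(['apple', 'pear', 'banana', 'watermelon'])
# True, so ['banana', 'pear'] should be removed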
My current solution is shown below. I first use ifilter and imap to build a generator that, for each list, collects the supersets that list might have; then I use compress and imap to drop the lists that do have supersets.
from itertools import imap, ifilter, compress
supersets = imap(lambda a: list(ifilter(lambda x: len(a) < len(x) and set(a).issubset(x), fruits)), fruits)
new_list = list(compress(fruits, imap(lambda x: 0 if x else 1, supersets)))
new_list
#[['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]
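Note that imap and ifilter exist only in Python 2; for reference, a rough Python 3 sketch of the same approach using the built-in map and filter (only checked against the example data above) would be:

from itertools import compress

supersets = map(lambda a: list(filter(lambda x: len(a) < len(x) and set(a).issubset(x), fruits)), fruits)
new_list = list(compress(fruits, map(lambda x: 0 if x else 1, supersets)))
# [['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]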
I wonder if there are more efficient ways to do this?
Edit: I'm aware of issubset(), as suggested in the linked post; my question is more about how to remove lists that are subsets of other lists within a big list of lists. Also, foo in the code above was supposed to be supersets; I've updated it.