3

Suppose I have a list of lists like the one below (the actual list is much longer):

fruits = [['apple', 'pear'],
          ['apple', 'pear', 'banana'],
          ['banana', 'pear'],
          ['pear', 'pineapple'],
          ['apple', 'pear', 'banana', 'watermelon']]

In this case, all the items in the lists ['banana', 'pear'], ['apple', 'pear'] and ['apple', 'pear', 'banana'] are contained in the list ['apple', 'pear', 'banana', 'watermelon'] (the order of items does not matter), so I would like to remove ['banana', 'pear'], ['apple', 'pear'], and ['apple', 'pear', 'banana'] as they are subsets of ['apple', 'pear', 'banana', 'watermelon'].

My current solution is shown below. I first use ifilter and imap to create a generator for the supersets that each list might have. Then for those cases that do have supersets, I use compress and imap to drop them.

from itertools import imap, ifilter, compress

supersets = imap(lambda a: list(ifilter(lambda x: len(a) < len(x) and set(a).issubset(x), fruits)), fruits)


new_list = list(compress(fruits, imap(lambda x: 0 if x else 1, supersets)))
new_list
#[['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]

I wonder if there are more efficient ways to do this?

8
  • 1
    Possible duplicate of Python - verifying if one list is a subset of the other Commented Feb 4, 2016 at 18:47
  • 1
    You can start by droping imap and ifilter to use generator expressions/list comprehensions. They work the same way but produce readable code... Commented Feb 4, 2016 at 18:59
  • @BrentWashburne It's not exactly a duplicate. As you can see, my current solution does in fact use issubset() as suggested by the linked post. My question is more about how to remove lists that are subsets of other lists in a big list. Commented Feb 4, 2016 at 19:01
  • @JBernardo: Could you please give an example? Thank you! :) Commented Feb 4, 2016 at 19:02
  • @dawg: Sorry, forgot to change the code. foo was supposed to be supersets. I updated it Commented Feb 4, 2016 at 19:32

3 Answers 3

7
filter(lambda f: not any(set(f) < set(g) for g in fruits), fruits)
Sign up to request clarification or add additional context in comments.

4 Comments

When I tried your code, I got [['apple', 'pear'], [['apple', 'pear', 'banana'], ['banana', 'pear'], ['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]
There was a mistake in my code. I think that current version should work.
It worked for me either way you had it - and either way, this is such a Pythonic answer. It makes me happy.
Weird. In Canopy editor interface, I kept getting an empty list. But when I tried it in the command-line interface, I got the right results! Thanks for this
4

I don't know if it is faster but this is easier to read (to me anyway):

sets={frozenset(e) for e in fruits}  
us=set()
while sets:
    e=sets.pop()
    if any(e.issubset(s) for s in sets) or any(e.issubset(s) for s in us):
        continue
    else:
        us.add(e)   

Update

It is fast. Faster still is to use a for loop. Check timings:

fruits = [['apple', 'pear'],
        ['apple', 'pear', 'banana'],
        ['banana', 'pear'],
        ['pear', 'pineapple'],
        ['apple', 'pear', 'banana', 'watermelon']]

from itertools import imap, ifilter, compress    

def f1():              
    sets={frozenset(e) for e in fruits}  
    us=[]
    while sets:
        e=sets.pop()
        if any(e.issubset(s) for s in sets) or any(e.issubset(s) for s in us):
            continue
        else:
            us.append(list(e))   
    return us           

def f2():
    supersets = imap(lambda a: list(ifilter(lambda x: len(a) < len(x) and set(a).issubset(x), fruits)), fruits)
    new_list = list(compress(fruits, imap(lambda x: 0 if x else 1, supersets)))
    return new_list

def f3():
    return filter(lambda f: not any(set(f) < set(g) for g in fruits), fruits)

def f4():              
    sets={frozenset(e) for e in fruits}  
    us=[]
    for e in sets:
        if any(e < s for s in sets):
            continue
        else:
            us.append(list(e))   
    return us              

if __name__=='__main__':
    import timeit     
    for f in (f1, f2, f3, f4):
        print f.__name__, timeit.timeit("f()", setup="from __main__ import f, fruits"), f()  

On my machine on Python 2.7:

f1 8.09958791733 [['watermelon', 'pear', 'apple', 'banana'], ['pear', 'pineapple']]
f2 15.5085151196 [['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]
f3 11.9473619461 [['pear', 'pineapple'], ['apple', 'pear', 'banana', 'watermelon']]
f4 5.87942910194 [['watermelon', 'pear', 'apple', 'banana'], ['pear', 'pineapple']]

1 Comment

I tried f1(), and I got set() as the result..mm
0

Answer posted by @lukaszzenko is correct and works for Python 2.

For Python 3, it will give the object. The code below works on Python 3.

list (filter(lambda f: not any(set(f) < set(g) for g in fruits), fruits) )

Related post in stackoverflow: Python list filtering: remove subsets from list of lists

You may also find other ways of doing it in the link below: Remove sublists that are present in another sublist

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.