7

Suppose I have a list L of unknown objects, O1 to On, and I want to remove another object reference M which may refer to one of the objects in L, I've managed to do it using:

L = [ O1, O2, ... On]

...

L = [ j for j in L if j not in [ M ] ]

which is lovely and idiomatic... but I'm having to do it a lot, and I wonder if there's not another more idiomatic way, or if there is not a faster way.

The important point is that the list of objects is unknown, and may or may not include the object to be excluded. I want to avoid extending or subclassing the objects where possible.

6
  • 1
    List comprehension will be the fastest way (list comprehensions are generally faster in Python + it will be with O(n) complexity). On a side note, if M is a single object why would you want to do j not in [M]? j == M will definitely be a little faster as direct comparisons are always faster. Commented Apr 29, 2016 at 7:24
  • try:L.remove(M) except ValueError:pass? However the remove method only removes the first element equal to M. Commented Apr 29, 2016 at 7:24
  • can there be more than one instance of M in the list? Commented Apr 29, 2016 at 7:28
  • The problem here is that if your objects are unknown, you don't know whether they are hashable. This is why you can't use a set and end up with quadratic runtime. In practice, having a list of objects with completely unknown properties seems weird. Where and why are you getting this list? To me this looks like a problem upstream, which needs to be fixed upstream if you want to avoid quadratic runtime for the filtering. Commented Apr 29, 2016 at 7:51
  • I agree @timgeb - but sadly, I'm just they guy downstream from the sewage-works... Commented Apr 29, 2016 at 7:57

5 Answers 5

5

list.remove seems to be the fastest way, with list comprehension as the second fastest and filter at last.

Here are the timeit results

In: python -m timeit '[x for x in [1,2,3,4,5] if x not in [4]]'
Out: 1000000 loops, best of 3: 0.285 usec per loop

In: python -m timeit '[x for x in [1,2,3,4,5] if x != 4]'
Out: 1000000 loops, best of 3: 0.254 usec per loop

In: python -m timeit 'filter(lambda x: x not in [4], [1,2,3,4,5])'
Out: 1000000 loops, best of 3: 0.577 usec per loop

In: python -m timeit 'filter(lambda x: x != 4, [1,2,3,4,5])'
Out: 1000000 loops, best of 3: 0.569 usec per loop

In: python -m timeit '[1,2,3,4,5].remove(4)'
Out: 10000000 loops, best of 3: 0.132 usec per loop
Sign up to request clarification or add additional context in comments.

3 Comments

Your results for remove are too optimistic, because you should add a try except to catch a possible ValueError. But I agree with you it's still the best way - provided OP want to change the original list because other methods builds a copy..
Agreed about the Try. Thanks @Muhammad
@SergeBallesta you are right, in case of try except + no exception it was still faster than list comprehension but for try except + raised exception it was slower than list comprehension.
2

What about the built in filter function?

>>> l = [1,2,3,4,5]
>>> f = [4]
>>> filter(lambda x: x not in f, l)
[1, 2, 3, 5]

or in python3

>>> list(filter(lambda x: x not in f, l))
[1, 2, 3, 5]

2 Comments

Note: this does not return a list in python3+.
added the python3 version
2

Use try/except wrapped in a recursive function recursion takes care of potential multiple M's

def tremove(L, M):
    try:
        L.remove(M)
        return tremove(L, M)
    except:
        return L

tremove(L, M)

Comments

2

If there are multiple occurences of your value then probably a while-loop with remove is needed:

L = [1,2,3,4,5]
while True:
    try:
        L.remove(4)
    except:
        break

It is a bit slower (due to the exception handling and multiple iterations over the list) than the list comprehension:

[ j for j in L if j != 4 ]

but both do work fine. If you want to exclude multiple values then you should use the list-comprehension:

M = [1, 4]
[ j for j in L if j not in M ]

because the try / except will be nested and the list comprehension only needs to traverse the list once.

Comments

2

Here's an idea that let's you make O(1) containment checks for hashable items. It should be dramatically faster for long lists M with lots of hashables.

class MixedBag(object):
    def __init__(self, *args):
        self.hashed = set()
        self.nothashed = []

        for x in args:
            self.add(x)

    def add(self, x):
        try:
            self.hashed.add(x)
        except TypeError:
            self.nothashed.append(x)

    def __contains__(self, x):
        try:
            return x in self.hashed
        except TypeError:
            return x in self.nothashed


L = [[1,2,3], 4, '5', {6}]
M = [[1,2,3], '5', {4}]

mix = MixedBag(*M)
L = [x for x in L if x not in mix]
print(L) # [4, set([6])]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.