Python removing overlap of lists

Question

I'm aware variations of this question have been asked already, but none of the ones I've been able to find have addressed my specific aim.

I am trying to take two lists in Python with string elements and remove the overlapping sections of the two. For example:

list1 = ["25","+","7","*","6","/","7"]
list2 = ["7","*","6"]

Should go to

["25","+","/","7"]

I've considered a list comprehension along the lines of

[i for i in list1 if i not in list2]

but this would yield

["25","+","/"]

as both instances of "7" would be taken out.

How can I achieve what I'm trying to do here? Thanks.

Edit - this was marked as a possible duplicate. In my example with the list comprehension, I already explained how it is a different problem to the one linked.

Possible duplicate of How to remove every occurrence of sub-list from list
– blhsing, Commented Aug 7, 2018 at 2:39
@blhsing It's not. The question you linked deals with every occurrence, while my list comprehension example shows that is not what I want.
– muke, Commented Aug 7, 2018 at 2:46
This is similar to finding a substring in a larger string. I suggest reading about the KMP (Knuth-Morris-Pratt) algorithm, which can be applied directly to your scenario.
– Vikash Kesarwani, Commented Aug 7, 2018 at 2:56

juanpa.arrivillaga · Accepted Answer · 2018-08-07 02:46:03Z

6

Essentially, you want a difference operation on a multi-set, i.e. a bag. Python implements this for the collections.Counter object:

Several mathematical operations are provided for combining Counter objects to produce multisets (counters that have counts greater than zero). Addition and subtraction combine counters by adding or subtracting the counts of corresponding elements. Intersection and union return the minimum and maximum of corresponding counts. Each operation can accept inputs with signed counts, but the output will exclude results with counts of zero or less.

So, for example:

>>> from collections import Counter
>>> list1 = ["25","+","7","*","6","/","7"]
>>> list2 = ["7","*","6"]
>>> list((Counter(list1) - Counter(list2)).elements())
['25', '+', '7', '/']

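In Python 3.6+ the result follows the order in which elements were first encountered in list1. In CPython 3.6 this is an implementation detail; from Python 3.7 onward dict insertion order is guaranteed by the language, and Counter, being a dict subclass, inherits it. If order is important and you are on an older version, you may have to implement an ordered counter.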

Indeed, the docs themselves provide just such a recipe:

>>> from collections import Counter, OrderedDict
>>> class OrderedCounter(Counter, OrderedDict):
...     'Counter that remembers the order elements are first encountered'
...     def __repr__(self):
...         return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))
...     def __reduce__(self):
...         return self.__class__, (OrderedDict(self),)
...
>>> list((OrderedCounter(list1) - OrderedCounter(list2)).elements())
['25', '+', '/', '7']
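If what you need is the order of list1 with the first occurrence of each element of list2 dropped, one possible sketch is to walk list1 once and consume counts from a Counter built from list2. The function name remove_first_occurrences is made up for illustration, and this assumes the elements are hashable; it is one interpretation of the question, not necessarily the intended one.

```python
from collections import Counter

def remove_first_occurrences(items, to_remove):
    """Return items without the first occurrence of each element of to_remove.

    Hypothetical helper: keeps the order of `items` and runs in
    O(len(items) + len(to_remove)) time.
    """
    remaining = Counter(to_remove)  # how many of each element still to drop
    result = []
    for item in items:
        if remaining[item] > 0:
            remaining[item] -= 1    # drop this occurrence
        else:
            result.append(item)     # nothing left to drop, keep it
    return result

list1 = ["25", "+", "7", "*", "6", "/", "7"]
list2 = ["7", "*", "6"]
print(remove_first_occurrences(list1, list2))  # ['25', '+', '/', '7']
```

Note that this treats list2 as a bag of elements, so it does not require them to appear contiguously in list1.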

edited Aug 7, 2018 at 2:46

answered Aug 7, 2018 at 2:39

juanpa.arrivillaga


6 Comments

blhsing Over a year ago

This solution actually does not maintain order (even with OrderedCounter). Try list1 = ["6","/","7","6","7","6"] and list2 = ["7","6","7"]. The output is ['6', '6', '/'] when it should be ['6', '/', '6'].

juanpa.arrivillaga Over a year ago

@blhsing I get ['/','6','6'], i.e. maintaining order of list1. However, you are right that what "maintaining order" means here is ambiguous. Not sure exactly what OP wants, and they haven't commented in that regard, but I see how "overlap" would imply that.

blhsing Over a year ago

My understanding is that the OP wants to emulate string replacement with lists, so it's like '6/7676'.replace('767', ''), where the result is '6/6', so to speak, which is why I said the expected output in this case really should be ['6', '/', '6'].

juanpa.arrivillaga Over a year ago

@blhsing right, I understand what you are saying, but I think that it is ambiguous in that regard. In any case, if the elements will always be strings, then you'd probably be hard-pressed to beat list(''.join(list1).replace(''.join(list2), ''))

blhsing Over a year ago

Yes, it's slightly ambiguous. But in this case a simple string replacement won't do because in the OP's example there is a string in the list that is more than one character long.
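For the "string replacement with lists" reading discussed in these comments, a rough sketch that works on the list elements directly (so multi-character strings such as "25" are not a problem) could look like the following. The name remove_first_sublist is hypothetical, it removes only the first contiguous match rather than every match, and it uses a naive scan instead of KMP.

```python
def remove_first_sublist(items, sub):
    """Return a copy of items with the first contiguous occurrence of sub removed.

    Hypothetical helper using a naive O(len(items) * len(sub)) scan.
    """
    n = len(sub)
    if n == 0:
        return list(items)
    for start in range(len(items) - n + 1):
        if items[start:start + n] == sub:
            # splice out the matching window
            return items[:start] + items[start + n:]
    return list(items)  # no match: return an unchanged copy

print(remove_first_sublist(["6", "/", "7", "6", "7", "6"], ["7", "6", "7"]))
# ['6', '/', '6']
print(remove_first_sublist(["25", "+", "7", "*", "6", "/", "7"], ["7", "*", "6"]))
# ['25', '+', '/', '7']
```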


Tai · Accepted Answer · 2018-08-07 02:56:04Z

3

Using the list.remove method. Probably slow: O(n^2) in the worst case.

list.remove(x)

Remove the first item from the list whose value is x. 
It is an error if there is no such item.

for i in list2:
    list1.remove(i)

# list1 becomes
['25', '+', '/', '7']
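The loop above mutates list1 in place and raises ValueError if list2 contains something that is not in list1. A slightly more defensive sketch, assuming you would rather keep the original list and silently skip missing elements, might be (remove_each is a made-up name):

```python
def remove_each(items, to_remove):
    """Return a copy of items with one occurrence removed per element of to_remove.

    Hypothetical helper; still worst-case quadratic, since list.remove
    scans the list on every call.
    """
    result = list(items)      # work on a copy so the caller's list is untouched
    for x in to_remove:
        try:
            result.remove(x)  # removes the first item equal to x
        except ValueError:
            pass              # x is not present: skip it (an assumption)
    return result

list1 = ["25", "+", "7", "*", "6", "/", "7"]
list2 = ["7", "*", "6"]
print(remove_each(list1, list2))  # ['25', '+', '/', '7']
```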

edited Aug 7, 2018 at 2:56

answered Aug 7, 2018 at 2:51

Tai


6 Comments

juanpa.arrivillaga Over a year ago

I believe this will work well, although depending on the lists, it could potentially perform rather poorly.

Tai Over a year ago

@juanpa.arrivillaga agree. Added a comment.

juanpa.arrivillaga Over a year ago

I see this performing well if the pairs are small (better than the dict approach I'd wager). If you are working with two large lists, then the worst-case quadratic time will hit you.

Tai Over a year ago

@juanpa.arrivillaga thanks for letting us know the results of your experiments.

blhsing Over a year ago

This solution actually does not maintain order. Try list1 = ["6","/","7","6","7","6"] and list2 = ["7","6","7"]. The output is ['/', '6', '6'] when it should be ['6', '/', '6'].
