2

I'm aware variations of this question have been asked already, but none of the ones I've been able to find have addressed my specific aim.

I am trying to take two lists in Python with string elements and remove the overlapping sections of the two. For example:

list1 = ["25","+","7","*","6","/","7"]
list2 = ["7","*","6"]

Should go to

["25","+","/","7"]

I've considered a list comprehension along the lines of

[i for i in list1 if not in list2]

but this would yield

["25","+","/"]

as both instances of "7" would be taken out.

How can I achieve what I'm trying to do here? Thanks.

Edit - this was marked as a possible duplicate. In my example with the list comprehension, I already explained how it is a different problem to the one linked.

7
  • you can try use a counter Commented Aug 7, 2018 at 2:35
  • is the order important? Commented Aug 7, 2018 at 2:36
  • 2
    Possible duplicate of How to remove every occurrence of sub-list from list Commented Aug 7, 2018 at 2:39
  • @blhsing It's not - the question you linked deals with every occurrence while I showed with my example with the list comprehension that that is not what I wanted. Commented Aug 7, 2018 at 2:46
  • This is similar to finding a substring in a larger string. I would suggest you to read about KMP (Knuth-Morris-Pratt) algorithm, it can directly be applied to your scenario. Commented Aug 7, 2018 at 2:56

2 Answers 2

6

Essentially, you want a difference operation on a multi-set, i.e. a bag. Python implements this for the collections.Counter object:

Several mathematical operations are provided for combining Counter objects to produce multisets (counters that have counts greater than zero). Addition and subtraction combine counters by adding or subtracting the counts of corresponding elements. Intersection and union return the minimum and maximum of corresponding counts. Each operation can accept inputs with signed counts, but the output will exclude results with counts of zero or less.

So, for example:

>>> list1 = ["25","+","7","*","6","/","7"]
>>> list2 = ["7","*","6"]
>>> list((Counter(list1) - Counter(list2)).elements())
['25', '+', '7', '/']

In Python 3.6+ this will be ordered (although this is not currently guaranteed, and should probably be considered an implementation detail). If order is important, and you are not using this version, you may have to implement an ordered counter.

Indeed, the docs themselves provide just such a recipe:

>>> from collections import Counter, OrderedDict
>>> class OrderedCounter(Counter, OrderedDict):
...     'Counter that remembers the order elements are first encountered'
...     def __repr__(self):
...         return '%s(%r)' % (self.__class__.__name__, OrderedDict(self))
...     def __reduce__(self):
...         return self.__class__, (OrderedDict(self),)
...
>>> list((OrderedCounter(list1) - OrderedCounter(list2)).elements())
['25', '+', '/', '7']
Sign up to request clarification or add additional context in comments.

6 Comments

This solution actually does not maintain order (even with OrderedCounter). Try list1 = ["6","/","7","6","7","6"] and list2 = ["7","6","7"]. The output is ['6', '6', '/'] when it should be ['6', '/', '6'].
@blhsing I get ['/','6','6'], i.e. maintaining order of list1, however, you are right, what "maintaining order" here is ambiguous. Not sure exactly what OP wants, and they haven't commented on that regard, but I see how "overlap" would imply that.
My understanding is that the OP wants to emulate string replacement with lists, so it's like '6/7676'.replace('767', ''), where the result is '6/6', so to speak, which is why I said the expected output in this case really should be ['6', '/', '6'].
@blhsing right, I understand what you are saying, but I think that it is ambiguous in that regard. In any case, if the elements will always be strings, then you'd be probably hard-pressed to beat list(''.join(list1).replace(''.join(list2), ''))
Yes, it's slightly ambiguous. But in this case a simple string replacement won't do because in the OP's example there is a string in the list that is more than one character long.
|
3

Using remove method. Probably slow. O(n^2) in worse case.

list.remove(x)

Remove the first item from the list whose value is x. 
It is an error if there is no such item.
for i in list2:
    list1.remove(i)

# list1 becomes
['25', '+', '/', '7']

6 Comments

I believe this will work well, although depending on the lists, it could potentially perform rather poorly.
@juanpa.arrivillaga agree. Added a comment.
I see this performing well if the pairs are small (better than the dict approach I'd wager). If you are working with two large lists, then the worst-case quadratic time will hit you.
@juanpa.arrivillaga thanks for letting us know the results of your experiments.
This solution actually does not maintain order. Try list1 = ["6","/","7","6","7","6"] and list2 = ["7","6","7"]. The output is ['/', '6', '6'] when it should be ['6', '/', '6'].
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.