0

How do you remove every occurrence of one given item from a list in Python, while leaving the other items alone? For example,

l = list('need')

If 'e' is the given item, then the result should be

l = list('nd')

The set() function will not do the trick since it will remove all duplicates.

count() and remove() are not efficient.

5 Comments
  • This can help stackoverflow.com/q/20950650/6692898 Commented Aug 16, 2020 at 0:31
  • 2
    You mean how to remove duplicate letters? The question is not clear enough; what does "similar items" mean? Commented Aug 16, 2020 at 0:31
  • Please see example above. Commented Aug 16, 2020 at 0:33
  • Do you only want to remove the given item if it is also a duplicate? If the given letter is 'e', what would you expect for 'nedd'? Commented Aug 16, 2020 at 0:53
  • My question is clear enough, because I gave the input and what is the expected output. Commented Aug 16, 2020 at 1:39

4 Answers

2

Use filter().

This assumes you write a predicate function that decides which items to keep in the list.

For your example:

def pred(x):
    # keep every item except 'e'
    return x != "e"

l = list("need")
l = list(filter(pred, l))

0

Assuming given = 'e' and l = list('need'):

for i in range(l.count(given)):
    l.remove(given)
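
Each remove() call rescans the list from the start and shifts the remaining elements, so the loop above repeats a lot of work on long lists. As a rough sketch of a single-pass alternative that still modifies the list in place, slice assignment can be used. The helper name remove_all is hypothetical and not part of the question or this answer.

```python
def remove_all(items, given):
    """Remove every occurrence of `given` from `items` in place (hypothetical helper)."""
    # Build the filtered list once, then overwrite the contents of the
    # original list object so other references to it see the change.
    items[:] = [x for x in items if x != given]
    return items


l = list('need')
remove_all(l, 'e')
print(l)  # ['n', 'd']
```

Plain reassignment (l = [x for x in l if x != given]) does the same filtering but binds l to a new list object instead of changing the existing one.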

3 Comments

count() will count the n number of occurrences of an item in your list, allowing you to loop through the list with remove() n times.
OP said count and remove are inefficient
that wasn't in the question when i first answered it lol
0

If you just want to remove 'e' from each word in a list of words, you can use the regex function re.sub(). If you also want a count of how many occurrences of 'e' were removed from each word, you can use re.subn(). The first gives you a list of strings. The second gives you a list of tuples (string, n), where n is the number of occurrences removed from that word.

import re
lst = ['need', 'feed', 'seed', 'deed', 'made', 'weed', 'said']
j = [re.sub('e', '', i) for i in lst]   # strings with every 'e' removed
k = [re.subn('e', '', i) for i in lst]  # (string, count) tuples

The values of j and k are:

j = ['nd', 'fd', 'sd', 'dd', 'mad', 'wd', 'said']
k = [('nd', 2), ('fd', 2), ('sd', 2), ('dd', 2), ('mad', 1), ('wd', 2), ('said', 0)]

If you want the total number of replacements, just iterate through k and sum the counts. There are simpler ways too; for example, you can join the words and call re.subn() once:

re.subn('e','',''.join(lst))[1]

This gives you the total number of characters removed across the whole list.
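
The "iterate through k and sum" option mentioned above could look like the sketch below. It rebuilds lst and k so that it runs on its own; the names total, stripped and total_plain are only illustrative. For a single literal character, str.replace() and str.count() give the same results without a regular expression.

```python
import re

lst = ['need', 'feed', 'seed', 'deed', 'made', 'weed', 'said']
k = [re.subn('e', '', word) for word in lst]

# Each entry of k is (new_string, n); add up the n values.
total = sum(n for _, n in k)
print(total)  # 11

# Same results without regex, for a plain single-character match.
stripped = [word.replace('e', '') for word in lst]
total_plain = sum(word.count('e') for word in lst)
```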

0

List comprehension method. I'm not sure whether its size/complexity is lower than that of count and remove.

def scrub(l, given):
    return [i for i in l if i not in given]
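
On the complexity question: the comprehension walks l once, and each "i not in given" test scans given, so the cost grows roughly with len(l) * len(given). When given is long, converting it to a set first makes each membership test roughly constant time on average. A minimal sketch, assuming the items are hashable; scrub_set is a hypothetical name.

```python
def scrub_set(l, given):
    """Variant of scrub that looks items up in a set (hypothetical helper)."""
    unwanted = set(given)  # requires hashable items
    return [i for i in l if i not in unwanted]


print(scrub_set(list("need"), list("ne")))  # ['d']
```

For a handful of items to remove, as in the example here, the difference from the plain list lookup is likely negligible.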

Filter method. Again, I'm not sure about its complexity.

def filter_by(l, given):
    return list(filter(lambda x: x not in given, l))

Brute force with recursion. There are a lot of potential pitfalls, but it is still an option. Again, I don't know the size/complexity.

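def bruteforce(l, given):
    try:
        # given[0] raises IndexError once given is empty
        l.remove(given[0])
        return bruteforce(l, given)
    except ValueError:
        # no occurrences of given[0] left; move on to the next item
        return bruteforce(l, given[1:])
    except IndexError:
        return l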

For those of you curious about the actual time taken by the methods above, I've taken the liberty of testing them below!

Below is the timing helper I've chosen to use.

import datetime

def timer(func, name):
    print("-------{}-------".format(name))
    try:
        start = datetime.datetime.now()
        x = func()
        end = datetime.datetime.now()
        # total elapsed microseconds; .microseconds alone drops whole seconds
        print(int((end - start).total_seconds() * 1000000))
    except Exception as e:
        print("Failed: {}".format(e))
    print("\r")
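
A single wall-clock measurement can be noisy. As an alternative sketch, the standard-library timeit module can repeat a call and report the fastest run. The helper name best_time, the sample data and the repeat counts below are arbitrary choices, not part of the original test.

```python
import timeit


def best_time(func, repeat=3, number=1):
    """Return the fastest of `repeat` timings of `func`, in seconds (hypothetical helper)."""
    # timeit.repeat returns a list with one total time per repetition.
    return min(timeit.repeat(func, repeat=repeat, number=number))


data = list("need" * 1000)
elapsed = best_time(lambda: [i for i in data if i != "e"])
print("{:.6f} s".format(elapsed))
```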

The dataset we are testing against, where l is our original list, q holds the items we want to remove, and r is our expected result:

l = list("need"*50000)
q = list("ne")
r = list("d"*50000)

For posterity, I've added the count/remove method the OP was against (for good reason!).

def count_remove(l, given):
    for i in given:
        for x in range(l.count(i)):
            l.remove(i)
    return l

All that's left to do is test!

timer(lambda: scrub(l, q), "List Comp")
assert(scrub(l,q) == r)

timer(lambda: filter_by(l, q), "Filter")
assert(filter_by(l,q) == r)

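# count_remove and bruteforce mutate their argument, so pass a copy to keep l intact
timer(lambda: count_remove(list(l), q), "Count/Remove")
assert(count_remove(list(l), q) == r)

timer(lambda: bruteforce(list(l), q), "Bruteforce")
# bruteforce exceeds the recursion limit on this dataset, so check it on a small input
assert(bruteforce(list("need"), q) == list("d"))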

And our results

-------List Comp-------
10000

-------Filter-------
28000

-------Count/Remove-------
199000

-------Bruteforce-------
Failed: maximum recursion depth exceeded

Process finished with exit code 0

The recursive method failed on the larger dataset, but we expected this. I also tested on smaller datasets, where recursion was marginally slower. I thought it would be faster.
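
The failure comes from Python's recursion limit (1000 frames by default in CPython; see sys.getrecursionlimit()), since bruteforce recurses once per removed item. A loop-based sketch of the same idea avoids the limit, though it keeps the repeated remove() cost. The name bruteforce_iter is hypothetical.

```python
def bruteforce_iter(l, given):
    """Loop version of bruteforce: remove each item in `given` until none remain."""
    for item in given:
        while True:
            try:
                l.remove(item)  # removes the first occurrence, in place
            except ValueError:
                break  # no occurrences left, move to the next item
    return l


print(bruteforce_iter(list("need" * 2000), list("ne")) == list("d" * 2000))  # True
```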
