I want to remove duplicates from a list and, every time a duplicate is removed, insert an empty item in its place.

I have code that removes duplicates. It also ignores empty list items:

import csv

#Create new output file

new_file = open('addr_list_corrected.csv','w')
new_file.close()

with open('addr_list.csv', 'r') as addr_list:
    csv_reader = csv.reader(addr_list, delimiter=',')
    for row in csv_reader:

        print(row)
        print("##########################")
        seen = set()
        seen_add = seen.add
        # An empty cell evaluates to False, so empty cells are never treated as duplicates

        new_row = [cell for cell in row if not (cell and cell in seen or seen_add(cell))]
        print(new_row)

        with open('addr_list_corrected.csv', 'a') as addr_list_corrected:
            csv_writer=csv.writer(addr_list_corrected, delimiter=',')
            csv_writer.writerow(new_row)

But I need to replace every removed item with an empty string.

  • possible duplicate of How do you remove duplicates from a list in Python whilst preserving order? Commented Mar 23, 2015 at 2:48
  • I've voted to close this a dupe. The duped answer doesn't "insert empty items", but it's a trivial modification to do so. Commented Mar 23, 2015 at 2:49
  • Take a look at the unique_everseen function in the Itertools Recipes. Commented Mar 23, 2015 at 4:00
  • @Anonymous Yes, it is probably a trivial modification, but I do not seem to be able to do it ;) Commented Mar 23, 2015 at 15:47

4 Answers

I would do that with an iterator. Something like this:

def dedup(seq):
    seen = set()
    for v in seq:
        yield '' if v in seen else v
        seen.add(v)
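
As a usage sketch (assuming Python 3; the sample row is only illustrative, and the generator is repeated so the snippet runs on its own), wrapping the call in list() turns the generator's output into a new row:

```python
def dedup(seq):
    """Yield each item once; yield '' in place of every later repeat."""
    seen = set()
    for v in seq:
        yield '' if v in seen else v
        seen.add(v)

row = ["to", "be", "or", "not", "to", "be"]
new_row = list(dedup(row))  # consume the generator into a list
print(new_row)  # ['to', 'be', 'or', 'not', '', '']
```

Because dedup is a generator, nothing is computed until it is iterated, so it can also be passed directly to anything that accepts an iterable.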

2 Comments

You seem to use a for loop; would a list comprehension be any better?
Better in what sense? Faster? Probably.

Edit: reversed the logic to make the meaning clearer.

Another alternative would be to do something like this:

seen = dict()
seen_setdefault = seen.setdefault
new_row = ["" if cell in seen else seen_setdefault(cell, cell) for cell in row]

To give an example:

>>> row = ["to", "be", "or", "not", "to", "be"]
>>> seen = dict()
>>> seen_setdefault = seen.setdefault
>>> new_row = ["" if cell in seen else seen_setdefault(cell, cell) for cell in row]
>>> new_row
['to', 'be', 'or', 'not', '', '']

Edit 2: Out of curiosity I ran a quick test to see which approach was fastest:

>>> from random import randint
>>> from statistics import mean
>>> from timeit import repeat
>>>
>>> def standard(seq):
...     """Trivial modification to standard method for removing duplicates."""
...     seen = set()
...     seen_add = seen.add
...     return ["" if x in seen or seen_add(x) else x for x in seq]
...
>>> def dedup(seq):
...     seen = set()
...     for v in seq:
...         yield '' if v in seen else v
...         seen.add(v)
...
>>> def pedro(seq):
...     """Pedro's iterator based approach to removing duplicates."""
...     my_dedup = dedup
...     return [x for x in my_dedup(seq)]
...
>>> def srgerg(seq):
...     """Srgerg's dict based approach to removing duplicates."""
...     seen = dict()
...     seen_setdefault = seen.setdefault
...     return ["" if cell in seen else seen_setdefault(cell, cell) for cell in seq]
...
>>> data = [randint(0, 10000) for x in range(100000)]
>>>
>>> mean(repeat("standard(data)", "from __main__ import data, standard", number=100))
1.2130275770426708
>>> mean(repeat("pedro(data)", "from __main__ import data, pedro", number=100))
3.1519048346103555
>>> mean(repeat("srgerg(data)", "from __main__ import data, srgerg", number=100))
1.2611971098676882

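In this test, the relatively simple modification to the standard approach described in the other Stack Overflow question was fastest (about 1.21 s per 100 calls), narrowly ahead of the dict-based approach (about 1.26 s). The generator-based approach was the slowest (about 3.15 s).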
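
To tie this back to the question, the `standard` variant timed above could be dropped into a CSV rewrite loop. The following is a minimal sketch assuming Python 3; the helper names `blank_duplicates` and `rewrite_csv` are made up for this illustration, and the output file is opened once rather than re-opened for every row.

```python
import csv

def blank_duplicates(row):
    """Return a copy of row with repeated cells replaced by ''."""
    seen = set()
    seen_add = seen.add
    # seen_add(x) returns None (falsy), so a first occurrence is recorded and kept
    return ["" if x in seen or seen_add(x) else x for x in row]

def rewrite_csv(src_path, dst_path):
    """Copy src_path to dst_path, blanking duplicates within each row."""
    # newline='' is what the csv module documentation recommends for file objects
    with open(src_path, newline='') as src, open(dst_path, 'w', newline='') as dst:
        writer = csv.writer(dst, delimiter=',')
        for row in csv.reader(src, delimiter=','):
            writer.writerow(blank_duplicates(row))
```

With the file names from the question, the call would be rewrite_csv('addr_list.csv', 'addr_list_corrected.csv'). Duplicates are tracked per row, as in the original code, because a fresh set is created for each row.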

1 Comment

Hi guys! Thanks a lot. I am new to Python and did not quite understand comprehensions. The trivial modification did the trick: seen=set() seen_add=seen.add new_row = ["" if x in seen or seen_add(x) else x for x in row] @srgerg Thanks a lot!

You can use a set to keep track of seen items. Using the example list used above:

x = ['to', 'be', 'or', 'not', 'to', 'be']
seen = set()
for index, item in enumerate(x):
    if item in seen:
        x[index] = ''
    else:
        seen.add(item)
print(x)


You can create a new list and append each element to it if it is not already present in the new list; otherwise, append None.

oldList = [3, 1, 'a', 2, 4, 2, 'a', 5, 1, 3]
newList = []

for i in oldList:
    if i in newList:
        newList.append(None)
    else:
        newList.append(i)
print(newList)

Output:

[3, 1, 'a', 2, 4, None, None, 5, None, None]
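
Note that the membership test `i in newList` scans the list on every iteration, so the loop above is quadratic in the worst case. A possible variant, sketched under the assumption that all items are hashable, tracks seen values in a separate set. The name `mask_duplicates` and its `placeholder` parameter are invented for this illustration.

```python
def mask_duplicates(items, placeholder=None):
    """Return a new list in which repeats of earlier items become placeholder."""
    seen = set()      # set membership tests take constant time on average
    result = []
    for item in items:
        if item in seen:
            result.append(placeholder)
        else:
            seen.add(item)
            result.append(item)
    return result

old_list = [3, 1, 'a', 2, 4, 2, 'a', 5, 1, 3]
masked = mask_duplicates(old_list)
print(masked)  # [3, 1, 'a', 2, 4, None, None, 5, None, None]
```

Passing placeholder='' instead gives the empty-string behaviour asked for in the question.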
