import csv

reader=csv.reader(open('Names_Duplicates.csv', 'r'),delimiter=',')
writer=csv.writer(open('Names_NoDuplicates.csv', 'w'),delimiter=',')

Names=set()
for row in reader:
    if row[0] not in Names:
        writer.writerow(row)
        Names.add(row[0])

I am using this code to remove duplicates from a CSV file with Python 2.7 (Windows). It removes duplicates based on one column at a time. Is there any way I can remove duplicates based on multiple columns at the same time?

Any help is appreciated.

P.S. The pandas library does not work on my system.

Just a note: on Windows with Python 2, open the CSV output file in "wb" mode to avoid lines ending in \r\r\n. Commented Nov 17, 2015 at 9:30

1 Answer


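Use a tuple of multiple items as the key. operator.itemgetter called with several indices returns a tuple of those columns, and a tuple is hashable, so it can be stored in a set.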

import operator
 ...
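fieldmatches = set()  # keys that have already been written
fieldspec = operator.itemgetter(0, 2, 3)  # columns that form the key, for example
for row in reader:
    key = fieldspec(row)  # tuple of the values in the chosen columns
    if key not in fieldmatches:
        writer.writerow(row)
        fieldmatches.add(key)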
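
For anyone who wants to try the idea without touching real files, here is a minimal self-contained sketch. It is written against Python 3 rather than the 2.7 from the question, and the helper name dedupe_rows, the sample rows, and the column choice (0, 2, 3) are all made up for illustration. It assumes every row has at least as many columns as the largest index in the key.

```python
import csv
import io
from operator import itemgetter


def dedupe_rows(rows, key_columns):
    """Yield each row whose values in key_columns have not been seen before.

    dedupe_rows is a hypothetical helper name, not part of the csv module.
    """
    make_key = itemgetter(*key_columns)  # builds the key from the chosen columns
    seen = set()
    for row in rows:
        key = make_key(row)
        if key not in seen:
            seen.add(key)
            yield row


# Made-up sample data standing in for Names_Duplicates.csv.
sample = io.StringIO(
    "Ann,x,London,1\n"
    "Ann,y,London,1\n"
    "Ann,z,Paris,1\n"
    "Bob,x,London,1\n"
)

# Rows count as duplicates when columns 0, 2 and 3 all match.
unique_rows = list(dedupe_rows(csv.reader(sample), (0, 2, 3)))

for row in unique_rows:
    print(row)
```

The first row seen for each key is the one that is kept, which matches the behaviour of the loop above. With a single index, itemgetter returns the bare value instead of a tuple, which still works as a set member.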