Manipulating a CSV file in Python

Question

import csv

reader=csv.reader(open('Names_Duplicates.csv', 'r'),delimiter=',')
writer=csv.writer(open('Names_NoDuplicates.csv', 'w'),delimiter=',')

Names=set()
for row in reader:
    if row[0] not in Names:
        writer.writerow(row)
        Names.add(row[0])

I am using this code to remove duplicates from a CSV file with Python 2.7 (Windows). It removes duplicates based on one column at a time. Is there any way I can remove duplicates based on multiple columns at the same time?

Any help is appreciated.

P.S. The Pandas library is not working on my system.

Just a notice: on Windows with Python 2, you should open the CSV output file in "wb" mode to avoid lines ending in \r\r\n.
– Serge Ballesta, commented Nov 17, 2015 at 9:30
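To illustrate the comment above, here is a minimal sketch of opening the output file so that the csv module controls the line endings. The helper name open_csv_writer is hypothetical, not part of the csv module. It assumes "wb" on Python 2 and text mode with newline='' on Python 3, which is the equivalent setting there.

```python
import csv
import sys


def open_csv_writer(path):
    """Open path for CSV output and return (file_object, csv_writer).

    Hypothetical helper: picks the file mode that stops the platform from
    translating the '\r\n' the csv module already writes into '\r\r\n'.
    """
    if sys.version_info[0] < 3:
        # Python 2: binary mode, so Windows does not expand '\n' again.
        f = open(path, 'wb')
    else:
        # Python 3: text mode with newline translation switched off.
        f = open(path, 'w', newline='')
    return f, csv.writer(f, delimiter=',')
```

The caller is responsible for closing the returned file object once all rows have been written.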

Ignacio Vazquez-Abrams · Accepted Answer · 2015-11-17 09:29:12Z


Use a tuple of multiple items as the key.

import operator
...
fieldmatches = set()
# Build the key from columns 0, 2 and 3 (for example).
fieldspec = operator.itemgetter(0, 2, 3)
for row in reader:
    key = fieldspec(row)  # tuple of the selected fields
    if key not in fieldmatches:
        writer.writerow(row)
        fieldmatches.add(key)
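For reference, the same idea can be sketched as a self-contained pair of functions. The names unique_rows and dedupe_csv are illustrative, not from the answer, and the column indexes are only an example. The key is built with a generator expression instead of operator.itemgetter so that a single column also yields a tuple. The sketch assumes every row has at least as many fields as the largest index; a shorter row would raise IndexError.

```python
import csv


def unique_rows(rows, key_columns):
    """Yield each row whose values in key_columns have not been seen yet."""
    seen = set()
    for row in rows:
        # The tuple of selected fields is the de-duplication key.
        key = tuple(row[i] for i in key_columns)
        if key not in seen:
            seen.add(key)
            yield row


def dedupe_csv(src, dst, key_columns):
    """Copy CSV rows from open file object src to dst, keeping only the
    first row for each distinct key."""
    writer = csv.writer(dst, delimiter=',')
    writer.writerows(unique_rows(csv.reader(src, delimiter=','), key_columns))
```

With the files from the question opened as src and dst, a call such as dedupe_csv(src, dst, (0, 2, 3)) would keep the first row for each distinct combination of columns 0, 2 and 3.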
