How to remove rows from csv based on matching data

Question

I have a large CSV file from which I need to remove rows, based on matching two parameters.

My list of data to be removed would appear as follows:

London,James Smith
London,John Oliver
London,John-Smith-Harrison
Paris,Hermione
Paris,Trevor Wilson
New York City,Charlie Chaplin
New York City,Ned Stark
New York City,Thoma' Becket
New York City,Ryan-Dover

A row should then be removed from the main CSV when the city name matches its second column and the name matches its 9th column.

If both match, delete the row from the main CSV (note that I haven't provided an example of this CSV here).

Hi, thanks for the answer. What could I have done to make it clearer? Obviously it is in my interest to make the problem at hand as clear as possible. Kind regards, AEA
– AEA, Commented Sep 27, 2013 at 23:55
I just wasn't sure if you were struggling with something, or just wanted somebody to write the code for you (which I've done below :P).
– Erik Kaplun, Commented Sep 27, 2013 at 23:58

Erik Kaplun · Accepted Answer · 2013-09-28 11:00:47Z

5

I verified the following to work as you need on the kind of data you provided/described:

import csv
from cStringIO import StringIO

# parse the data you're about to filter with
with open('filters.csv', 'rb') as f:
    filters = {(row[0], row[1]) for row in csv.reader(f, delimiter=',')}

out_f = StringIO()  # use e.g. `with open('out.csv', 'wb') as out_f` for real file output
out = csv.writer(out_f, delimiter=',')

# go through your rows and see if the pair (row[1], row[8]) is
# found in the previously parsed set of filters; if yes, skip the row
with open('data.csv', 'rb') as f:
    for row in csv.reader(f, delimiter=','):
        if (row[1], row[8]) not in filters:
            out.writerow(row)

# for debugging only
print out_f.getvalue()  # prints the resulting filtered CSV data

NOTE: the {... for ... in ...} is set-comprehension syntax, which is available from Python 2.7 onwards; on older versions, change it to the equivalent set(... for ... in ...) for it to work.
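
As a small illustration of the note above, here is a sketch showing that the two spellings build the same set. It is written for Python 3 and uses an in-memory string instead of filters.csv; the names FILTER_TEXT, load_filters_comprehension and load_filters_generator are made up for this example and are not part of the answer's code.

```python
import csv
import io

# Hypothetical filter data: two entries copied from the question's sample list.
FILTER_TEXT = "London,James Smith\nParis,Hermione\n"


def load_filters_comprehension(text):
    """Build the set of (city, name) pairs with a set comprehension."""
    return {(row[0], row[1]) for row in csv.reader(io.StringIO(text))}


def load_filters_generator(text):
    """Build the same set by passing a generator expression to set()."""
    return set((row[0], row[1]) for row in csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    # Both spellings should produce equal sets.
    print(load_filters_comprehension(FILTER_TEXT) == load_filters_generator(FILTER_TEXT))
```

Either form gives a set, so the later "(row[1], row[8]) not in filters" test is a hash lookup rather than a scan of a list.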

edited Sep 28, 2013 at 11:00

answered Sep 27, 2013 at 23:31

Erik Kaplun

1 Comment

Mark Tolonen Over a year ago

Make sure to use 'rb' and 'wb' when opening files using the csv module. To quote the docs: "it must be opened with the 'b' flag on platforms where that makes a difference." Use newline='' for Python 3.
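
For readers on Python 3, the comment above could translate into something like the following sketch. It is an assumption about how the accepted answer might be ported, not code from the answer itself: the function name filter_rows and its parameters are invented here, and the column positions default to the 2nd and 9th columns described in the question.

```python
import csv


def filter_rows(filters_path, data_path, out_path, city_col=1, name_col=8):
    """Copy data_path to out_path, skipping rows whose (city, name) pair
    appears in filters_path. Column indexes are zero-based."""
    # In Python 3, open in text mode with newline='' so that the csv
    # module handles line endings itself.
    with open(filters_path, newline='') as f:
        filters = {(row[0], row[1]) for row in csv.reader(f) if len(row) >= 2}

    last_col = max(city_col, name_col)
    with open(data_path, newline='') as src, \
            open(out_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Rows too short to hold both columns cannot match, so keep them.
            if len(row) > last_col and (row[city_col], row[name_col]) in filters:
                continue
            writer.writerow(row)
```

A call such as filter_rows('filters.csv', 'data.csv', 'out.csv') would then write the filtered rows to out.csv, assuming those files exist and use comma delimiters.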

kiriloff · Accepted Answer · 2013-09-28 01:44:41Z

1

You can read your data line by line and append each line to a list if its elements in the 2nd and 9th columns are not in lists L1 and L2 respectively.

ext = "C:\Users\Me\Desktop\\test.txt"
readL = []

f = open(ext)

for line in f:
    listLine = line.strip().split(',')
    if(listLine[2] in L1 or listLine[9] in L2):
        continue
    readL += [listLine]


f.close()

answered Sep 28, 2013 at 1:44

kiriloff

1 Comment

Erik Kaplun Over a year ago

I believe he said that if both column 2 and column 9 are found on the same line in the list of filters, skip the row; your code does something different. It would also be educational to use idiomatic and nicely formatted Python in example snippets :) Also, the content of the ext variable is ill-formed because of the backslashes; your code doesn't show how to actually parse the contents of L1 and L2; listLine[2] is the 3rd column but he said the 2nd; and it should be readL.append(listLine), etc. It definitely looks like you're just after cheap rep.
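
The points raised in this comment can be sketched roughly as follows. This is a Python 3 illustration only, not the answerer's code: keep_unmatched is a hypothetical helper, and filters is assumed to be a set of (city, name) pairs already parsed from the removal list.

```python
import csv


def keep_unmatched(lines, filters, city_col=1, name_col=8):
    """Return the parsed rows whose (city, name) pair is not in filters.

    lines is any iterable of CSV text lines (an open file works);
    column indexes are zero-based, so 1 and 8 are the 2nd and 9th columns.
    """
    kept = []
    last_col = max(city_col, name_col)
    for row in csv.reader(lines):
        # Skip a row only when BOTH values match the same filter entry;
        # testing the two columns separately with `or` would remove too much.
        if len(row) > last_col and (row[city_col], row[name_col]) in filters:
            continue
        kept.append(row)
    return kept


# Hypothetical usage; a raw string keeps the Windows backslashes literal:
# with open(r"C:\Users\Me\Desktop\test.txt", newline='') as f:
#     rows = keep_unmatched(f, filters)
```

Using csv.reader instead of line.split(',') also copes with quoted fields that contain commas.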
