1

I have a large CSV file from which I need to remove rows, based on matching two fields.

The list of entries to be removed looks like this:

London,James Smith
London,John Oliver
London,John-Smith-Harrison
Paris,Hermione
Paris,Trevor Wilson
New York City,Charlie Chaplin
New York City,Ned Stark
New York City,Thoma' Becket
New York City,Ryan-Dover

A row in the main CSV should be removed when its second column matches the city name and its 9th column matches the person's name from the same line of the list above.

If both match, delete the row from the main CSV (note that I haven't provided an example of the main CSV here).

5
  • Might be useful to state the question more obviously. Commented Sep 27, 2013 at 23:27
  • Hi, thanks for the answer. What could I have done to make it clearer? Obviously it is in my interest to make the problem at hand as clear as possible. Kind regards AEA Commented Sep 27, 2013 at 23:55
  • I just wasn't sure if you were struggling with something, or just wanted somebody to write the code for you (which I've done below :P). Commented Sep 27, 2013 at 23:58
  • Have you managed to test it by now? Commented Sep 29, 2013 at 11:12
  • Yep thanks, works and accepted :) Commented Sep 29, 2013 at 15:13

2 Answers

5

I verified that the following works as you need on the kind of data you provided/described (it is written for Python 2):

import csv
from cStringIO import StringIO

# parse the data you're about to filter with
with open('filters.csv', 'rb') as f:
    filters = {(row[0], row[1]) for row in csv.reader(f, delimiter=',')}

out_f = StringIO()  # use e.g. `with open('out.csv', 'wb') as out_f` for real file output
out = csv.writer(out_f, delimiter=',')

# go through your rows and check whether the pair (row[1], row[8]) is
# in the previously parsed set of filters; if it is, skip the row
with open('data.csv', 'rb') as f:
    for row in csv.reader(f, delimiter=','):
        if (row[1], row[8]) not in filters:
            out.writerow(row)

# for debugging only
print out_f.getvalue()  # prints the resulting filtered CSV data

NOTE: the {... for ... in ...} is set-comprehension syntax, which requires Python 2.7 or later; on an older version, change it to the equivalent set(... for ... in ...) for it to work.
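For readers on Python 3, the same idea might look like the sketch below. It is not part of the original answer: the function name filter_csv and the demo rows are made up for illustration, and unlike the code above it keeps rows that have fewer than 9 columns instead of raising an IndexError.

```python
import csv
import io


def filter_csv(filters_file, data_file, out_file):
    """Copy rows from data_file to out_file, skipping every row whose
    (2nd column, 9th column) pair appears in filters_file."""
    # Each filter line is "city,name"; lines with fewer than 2 fields are ignored.
    filters = {(row[0], row[1]) for row in csv.reader(filters_file) if len(row) >= 2}
    writer = csv.writer(out_file)
    for row in csv.reader(data_file):
        # Assumption: rows too short to have a 9th column are kept unchanged.
        if len(row) > 8 and (row[1], row[8]) in filters:
            continue
        writer.writerow(row)


# In-memory demo with made-up rows (the real main CSV was never shown).
filters_text = "London,James Smith\nParis,Hermione\n"
data_text = (
    "1,London,a,b,c,d,e,f,James Smith\n"  # city and name both match: dropped
    "2,Berlin,a,b,c,d,e,f,James Smith\n"  # only the name matches: kept
)
out = io.StringIO()
filter_csv(io.StringIO(filters_text), io.StringIO(data_text), out)
result = out.getvalue()
print(result)
```

With real files, the three arguments would be file objects opened in text mode with newline='', for example open('data.csv', newline='') for reading and open('out.csv', 'w', newline='') for writing.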

1 Comment

In Python 2, make sure to use 'rb' and 'wb' when opening files for the csv module. To quote the docs, the file "must be opened with the ‘b’ flag on platforms where that makes a difference". In Python 3, open the file in text mode with newline='' instead.
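As a rough Python 3 illustration of the newline='' advice, the sketch below writes two made-up rows to a temporary file and reads them back. The rows and the file name out.csv are assumptions for the demo only; the second row contains an embedded line break to show that it survives the round trip.

```python
import csv
import os
import tempfile

# Made-up rows; the second name contains an embedded line break on purpose.
rows = [["London", "James Smith"], ["Paris", "line one\nline two"]]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.csv")

    # newline='' lets the csv module control the line endings itself.
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

    # Read the raw bytes to see the row terminator the writer produced.
    with open(path, "rb") as f:
        raw = f.read()

    # Reading with newline='' keeps line breaks inside quoted fields intact.
    with open(path, newline="") as f:
        read_back = list(csv.reader(f))

print(read_back == rows)
```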
1

You can read your data line by line and append each line to a list if its elements in the 2nd and 9th columns are not in the lists L1 and L2 respectively.

ext = "C:\Users\Me\Desktop\\test.txt"
readL = []

f = open(ext)

for line in f:
    listLine = line.strip().split(',')
    if(listLine[2] in L1 or listLine[9] in L2):
        continue
    readL += [listLine]


f.close()

1 Comment

I believe he said that a row should be skipped only if both column 2 and column 9 are found on the same line in the list of filters; your code does something different. It would also be educational to use idiomatic and nicely formatted Python in example snippets :) In addition: the content of the ext variable is ill-formed because of the backslashes; your code doesn't show how to actually parse the contents of L1 and L2; listLine[2] is the 3rd column, but he said the 2nd; and it should be readL.append(listLine), etc. It definitely looks like you're just after cheap rep.
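The points raised in this comment could be addressed roughly as in the following Python 3 sketch. It is an illustration only, not the answerer's code: the in-memory lists filter_lines and data_lines are hypothetical stand-ins for the two files, and rows with fewer than 9 columns are assumed to be kept.

```python
import csv

# Hypothetical stand-ins for the filter list and the main CSV.
filter_lines = ["London,James Smith", "Paris,Hermione"]
data_lines = [
    "1,London,x,x,x,x,x,x,James Smith",  # city and name on the same filter line: dropped
    "2,London,x,x,x,x,x,x,Hermione",     # city and name from different filter lines: kept
    "3,Berlin,x,x,x,x,x,x,James Smith",  # only the name matches: kept
]

# Store (city, name) pairs together so that both must match on the same line,
# instead of testing two independent lists with "or".
pairs = {(row[0], row[1]) for row in csv.reader(filter_lines) if len(row) >= 2}

kept_rows = []
for row in csv.reader(data_lines):
    # Index 1 is the 2nd column and index 8 is the 9th column.
    if len(row) > 8 and (row[1], row[8]) in pairs:
        continue
    kept_rows.append(row)

# A raw string avoids the backslash-escape problem in Windows paths.
path = r"C:\Users\Me\Desktop\test.txt"

print([row[0] for row in kept_rows])
```

Row 2 is the case that separates the two readings of the question: testing the city and the name against independent lists would drop it, while the pair lookup keeps it.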
