
Sorry, very much a beginner with Python and could really use some help.

I have a large CSV file, items separated by commas, that I'm trying to go through with Python. Here is an example of a line in the CSV.

123123,JOHN SMITH,SMITH FARMS,A,N,N,12345 123 AVE,CITY,NE,68355,US,12345 123 AVE,CITY,NE,68355,US,(123) 555-5555,(321) 555-5555,[email protected],15-JUL-16,11111,2013,22-DEC-93,NE,2,1

I'd like my code to scan each line and look at only the 9th item (the state). For every line whose 9th item matches my query, I'd like that entire line to be written to a new CSV.

The problem I have is that my code will find every occurrence of my query throughout the entire line, instead of just the 9th item. For example, if I scan looking for "NE", it will write the above line in my CSV, but also one that contains the string "NEARY ROAD."

Sorry if my terminology is off, again, I'm a beginner. Any help would be greatly appreciated.

I've listed my code below:

import csv

with open('Sample.csv', 'rb') as f, open('NE_Sample.csv', 'wb') as outf:
    reader = csv.reader(f, delimiter=',')
    writer = csv.writer(outf)
    for line in f:
        if "NE" in line:
             print ('Found: []'.format(line))
             writer.writerow([line])

2 Answers

3

You're not actually using your reader to read the input CSV, you're just reading the raw lines from the file itself.

A fixed version looks like the following (untested):

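import csv

# Binary mode ('rb'/'wb') is what Python 2's csv module expects.
with open('Sample.csv', 'rb') as f, open('NE_Sample.csv', 'wb') as outf:
    reader = csv.reader(f, delimiter=',')
    writer = csv.writer(outf)
    for row in reader:  # each row is a list of field strings
        if row[8] == 'NE':  # 9th field (index 8) is the state
            print('Found: {}'.format(row))
            writer.writerow(row)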

The changes are as follows:

  • Instead of iterating over the input file's lines, we iterate over the rows parsed by the reader (each of which is a list of the values in that row).
  • We check to see if the 9th item in the row (i.e. row[8]) is equal to "NE".
  • If so, we output that row to the output file by passing it in, as-is, to the writer's writerow method.
  • I also fixed a typo in your print statement - the format method uses braces (not square brackets) to mark replacement locations.
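
The snippet above opens the files in binary mode, which is the Python 2 convention for the csv module. On Python 3 the csv module works on text files opened with newline='' instead. Here is a minimal sketch of the same filter for Python 3; the function name filter_rows_by_column and its parameters are my own invention for illustration, and the length check is an added assumption so that blank or short rows are skipped rather than raising IndexError.

```python
import csv


def filter_rows_by_column(src_path, dst_path, column_index, value):
    """Copy rows whose field at column_index equals value; return the count.

    Hypothetical helper, not part of the csv module.
    """
    matched = 0
    # Python 3: open CSV files in text mode with newline='' so the csv
    # module handles line endings itself.
    with open(src_path, newline='') as src, \
            open(dst_path, 'w', newline='') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        for row in reader:
            # Skip blank or short rows instead of raising IndexError.
            if len(row) > column_index and row[column_index] == value:
                writer.writerow(row)
                matched += 1
    return matched
```

For the file in the question, the call would be filter_rows_by_column('Sample.csv', 'NE_Sample.csv', 8, 'NE'), since the 9th item has index 8.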

1 Comment

@J.Folkens: not a problem!
0

This snippet should solve your problem:

import csv

with open('Sample.csv', 'rb') as f, open('NE_Sample.csv', 'wb') as outf:
    reader = csv.reader(f, delimiter=',')
    writer = csv.writer(outf)
    for row in reader:
        if "NE" in row:
            print ('Found: {}'.format(row))
            writer.writerow(row)

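In your code, if "NE" in line checks whether "NE" is a substring of the string line, because each line is a raw, unparsed line of your input file. That is why a line containing "NEARY ROAD" also matches.

If you use if "NE" in row:, where row is a parsed line of your input file (a list of fields), you are doing exact element matching. Note that this matches a field equal to "NE" in any column, not only the 9th; to test only the state, compare row[8] == "NE".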
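
To make the difference concrete, here is a small sketch comparing the three checks on made-up data. The rows and the STATE index are invented for illustration (a shortened layout with the state at index 2, not the 26-column layout from the question).

```python
# Made-up rows: id, street, mailing state, license state.
rows = [
    ["1", "12345 123 AVE", "NE", "NE"],
    ["2", "77 NEARY ROAD", "IA", "IA"],
    ["3", "9 MAIN ST", "IA", "NE"],
]
STATE = 2  # index of the state column in this made-up layout

# Substring test on the raw line: also hits "NEARY ROAD".
substring_hits = [r[0] for r in rows if "NE" in ",".join(r)]

# Membership test on the parsed row: exact match, but in any column.
membership_hits = [r[0] for r in rows if "NE" in r]

# Positional test: exact match in the state column only.
positional_hits = [r[0] for r in rows if r[STATE] == "NE"]

print(substring_hits)   # ['1', '2', '3']
print(membership_hits)  # ['1', '3']
print(positional_hits)  # ['1']
```

Only the positional test restricts the match to the one column the question asks about.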
