
Before anyone marks this as a duplicate: I have already tried isspace, startswith, the itertools filter function, and readlines()[2:]. I have a Python script that searches hundreds of CSV files and prints each row that has a matching string (in this case a unique ID) in the eighth column from the left.

import csv
import glob

csvfiles = glob.glob('20??-??-??.csv')
for filename in csvfiles:
    reader = csv.reader(open(csvfiles))
    for row in reader:
        col8 = str(row[8])
        if col8 == '36862210':
            print row

The code works with test .csv files. However, the real .csv files I'm working with all have two blank rows at the top, and I am getting this error message:

IndexError: list index out of range

Here's my latest code:

import csv
import glob

csvfiles = glob.glob('20??-??-??.csv')
for filename in csvfiles:
    reader = csv.reader(open(csvfiles))
    for row in reader:
        if not row:
            continue
        col8 = str(row[8])
        if col8 == '36862210':
            print row
  • You might want to use row.strip() == '' to test an empty line rather than not row. Commented Aug 26, 2015 at 0:39
  • can you paste the full stacktrace? Commented Aug 26, 2015 at 0:59
  • Do you want to skip the first two rows, regardless of their content? Or do you want to skip all empty rows, wherever they appear? Commented Aug 26, 2015 at 1:01
  • Just the first two rows...the batch of .csv just happens to have no data in the first two rows. Thanks. Commented Aug 26, 2015 at 2:16
  • When I use if row.strip() == ' ' the error message reads AttributeError: 'list' object has no attribute 'strip' Commented Aug 26, 2015 at 2:19

2 Answers


Try skipping the first two rows with next instead:

import csv
import glob

csvfiles = glob.glob('20??-??-??.csv')
for filename in csvfiles:
    reader = csv.reader(open(filename))
    next(reader)
    next(reader)
    for row in reader:
        col8 = str(row[8])
        if col8 == '36862210':
            print row
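
A minimal sketch of the same idea in Python 3 syntax (print as a function), assuming the first two rows should be dropped regardless of their content. The name matching_rows and its parameters are illustrative and not from the question. itertools.islice does the skipping and, unlike a bare next(reader), does not raise StopIteration when a file has fewer than two rows; the length check guards against rows that are too short to have an index 8.

```python
import csv
import glob
from itertools import islice


def matching_rows(lines, target, column=8, skip=2):
    """Return rows whose field at index `column` equals `target`.

    `lines` is any iterable of text lines (an open file works).
    The first `skip` rows are dropped unconditionally.
    """
    reader = csv.reader(lines)
    matches = []
    for row in islice(reader, skip, None):
        # Rows with too few fields cannot match, so skip them
        # instead of letting row[column] raise IndexError.
        if len(row) > column and row[column] == target:
            matches.append(row)
    return matches


if __name__ == '__main__':
    for filename in glob.glob('20??-??-??.csv'):
        # newline='' is what the csv module documentation recommends.
        with open(filename, newline='') as csvfile:
            for row in matching_rows(csvfile, '36862210'):
                print(row)
```

Passing filename (one path) to open, not csvfiles (the whole list), is what avoids the TypeError mentioned in the comment below.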

1 Comment

I'm getting an error message. reader = csv.reader(open(csvfiles)). TypeError: coercing to Unicode: need string or buffer, list found

A csv reader takes an iterable, which can be a file object but need not be.

You can create a generator that removes all blank lines from a file like so:

csvfile = open(filename)
filtered_csv = (line for line in csvfile if not line.isspace())

This filtered_csv generator will lazily pull one line at a time from your file object, and skip to the next one if the line is entirely whitespace.
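
As a small illustration of that filtering, here is a sketch in Python 3 syntax that uses an in-memory io.StringIO in place of a real file; the sample data is made up for the example.

```python
import csv
import io

# Two whitespace-only lines at the top and one blank line in the body.
sample = io.StringIO("\n   \nid,name\n\n1,alpha\n")

# Lazily drop every line that consists only of whitespace.
non_blank = (line for line in sample if not line.isspace())

rows = list(csv.reader(non_blank))
print(rows)  # [['id', 'name'], ['1', 'alpha']]
```

One caveat: because the filter runs on raw lines before the CSV parser sees them, it would also drop a blank line that sits inside a quoted multi-line field.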

You should be able to write your code like:

for filename in csvfiles:
    csvfile = open(filename)
    filtered_csv = (line for line in csvfile if not line.isspace())
    reader = csv.reader(filtered_csv)
    for row in reader:
        col8 = str(row[8])
        if col8 == '36862210':
            print row

Assuming the non-blank rows are well formed, i.e., every one of them has an element at index 8 (at least nine fields), you should not get an IndexError.

EDIT: If you're still encountering an IndexError it probably is not because of a line consisting of only whitespace. Catch the exception and look at the row:

try:
    col8 = str(row[8])
    if col8 == '36862210':
        print row
except IndexError:
    # Show the row that is too short instead of silently ignoring it
    print row

This lets you examine the output from the CSV reader that is actually causing the error. If the row is an object that doesn't print its contents, use print list(row) instead.
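
A sketch of a slightly more systematic diagnostic, in Python 3 syntax: it collects every row that is too short to have an index 8, together with its line number from reader.line_num. The helper name short_rows and the min_fields parameter are illustrative assumptions. One thing this can reveal is a line holding only spaces, which csv.reader parses to [' ']; that list is non-empty, so an `if not row` test does not skip it.

```python
import csv
import glob


def short_rows(lines, min_fields=9):
    """Return (line_number, row) pairs for rows with fewer than `min_fields` fields."""
    reader = csv.reader(lines)
    problems = []
    for row in reader:
        if len(row) < min_fields:
            # reader.line_num counts the source lines read so far.
            problems.append((reader.line_num, row))
    return problems


if __name__ == '__main__':
    for filename in glob.glob('20??-??-??.csv'):
        with open(filename, newline='') as csvfile:
            for line_number, row in short_rows(csvfile):
                print(filename, line_number, row)
```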

1 Comment

My original script worked even if there were blank rows in the body of the .csv file. However, having the two blank rows at the top seems to be the problem. When I tried your script, I get this error. col8 = str(row[8]) IndexError: list index out of range
