For loop function call file parsing

Question

I recognize that this code is wildly inefficient.

I'm at a complete loss here, and I'm planning to remove the function and just make the code procedural in main. But I'm hoping someone can explain what I'm seeing. The loop in main() runs and calls matchName(). matchName() executes its loop; then, when it should return so main() can move on to the next "vtRow", it just stops executing. So the output is the first record of vtData paired with every record from adData.

import csv, re

def main():
    #1st word
    oneWord = re.compile( '\A([\w]+)' )
    #1st 3
    first3 = re.compile( '\A([\w]{3})' )
    #last 3
    last3 = re.compile( '(?=([\w]{3})$)' )

    mArray = [ oneWord, first3, last3 ]
    adFile =  open('adData.csv', 'rb')
    adFields = ('lName','fName','cNum','addy','city','state','zip','phone','sex')
    adData = csv.reader(adFile, dialect='excel')

    vtFile =  open('data360.csv','rb')
    vtFields = ('ref','fName','lName')
    vtData = csv.reader(vtFile, dialect='excel')

    for vtRow in vtData:
        matchName(vtRow, adData, mArray) # appears that this runs once and exits

def matchName(curVtRow, adData, mArr):
    lName = curVtRow[4].lower()
    fName = curVtRow[3].lower()
    Posib = []

    for row in adData:
        cName = row[0].lower() 
        print "vt " + lName + " ; ad " + cName
    return 1

if __name__ == "__main__":
    main()

What line does it stop at? Could you mark it in the code? Show input and output?
– Marcin, Commented Jun 28, 2013 at 15:41

JAB · Accepted Answer · 2013-06-28 15:44:16Z


The issue is that looping over adData reads adFile, so after the first call to matchName() the file has been read all the way to the end. On every later call, adData.next() has nothing left to return, so the loop body (and thus the print statement) never executes. I suggest placing adFile.seek(0) after the call to matchName(). Note that just recreating adData won't work; I discovered recently that a csv reader advances its underlying file object's position rather than keeping track of a position of its own.
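A minimal sketch of the behaviour described above, written for Python 3 rather than the Python 2 of the question. The in-memory io.StringIO objects and the sample rows are hypothetical stand-ins for adData.csv and data360.csv, and count_pairs is an illustrative helper, not part of the asker's code.

```python
import csv
import io

# Hypothetical stand-ins for adData.csv and data360.csv (not the asker's real data).
AD_TEXT = "Smith,Anna\nJones,Bob\n"
VT_TEXT = "r1,Anna,Smith\nr2,Bob,Jones\n"


def count_pairs(rewind):
    """Run the nested loop and count how many (vt, ad) pairs the inner loop sees."""
    ad_file = io.StringIO(AD_TEXT)
    vt_file = io.StringIO(VT_TEXT)
    ad_data = csv.reader(ad_file, dialect="excel")
    vt_data = csv.reader(vt_file, dialect="excel")

    pairs = 0
    for vt_row in vt_data:
        for ad_row in ad_data:  # consumes ad_file as it iterates
            pairs += 1
        if rewind:
            ad_file.seek(0)  # go back to the start so the next pass sees rows again
    return pairs


print(count_pairs(rewind=False))  # inner loop only yields rows on the first pass
print(count_pairs(rewind=True))   # inner loop yields rows on every pass
```

If the file is small enough to hold in memory, reading it once with list(csv.reader(...)) and looping over that list is an alternative to rewinding; that is a different technique from the seek(0) suggestion here.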


9 Comments

Brian Over a year ago

Thank you! I'll have to do a little more reading in the docs, but I really appreciate your help.

Simon Callan Over a year ago

Is doing a seek(0) on the underlying file for a CSV reader guaranteed to work? Once the CSV reader reaches the end of the file, I would half expect it to stop permanently.

JAB Over a year ago

@SimonCallan Yes, because adData.next() (next(adData) in Python 3) just wraps a call to adFile.next() (or rather, Reader_iternext() since csv wraps a C extension) with some additional processing to parse the comma-separated values. hg.python.org/cpython/file/9046ef201591/Modules/_csv.c#l800. That hasn't changed since the csv module was introduced 10 years ago: hg.python.org/cpython/file/cc35ed2b26a8/Modules/_csv.c#l626 (or if it did change, they reverted the change at some point.)
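A small Python 3 sketch of the point above, assuming an in-memory io.StringIO with made-up rows in place of adFile: a second reader built on the exhausted file object yields nothing, while the original reader yields rows again once the file itself is rewound.

```python
import csv
import io

ad_file = io.StringIO("Smith,Anna\nJones,Bob\n")  # hypothetical sample data

first_reader = csv.reader(ad_file)
rows_first = list(first_reader)        # reads ad_file to the end

second_reader = csv.reader(ad_file)    # new reader, same exhausted file object
rows_second = list(second_reader)      # nothing left to read

ad_file.seek(0)                        # rewind the underlying file
rows_after_seek = list(first_reader)   # the original reader yields rows again

print(len(rows_first), len(rows_second), len(rows_after_seek))
```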

Brian Over a year ago

Changing the loop body to matchName(vtRow, adData, mArray) followed by adFile.seek(0) executes, and is currently creating a 600-million-line text file... ^C

JAB Over a year ago

(Well, I suppose the values aren't necessarily comma-separated, depending on the dialect, but in my opinion a file like that shouldn't really be called a CSV file. It's like storing a JPEG in a file with a PNG extension.)
