
I recognize that this code is wildly inefficient.

I'm at a complete loss here, and I'm planning to remove the function and just make the code procedural in main(). But I'm hoping someone can explain what I'm seeing. The loop in main() runs and calls matchName(). matchName() executes its loop. Then, when it should return and run again for the next vtRow, it instead just stops executing. So the output is the first record of vtData paired with every record from adData.

import csv, re

def main():
    #1st word
    oneWord = re.compile( '\A([\w]+)' )
    #1st 3
    first3 = re.compile( '\A([\w]{3})' )
    #last 3
    last3 = re.compile( '(?=([\w]{3})$)' )

    mArray = [ oneWord, first3, last3 ]
    adFile =  open('adData.csv', 'rb')
    adFields = ('lName','fName','cNum','addy','city','state','zip','phone','sex')
    adData = csv.reader(adFile, dialect='excel')

    vtFile =  open('data360.csv','rb')
    vtFields = ('ref','fName','lName')
    vtData = csv.reader(vtFile, dialect='excel')

    for vtRow in vtData:
        matchName(vtRow, adData, mArray) # appears that this runs once and exits

def matchName(curVtRow, adData, mArr):
    lName = curVtRow[4].lower()
    fName = curVtRow[3].lower()
    Posib = []

    for row in adData:
        cName = row[0].lower() 
        print "vt " + lName + " ; ad " + cName
    return 1

if __name__ == "__main__":
    main()
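A minimal reproduction may help show what is happening. It is written for Python 3 and uses io.StringIO stand-ins with made-up rows in place of the real adData.csv and data360.csv (both are assumptions). The inner loop does run on every call. It simply finds nothing left to iterate after the first one:

```python
import csv
import io

# Hypothetical in-memory stand-ins for data360.csv and adData.csv.
vt_file = io.StringIO("1,Ann,Lee\n2,Bob,Ray\n")
ad_file = io.StringIO("Lee,Ann\nRay,Bob\nKim,Cy\n")

vt_data = csv.reader(vt_file)
ad_data = csv.reader(ad_file)

def count_ad_rows(ad_reader):
    """Iterate the shared reader and count how many rows it yields."""
    return sum(1 for _ in ad_reader)

# One call per outer row, mirroring the matchName() call in main().
counts = [count_ad_rows(ad_data) for vt_row in vt_data]
print(counts)  # [3, 0]: the second call gets an already-exhausted reader
```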
  • What line does it stop at? Could you mark it in the code? Could you show the input and output? Commented Jun 28, 2013 at 15:41
  • It exits gracefully at matchName(vtRow, adData, mArray). Commented Jun 28, 2013 at 15:46

1 Answer

The issue is that looping over adData reads adFile. After the first call to matchName(), the file has been read all the way to the end. On later calls, adData.next() has nothing left to return and raises StopIteration immediately, so the body of the for loop, and with it the print statement, never runs. I suggest placing adFile.seek(0) after the call to matchName(). Note that just recreating adData (without seeking) won't work. I discovered recently that a csv reader advances its underlying object's file position rather than keeping track of a position on its own.
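Here is a sketch of that fix. It is written for Python 3, uses in-memory stand-ins for the two files, and both the sample rows and the helper match_name are hypothetical. It is not the original matchName():

```python
import csv
import io

# Hypothetical stand-ins for data360.csv and adData.csv.
vt_file = io.StringIO("1,Ann,Lee\n2,Bob,Ray\n")
ad_file = io.StringIO("Lee,Ann\nRay,Bob\nKim,Cy\n")
vt_data = csv.reader(vt_file)
ad_data = csv.reader(ad_file)

def match_name(cur_vt_row, ad_reader):
    """Hypothetical stand-in for matchName(): collect lowercased last names."""
    return [row[0].lower() for row in ad_reader]

results = []
for vt_row in vt_data:
    results.append(match_name(vt_row, ad_data))
    # Rewind the underlying file so the same reader starts from the top again.
    ad_file.seek(0)

print(results)  # [['lee', 'ray', 'kim'], ['lee', 'ray', 'kim']]
```

Without the ad_file.seek(0) line, the second entry in results would be an empty list.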


Comments

Thank you! I'll have to do a little more reading in the docs, but I really appreciate your help.
Is doing a seek(0) on the underlying file for a CSV reader guaranteed to work? Once the CSV reader reaches the end of the file, I would half expect it to stop permanently.
@SimonCallan Yes, because adData.next() (next(adData) in Python 3) just wraps a call to adFile.next() (or rather, Reader_iternext() since csv wraps a C extension) with some additional processing to parse the comma-separated values. hg.python.org/cpython/file/9046ef201591/Modules/_csv.c#l800. That hasn't changed since the csv module was introduced 10 years ago: hg.python.org/cpython/file/cc35ed2b26a8/Modules/_csv.c#l626 (or if it did change, they reverted the change at some point.)
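The behaviour described in the comment above can be checked directly. This Python 3 sketch uses an io.StringIO object as a stand-in for the file (an assumption; with a real file in Python 3, the csv docs say to open it with newline=''). It shows three things. An exhausted reader returns nothing. A brand-new reader on the same file also returns nothing. The original reader yields rows again once the file is rewound.

```python
import csv
import io

ad_file = io.StringIO("Lee,Ann\nRay,Bob\n")  # hypothetical stand-in for adData.csv
reader = csv.reader(ad_file)

first_pass = list(reader)
exhausted = list(reader)               # reader has hit the end of the file
recreated = list(csv.reader(ad_file))  # new reader, same end-of-file position

ad_file.seek(0)
second_pass = list(reader)             # the original reader reads again

print(first_pass)   # [['Lee', 'Ann'], ['Ray', 'Bob']]
print(exhausted)    # []
print(recreated)    # []
print(second_pass)  # [['Lee', 'Ann'], ['Ray', 'Bob']]
```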
Changing the loop to matchName(vtRow, adData, mArray) followed by adFile.seek(0) runs, and is currently creating a 600-million-line text file... ^C
(Well, I suppose the values aren't necessarily comma-separated, depending on the dialect, but in my opinion a file like that shouldn't really be called a CSV file. It's like storing a JPEG in a file with a PNG extension.)
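The very large output mentioned in the comments follows from the nesting. Once the file is rewound, the print in matchName() runs once for every (vtRow, adData row) pair, so the number of output lines is the number of vtData rows times the number of adData rows. A different approach from the seek(0) fix discussed in this thread is to read adData into a list once, because a list can be iterated any number of times without touching the file. Here is a Python 3 sketch with made-up sample rows (an assumption):

```python
import csv
import io

# Hypothetical stand-ins for data360.csv and adData.csv.
vt_file = io.StringIO("1,Ann,Lee\n2,Bob,Ray\n")
ad_file = io.StringIO("Lee,Ann\nRay,Bob\nKim,Cy\n")

# Read adData once; a list, unlike a reader, can be iterated repeatedly.
ad_rows = list(csv.reader(ad_file))

lines = []
for vt_row in csv.reader(vt_file):
    for row in ad_rows:
        # Same shape as the print in matchName(): one line per (vt, ad) pair.
        lines.append("vt " + vt_row[2].lower() + " ; ad " + row[0].lower())

print(len(lines))  # 6 == 2 vt rows * 3 ad rows
```

This trades memory for speed, since all of adData is held in memory at once. The output still grows as the product of the two row counts.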