
The CSV files I import are fetched with urllib2 and saved into folders. They look like this:

number,season,episode,production code,airdate,title,special?,tvrage
1,1,1,"101",24/Sep/07,"Pilot",n,"http://www.tvrage.com/Chuck/episodes/579282"
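For reference, here is a minimal sketch of how csv.reader splits the two sample lines above. It uses Python 3 (the question's code is Python 2) and reads from an in-memory string instead of a downloaded file. The header row comes first, and the quotes around fields such as "101" and "Pilot" are removed, so row[5] is the bare title.

```python
import csv
import io

# The two sample lines from the question, held in memory instead of a downloaded file.
SAMPLE = (
    'number,season,episode,production code,airdate,title,special?,tvrage\n'
    '1,1,1,"101",24/Sep/07,"Pilot",n,"http://www.tvrage.com/Chuck/episodes/579282"\n'
)

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)   # column names from the first line
first = next(reader)    # first data row as a list of strings, quotes already stripped

print(header[5], '->', first[5])  # title column
print(first[4])                   # airdate, still in the site's dd/Mon/yy form
```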

I am successfully converting that into SQL statements, as well as into another CSV file that can be inserted into my database, in a format like this:

,1,1,1,"Pilot",'2006-10-11',,,,,1,2011-12-23 15:52:49,2011-12-23 15:52:49,1,1

I do this with the following code:

csv = """,%s,%s,%s,%s,%r,,,,,1,2011-12-23 15:52:49,2011-12-23 15:52:49,1,1""" % (showid, line[1], line[2], line[5], date(line[4]))
print>>final, csv
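One way to avoid building the output line by hand is to let csv.writer handle the separators and quoting. The sketch below is Python 3. `to_sql_date` is a hypothetical stand-in for the question's `date()` helper and assumes the airdate is always in the dd/Mon/yy form shown in the sample; `SHOW_ID` is likewise a made-up value. The trailing constant columns are copied from the question's format string.

```python
import csv
import io
from datetime import datetime

SHOW_ID = '1'  # hypothetical show id; the question gets showid from elsewhere

def to_sql_date(airdate):
    """Hypothetical replacement for date(): '24/Sep/07' -> '2007-09-24'."""
    return datetime.strptime(airdate, '%d/%b/%y').strftime('%Y-%m-%d')

# Constant trailing columns, copied from the question's output format.
TRAILER = ['', '', '', '', '1', '2011-12-23 15:52:49', '2011-12-23 15:52:49', '1', '1']

def build_output_row(row):
    # The leading empty field reproduces the leading comma in the question's output.
    return ['', SHOW_ID, row[1], row[2], row[5], to_sql_date(row[4])] + TRAILER

out = io.StringIO()
writer = csv.writer(out, lineterminator='\n')
writer.writerow(build_output_row(
    ['1', '1', '1', '101', '24/Sep/07', 'Pilot', 'n',
     'http://www.tvrage.com/Chuck/episodes/579282']))
print(out.getvalue(), end='')
```

Note that csv.writer only quotes fields that need it (here, none), so "Pilot" and the date come out unquoted; adjust the `quoting` parameter if your importer expects otherwise.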

EDIT:

I have switched from string formatting to string concatenation:

csv = ','+showid+','+line[1]+','+line[2]+','+line[5]+','+date(line[4])+',,,,,1,2011-12-23 15:52:49,2011-12-23 15:52:49,1,1'
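One concrete reason hand-built lines are fragile: if a title contains a comma, the concatenated line gains an extra field when read back, whereas csv.writer quotes that field so it survives intact. A small Python 3 demonstration with a made-up title:

```python
import csv
import io

fields = ['', '1', '1', '2', 'Hello, World']  # made-up title containing a comma

naive = ','.join(fields)  # what concatenation (or a plain join) produces

buf = io.StringIO()
csv.writer(buf, lineterminator='\n').writerow(fields)
quoted = buf.getvalue().rstrip('\n')

# Read both lines back with csv.reader and compare the field counts.
naive_back = next(csv.reader([naive]))
quoted_back = next(csv.reader([quoted]))
print(len(naive_back), naive_back)    # the title is split into two fields
print(len(quoted_back), quoted_back)  # the title comes back as one field
```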

It's not much better, and some files are still being skipped during parsing. I'm not sure whether the problem is my code or the csv module.

The problem is that some files are processed perfectly fine, some CSV files are just skipped, and for others I get errors like IndexError: list index out of range.
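One common source of IndexError: list index out of range with csv.reader is a blank or truncated line: csv.reader returns an empty list for an empty line, so indexing row[5] fails. The Python 3 sketch below reports which physical line is short (via the reader's line_num attribute) instead of crashing. The expected column count of 8 is an assumption based on the header shown above, and the blank and trailing lines in the sample are invented for illustration.

```python
import csv
import io

EXPECTED_COLUMNS = 8  # assumption: every data row has the 8 columns of the header

def short_rows(text):
    """Return (line_number, row) for every row with fewer columns than expected."""
    reader = csv.reader(io.StringIO(text))
    problems = []
    for row in reader:
        if len(row) < EXPECTED_COLUMNS:
            problems.append((reader.line_num, row))
    return problems

sample = (
    'number,season,episode,production code,airdate,title,special?,tvrage\n'
    '1,1,1,"101",24/Sep/07,"Pilot",n,"http://www.tvrage.com/Chuck/episodes/579282"\n'
    '\n'                    # blank line: csv.reader yields []
    'some trailing text\n'  # a line that is not a full data row
)
for line_no, row in short_rows(sample):
    print('line', line_no, 'has', len(row), 'columns:', row)
```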

If anyone has experience getting CSV files to parse correctly, I would really appreciate the help.

Here is the full source code: http://cl.ly/2W472g303D1p0J3S2o46

dsimport.py - http://pastie.org/3076663
CSVFileHandler.py - http://pastie.org/3076667

Thanks

  • Have you looked at the csv module? If so, why aren't you using it? Commented Dec 26, 2011 at 20:14
  • Yes, I am using the csv module; I'm just getting some strange anomalies. Commented Dec 26, 2011 at 20:47
  • You're obviously not using the csv module if you're pasting a csv "string" together manually. Or attempting to. Commented Dec 26, 2011 at 21:10
  • I have included the source code; it's just the 2 files I'm using. I am using the csv module; that code is just for taking the imported rows and changing some values around. Commented Dec 26, 2011 at 21:15

2 Answers


I'm not sure exactly what all the errors are, but here are a few tips:

  1. In processFile(line), line is a slightly misleading name: it isn't a string line but a row, i.e. a list of fields. That's what confused Tim and me at first sight as well.
  2. You should verify that the row has at least 6 elements, since your script reads up to row[5].
  3. You can use the str.join method, which is much tidier than concatenating strings with +.

Here's a small refactoring:

def processFile(row):
    if len(row) < 6:
        #raise Exception('too few columns')
        # maybe it's better to just ignore bad rows in your case
        return
    items = [
        '',
        showid,
        row[1],
        row[2],
        row[5],
        date(row[4]),
        ]
    res = ','.join(items)
    res += ',,,,,1,2011-12-23 15:52:49,2011-12-23 15:52:49,1,1'
    print res
    print>>final, res

handler = CSVFileHandler('/Users/tharshan/WebRoot/stv/export/csv/%s-save.csv' % name)
try:
    handler.process(processFile, name)    
except Exception, e:
    print 'Failed processing and skipping %s because of: %s' % (name, e)

final.close()
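A related point about the try/except above: because it wraps the whole handler.process call, a single bad row aborts the rest of that file, which looks exactly like the file being "skipped". The Python 3 sketch below catches errors per row instead, so only the offending rows are dropped and each one is reported. `process_rows` and `convert` are hypothetical names, not part of the question's CSVFileHandler, and the sample rows are invented.

```python
import csv
import io

def convert(row):
    # Hypothetical per-row conversion; raises IndexError on short rows.
    return [row[1], row[2], row[5]]

def process_rows(text):
    """Convert every row, collecting failures instead of stopping at the first one."""
    converted, failures = [], []
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header line
    for row in reader:
        try:
            converted.append(convert(row))
        except IndexError as exc:
            failures.append((reader.line_num, str(exc)))
    return converted, failures

text = (
    'number,season,episode,production code,airdate,title,special?,tvrage\n'
    '1,1,1,"101",24/Sep/07,"Pilot",n,x\n'
    '\n'
    '2,1,2,"102",01/Oct/07,"Second",n,x\n'
)
rows, errors = process_rows(text)
print(rows)    # both good rows are kept
print(errors)  # the blank line is reported with its line number
```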

Comments

Thank you, those are some very nice tips for error catching. However, I am still running into the problem where a CSV file has been downloaded and should be parsed and printed to the new file, but it's just skipped. The lines from that file are not printed to the console. I am stumped; I have no idea why.
Also, it doesn't make sense that these CSV files would have different column counts, because they are all exported in the same format from this site. Example: epguides.com/common/exportToCSV.asp?rage=20720
If exceptions are thrown, then please do list them. If files are just "skipped", then maybe you should print the filename every time you enter main() and check whether that function is indeed called as many times as you planned. If main() is called enough times, then... hmm. I just noticed something: you really should close the file final every time you finish appending to it (at the end of main).
Thanks, I've added the close at the end just for good measure. I'm unsure how to check whether main is called enough times; do you mean a simple counter, perhaps, and then see if it matches the array size? I've done what you suggested; have a look at these printouts. Very strange indeed, I don't get how some of these rows are possible. I think the parser is messing up somehow: pastie.org/3076993
It seems a lot of CSV files are skipped because somewhere in them they have too few columns. Maybe you should just skip the rows that have too few columns. I updated my solution for that.
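The debugging advice in these comments (print each file as it is handled, count the calls and compare with the number of files, close the output file every time) could look roughly like this in Python 3. `handle_file` is a hypothetical stand-in for the question's main(), and the three input files are generated in a temporary directory purely for the demonstration. The `with` block closes both files even if an exception is raised.

```python
import os
import tempfile

def handle_file(path, out_path):
    """Hypothetical per-file step: append the file's lines to out_path."""
    print('handling', path)  # shows every file that is actually processed
    with open(path) as src, open(out_path, 'a') as out:
        for line in src:
            out.write(line)
    # Both files are closed here, even if an exception was raised inside the block.

tmp = tempfile.mkdtemp()
paths = []
for i in range(3):
    p = os.path.join(tmp, 'show%d.csv' % i)
    with open(p, 'w') as f:
        f.write('row from file %d\n' % i)
    paths.append(p)

out_path = os.path.join(tmp, 'combined.csv')
handled = 0
for p in paths:
    handle_file(p, out_path)
    handled += 1  # simple counter to compare against len(paths)

print('handled %d of %d files' % (handled, len(paths)))
```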

Never mind, it's all fixed. In the end I just used the excel dialect and wrote the output CSV with pipe (|) delimiters. Either way it was quite fiddly, and honestly I feel like I got it to work through sheer luck.
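The final fix is only described briefly. A minimal Python 3 sketch of what it might look like, assuming "pipe lines" means using | as the output delimiter: read with the built-in excel dialect and write with delimiter='|', so commas inside titles no longer collide with the separator. The input row and its title are invented for illustration.

```python
import csv
import io

source = io.StringIO('1,1,1,"101",24/Sep/07,"Hello, World",n,x\n')
reader = csv.reader(source, dialect='excel')  # 'excel' is also csv.reader's default

out = io.StringIO()
writer = csv.writer(out, delimiter='|', lineterminator='\n')
for row in reader:
    # The comma in the title is no longer special, so it is written unquoted.
    writer.writerow(['', row[1], row[2], row[5]])

print(out.getvalue(), end='')
```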

Thanks for all the help.
