I wrote a small Scrapy snippet that searches a site by zip code across the whole country, but it seems wasteful to request every nonexistent zip code. This is what I have so far:

def start_requests(self):
    for i in range(100000):
        yield self.make_requests_from_url("http://www.example.com/zipcode/%05d/search.php" % i)

The idea is obvious, but I downloaded a CSV with all of the US zip codes in one column. How could I easily use this as a list (or something more efficient than a list) in the example above? I have pandas if that would make things easier.

5 Answers

1

If I'm understanding you correctly, you have a comma-delimited file in which a particular column (perhaps titled 'ZipCodes') holds one zip code per row.

If there's a header line and several columns, and you know the name of the column that contains the zip codes, you could do this:

def start_requests(self, filename, columnname):
    with open(filename) as csvfile:
        headers = csvfile.readline().strip().split(',')
        column = headers.index(columnname)  # look the column up once, not per row
        for line in csvfile:
            zipcode = line.strip().split(',')[column]
            # fields read from a file are strings; %05d needs an int
            yield self.make_requests_from_url("http://www.example.com/zipcode/%05d/search.php" % int(zipcode))
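
Values read from a CSV are strings, and zip codes exported from a spreadsheet sometimes lose their leading zeros or carry a ZIP+4 suffix. A minimal sketch of cleaning a field before it goes into the URL; `normalize_zip` and `build_url` are hypothetical helper names, not part of Scrapy, and the URL pattern is the example one from the question:

```python
def normalize_zip(raw):
    """Return a 5-digit zip code string from a raw CSV field.

    Assumes US-style codes: strips whitespace, drops a ZIP+4 suffix,
    and restores leading zeros that an export may have removed.
    """
    base = raw.strip().split('-')[0]
    return base.zfill(5)


def build_url(raw):
    # URL pattern taken from the question; swap in the real site.
    return "http://www.example.com/zipcode/%s/search.php" % normalize_zip(raw)


print(build_url(" 501\n"))
```

Padding with `zfill` keeps the value a string, so no `int()` conversion is needed and a stray non-numeric field will not raise while formatting.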

1

Open file, read lines, grab zip codes, yield ...

with open('zipcodes.csv', 'r') as f:
    for line in f:
        # strip the trailing newline in case the zip code is the last column
        zipcode = line.strip().split(',')[columnNumberOfTheZipCodesStartingFrom0]
        yield self.make_requests_from_url("http://foo.com/blah/%s/search.php" % (zipcode,))

1

Just to round out the array of perfectly good suggestions, here's another. The main idea of this approach is that it doesn't require a special library like pandas, but it also doesn't just read the plain file contents, where you have to reinvent the wheel for CSV parsing (not the hardest thing, but why bother?). If your CSV file is simple enough, it might be easier just to read out the file contents, as suggested by dg99.

Use Python's built-in csv library!

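import csv

i = 0  # column index of the zip codes in your file (0 is a placeholder)
ziplist = []
with open('zipcodes.csv', newline='') as csvfile:  # on Python 2, open with 'rb' instead
    zipreader = csv.reader(csvfile)
    for row in zipreader:
        ziplist.append(row[i])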

Notes:

  • I have row[i] where i is the column index of the zip codes in your CSV file. If the file lists ZIP+4 codes, you might use row[i][:5]. If you don't know which column number the zip codes are in, but you do know the column header (field name), you can use

    zipreader = csv.DictReader(csvfile)
    for zipDict in zipreader:
        ziplist.append(zipDict['Zip Code Column Name Here'])

  • According to this post, reading items back out of a list is just as efficient as reading them out of a tuple, so a list seems like the way to go.
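
If the zip codes are only looped over once, they don't have to be collected into a list at all. A minimal sketch of a generator built on csv.DictReader that hands out one code at a time; the function name `iter_zipcodes`, the column name `zip`, and the sample rows are assumptions for illustration:

```python
import csv
import io


def iter_zipcodes(csvfile, column):
    """Lazily yield the first five characters of `column` from each row."""
    for row in csv.DictReader(csvfile):
        value = row[column].strip()
        if value:  # skip blank cells
            yield value[:5]


# In-memory stand-in for the downloaded file; real code would pass
# open('zipcodes.csv', newline='') instead.
sample = io.StringIO("zip,city\n00501,Holtsville\n12345-6789,Schenectady\n")

for code in iter_zipcodes(sample, "zip"):
    print("http://www.example.com/zipcode/%s/search.php" % code)
```

Inside a spider, the same loop could sit in start_requests and yield one request per code instead of printing.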

0

So you want to read a CSV into a list? I think this should be easy:

import pandas
colname = ['zip code','city']
# names= assumes the file has no header row; dtype=str keeps leading zeros (e.g. 00501)
zipdata = pandas.read_csv('uszipcodes.csv', names=colname, dtype=str)

I hope I understood you right!

2 Comments

How would I take that data and put it into my yield line above? I need it to just put in the int.
pandas is quite a heavyweight import for just reading a column of text. :)
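
To get from the DataFrame to the yield line, one option is to loop over the zip code column. A rough sketch, assuming the column names from the answer above and a file without a header row; the sample rows are made up, and reading with `dtype=str` is an assumption added here so that leading zeros survive:

```python
import io

import pandas

colname = ['zip code', 'city']
# In-memory stand-in for 'uszipcodes.csv' (no header row assumed).
sample = io.StringIO("00501,Holtsville\n12345,Schenectady\n")
# dtype=str keeps codes such as 00501 from being parsed as the integer 501.
zipdata = pandas.read_csv(sample, names=colname, dtype=str)

urls = ["http://www.example.com/zipcode/%s/search.php" % z
        for z in zipdata['zip code']]
print(urls)
```

In the spider, the list comprehension would become a for loop over zipdata['zip code'] that yields a request for each URL.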
0

Maybe like this?

#!/usr/local/cpython-3.3/bin/python

import csv
import pprint

def gen_zipcodes(file_):
    reader = csv.reader(file_, delimiter='|', quotechar='"')
    for row in reader:
        yield row[0]

def main():
    with open('zipcodes_2006.txt', 'r') as file_:
        zipcodes = list(gen_zipcodes(file_))
    pprint.pprint(zipcodes[:10])

main()
