List from CSV column in Python

Question

I wrote a small Scrapy snippet that searches a site by zip code for the whole country, but it seems wasteful to request all the nonexistent zip codes. This is what I had:

def start_requests(self):
    for i in xrange(100000):
        yield self.make_requests_from_url("http://www.example.com/zipcode/%05d/search.php" % i)

The idea is obvious, but I have since downloaded a CSV with all of the US zip codes in one column. How could I easily use that column as a list (or something more efficient than a list) in the example above? I have pandas if that would make things easier.

Brandon · Accepted Answer · 2013-12-26 23:05:47Z

1

If I'm understanding you correctly, you have a comma-delimited file in which one particular column (perhaps titled 'ZipCodes') holds a zipcode on each row.

If there's a header line and several columns, and you know the name of the column that contains the zipcodes, you could do this:

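def start_requests(self, filename, columnname):
    with open(filename) as csvfile:
        # The first line holds the column names; find the zipcode column once.
        headers = csvfile.readline().strip().split(',')
        column = headers.index(columnname)
        for line in csvfile:
            zipcode = line.strip().split(',')[column]
            # zipcode is a string, so format it with %s (%05d would raise a TypeError).
            yield self.make_requests_from_url("http://www.example.com/zipcode/%s/search.php" % zipcode)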
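The URL template in the question uses %05d, which expects an integer, while values read from a CSV file are strings; "%05d" % "90210" raises a TypeError. A minimal sketch of one way to build the URL from text, assuming that any zip code shorter than five characters has simply lost its leading zeros (the helper name zip_url is hypothetical):

```python
URL_TEMPLATE = "http://www.example.com/zipcode/%s/search.php"  # URL pattern from the question


def zip_url(raw_zip):
    """Return the search URL for one zip code read as text from a CSV cell."""
    # Strip stray whitespace and newlines, then left-pad with zeros so that a
    # value such as "501" (leading zeros lost on export) becomes "00501".
    code = raw_zip.strip().zfill(5)
    return URL_TEMPLATE % code


print(zip_url("90210"))
print(zip_url(" 501\n"))
```

Keeping the value as text avoids the int round trip entirely, which is usually what you want for identifiers like zip codes.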

edited Dec 26, 2013 at 23:05

answered Dec 26, 2013 at 22:43


dg99 · Accepted Answer · 2013-12-26 22:44:03Z

1

Open file, read lines, grab zip codes, yield ...

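with open('zipcodes.csv', 'r') as csvfile:
    for line in csvfile:
        # strip() drops the trailing newline, which matters if the zip code is the last column
        zipcode = line.strip().split(',')[columnNumberOfTheZipCodesStartingFrom0]
        yield self.make_requests_from_url("http://foo.com/blah/%s/search.php" % (zipcode,))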
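One caveat with splitting on commas by hand: a quoted field that itself contains a comma gets cut in two, which shifts the column numbers. A small sketch with a made-up line (Python 3), comparing a plain split against the standard csv module:

```python
import csv
import io

# Made-up sample line: the first field is quoted and contains a comma.
line = '"Springfield, IL",62701\n'

# A plain split treats the comma inside the quotes as a separator.
naive = line.strip().split(',')

# csv.reader understands the quoting and returns two fields.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)
print(parsed)
```

If the file is known to contain no quoted fields, the plain split is fine.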

answered Dec 26, 2013 at 22:44


Community · Accepted Answer · 2017-05-23 11:49:25Z

Just to round out the array of perfectly good suggestions, here's another. This approach doesn't require a special library like pandas, but it also doesn't just read the plain file contents, where you have to reinvent the wheel as far as CSV markup goes (not the hardest thing, but why bother?). If your CSV file is simple enough, it might be easier just to read out the file contents, as suggested by dg99.

Use Python's built-in csv module!

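import csv

ziplist = []
i = 0  # column index of the zipcodes in your file (0 is only a placeholder)
# 'rb' is for Python 2; on Python 3 use open('zipcodes.csv', newline='') instead
with open('zipcodes.csv', 'rb') as csvfile:
    zipreader = csv.reader(csvfile)
    for row in zipreader:
        ziplist.append(row[i])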

Notes:

I have row[i] where i is the column index for the zipcodes in your csv file. If the file lists zip+4 codes, you might use row[i][:5]. If you don't know which column number the zipcodes will be in, but you do know the column header (field name), you can use

zipreader = csv.DictReader(csvfile)
for zipDict in zipreader:
    ziplist.append(zipDict['Zip Code Column Name Here'])

According to this post, getting info back out of a list is just as efficient as a tuple, so this seems like the way to go.
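Since the question also asks about something more efficient than a list, here is a rough sketch (Python 3) that combines the notes above into a generator: it looks the column up by header name, trims zip+4 values to five characters, and skips blanks and repeats. The function name iter_zipcodes, the header 'Zip Code', and the sample rows are all made up for illustration.

```python
import csv
import io


def iter_zipcodes(csvfile, column):
    """Lazily yield each distinct 5-character zip code from the named column."""
    seen = set()
    for record in csv.DictReader(csvfile):
        # Missing or empty cells become ""; "90210-1234" is trimmed to "90210".
        code = (record.get(column) or "").strip()[:5]
        if code and code not in seen:
            seen.add(code)
            yield code


# Made-up sample data standing in for the real file.
sample = io.StringIO(
    "Zip Code,City\n"
    "00501,Holtsville\n"
    "90210-1234,Beverly Hills\n"
    "00501,Holtsville\n"
)
zipcodes = list(iter_zipcodes(sample, "Zip Code"))
print(zipcodes)
```

With a real file you would pass the object returned by open('zipcodes.csv', newline='') and loop over the generator directly inside start_requests, so the full list never has to sit in memory.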

LPH · Accepted Answer · 2013-12-26 21:56:31Z

0

So you want to read a CSV into a list. I think this should be easy:

import pandas
colname = ['zip code', 'city']
# dtype=str keeps leading zeros (e.g. "00501"); names= assumes the file has no header row
zipdata = pandas.read_csv('uszipcodes.csv', names=colname, dtype=str)

I hope I understood you right!

answered Dec 26, 2013 at 21:56


2 Comments

Xodarap777 Over a year ago

How would I take that data and put it into my yield line above? I need it to just put in the int.

dg99 Over a year ago

pandas is quite a heavyweight import for just reading a column of text. :)

dstromberg · Accepted Answer · 2013-12-26 22:57:19Z

0

Maybe like this?

#!/usr/local/cpython-3.3/bin/python

import csv
import pprint

def gen_zipcodes(file_):
    reader = csv.reader(file_, delimiter='|', quotechar='"')
    for row in reader:
        yield row[0]

def main():
    with open('zipcodes_2006.txt', 'r') as file_:
        zipcodes = list(gen_zipcodes(file_))
    pprint.pprint(zipcodes[:10])

main()
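Because gen_zipcodes is a generator, the full list is not strictly needed just to look at the first few entries. A small sketch using itertools.islice, with made-up pipe-delimited rows standing in for zipcodes_2006.txt:

```python
import csv
import io
from itertools import islice


def gen_zipcodes(file_):
    # Same generator as above: the first field of each pipe-delimited row.
    reader = csv.reader(file_, delimiter='|', quotechar='"')
    for row in reader:
        yield row[0]


# Made-up sample rows; the real file layout may differ.
sample = io.StringIO("00501|Holtsville\n00544|Holtsville\n01001|Agawam\n")

# islice stops after two items, so the third row is never parsed.
first_two = list(islice(gen_zipcodes(sample), 2))
print(first_two)
```

The same idea applies when feeding a crawler: iterate over the generator and yield one request per zip code instead of building the list first.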

answered Dec 26, 2013 at 22:57
