I am reading some data files stored as excel from online. My current process involves downloading the file to disk using the retrieve function defined below which uses the urllib2 library and then parses the excel document using the traverseWorkbook function. The traverse function uses the xlrd library for parsing the excel.
I would like to perform the same operation without requiring downloading the file to disk but will prefer to keep the file in memory and parse it memory.
Not sure how to even proceed, but I'm sure its possible.
def retrieveFile(url, filename):
try:
req = urllib2.urlopen(url)
CHUNK = 16 * 1024
with open(filename, 'wb') as fp:
while True:
chunk = req.read(CHUNK)
if not chunk: break
fp.write(chunk)
return True
except Exception, e:
return None
def traverseWorkbook(filename):
values = []
wb = open_workbook(filename)
for s in wb.sheets():
for row in range(s.nrows):
if row > 10:
rowData = processRow(s, row, type)
if rowData:
values.append(rowData)