Process a file in memory using Python

Question

I am reading data files that are stored online as Excel documents. My current process downloads each file to disk using the retrieveFile function defined below, which uses the urllib2 library, and then parses the Excel document using the traverseWorkbook function. The traverseWorkbook function uses the xlrd library to parse the Excel file.

I would like to perform the same operation without downloading the file to disk; I would prefer to keep the file in memory and parse it there.

I'm not sure how to proceed, but I'm sure it's possible.

import urllib2

def retrieveFile(url, filename):
    try:
        req = urllib2.urlopen(url)
        CHUNK = 16 * 1024
        with open(filename, 'wb') as fp:
            # Copy the response to disk in 16 KiB chunks.
            while True:
                chunk = req.read(CHUNK)
                if not chunk:
                    break
                fp.write(chunk)
        return True
    except Exception as e:
        return None


from xlrd import open_workbook

def traverseWorkbook(filename):
    values = []

    wb = open_workbook(filename)
    for s in wb.sheets():
        for row in range(s.nrows):
            # Skip the first 11 rows (indices 0 through 10).
            if row > 10:
                rowData = processRow(s, row, type)
                if rowData:
                    values.append(rowData)
    return values

esorton · Accepted Answer · 2014-04-30 02:51:16Z

1

You can read the entire file into memory using:

data = urllib2.urlopen(url).read()

Once the file is in memory, you can load it into xlrd using the file_contents argument of open_workbook:

wb = xlrd.open_workbook(url, file_contents=data)

Pass the URL as the filename argument: the documentation states that it might be used in messages, but it is otherwise ignored when file_contents is supplied.

Thus, your traverseWorkbook function can be rewritten as:

import urllib2
import xlrd

def traverseWorkbook(url):
    values = []
    # Download the whole workbook into memory; nothing is written to disk.
    data = urllib2.urlopen(url).read()
    wb = xlrd.open_workbook(url, file_contents=data)
    for s in wb.sheets():
        for row in range(s.nrows):
            if row > 10:
                rowData = processRow(s, row, type)
                if rowData:
                    values.append(rowData)
    return values
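
For readers on Python 3, where urllib2 no longer exists, the download step of the function above might be sketched roughly as follows. This is only a sketch under assumptions: `fetch_bytes` and `open_workbook_from_url` are hypothetical helper names, `urllib.request` stands in for urllib2, and xlrd is imported lazily so the download helper can be used on its own. Note that xlrd 2.x reads only the legacy .xls format.

```python
import urllib.request


def fetch_bytes(url):
    """Download the resource at `url` and return its whole body as bytes."""
    # The context manager closes the connection once the body has been read.
    with urllib.request.urlopen(url) as response:
        return response.read()


def open_workbook_from_url(url):
    """Open a workbook held in memory; nothing is written to disk."""
    import xlrd  # third-party; imported here so fetch_bytes works without it

    data = fetch_bytes(url)
    # When file_contents is supplied, filename is only used in messages.
    return xlrd.open_workbook(filename=url, file_contents=data)
```

The sheet traversal itself is unchanged: the workbook returned by `open_workbook_from_url` can be iterated with `wb.sheets()` exactly as before.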

answered Apr 30, 2014 at 2:51

dano · Accepted Answer · 2014-04-30 02:37:39Z

0

You could use the StringIO library and write the downloaded data to a file-like StringIO object, rather than a normal file.

import urllib2
import cStringIO as cs
from contextlib import closing

def retrieveFile(url, filename):
    # filename is kept for compatibility with the original signature but is unused.
    try:
        req = urllib2.urlopen(url)
        CHUNK = 16 * 1024
        with closing(cs.StringIO()) as fp:
            while True:
                chunk = req.read(CHUNK)
                if not chunk:
                    break
                fp.write(chunk)
            full_str = fp.getvalue()  # This contains the full contents of the downloaded file.
        return full_str
    except Exception as e:
        return None
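
On Python 3, cStringIO is gone and binary data belongs in `io.BytesIO`. A minimal sketch of the same chunked read, assuming a hypothetical helper named `read_into_memory` that accepts any object with a `read(size)` method (such as the response returned by `urllib.request.urlopen`):

```python
import io


def read_into_memory(source, chunk_size=16 * 1024):
    """Copy `source` into an in-memory buffer chunk by chunk and return it."""
    buffer = io.BytesIO()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        buffer.write(chunk)
    buffer.seek(0)  # rewind so the caller can read from the start
    return buffer


# Usage with a stand-in source; a urlopen() response would work the same way.
fake_response = io.BytesIO(b"x" * 40000)
in_memory_file = read_into_memory(fake_response)
print(len(in_memory_file.getvalue()))  # 40000
```

The returned buffer is file-like, so it can be handed to code that expects an open binary file, while `getvalue()` yields the raw bytes, which is the form xlrd's file_contents argument takes.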

answered Apr 30, 2014 at 2:37

alfonso · Accepted Answer · 2018-05-18 19:44:28Z

0

You can use pandas for this. It can read an Excel file directly from a URL, which abstracts away the messy details of downloading the data, and much of its computation runs in compiled code rather than in pure Python, so working with the data in memory is efficient.

import pandas as pd

# url is the address of the Excel file.
xl = pd.ExcelFile(url, engine='xlrd')
sheets = xl.sheet_names

# Work with the first sheet, or iterate through the sheets if there is more than one.
df = xl.parse(sheets[0])

# The sheet is now a DataFrame.
# You can manipulate the data in memory using the pandas API
# ...
# ...

# After massaging the data, write it to an xls file:
out_file = '~/Documents/out_file.xls'
df.to_excel(out_file, encoding='utf-8', index=False)

edited May 18, 2018 at 19:44

answered May 18, 2018 at 19:37
