13

I need to read selected files, matching on the file name, from a remote zip archive using Python. I don't want to save the full zip to a temporary file (it's not that large, so I can handle everything in memory).

I've already written the code and it works, and I'm answering this myself so I can search for it later. But since evidence suggests that I'm one of the dumber participants on Stack Overflow, I'm sure there's room for improvement.

4 Answers

10

Here's how I did it (grabbing all files ending in ".ranks"):

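import urllib2, cStringIO, zipfile

# url is assumed to be defined elsewhere and to point at the remote zip archive
try:
    remotezip = urllib2.urlopen(url)
    # ZipFile needs a seekable file object, so buffer the whole download in memory
    zipinmemory = cStringIO.StringIO(remotezip.read())
    archive = zipfile.ZipFile(zipinmemory)  # renamed from zip to avoid shadowing the builtin
    for fn in archive.namelist():
        if fn.endswith(".ranks"):
            ranks_data = archive.read(fn)
            for line in ranks_data.split("\n"):
                pass  # do something with each line
except urllib2.HTTPError:
    pass  # handle exception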

5 Comments

You want to replace the first line with: import urllib2, zipfile.
Why don't you use ZipFile(urllib2.urlopen(url))?
I tried that, but I couldn't get it to work: even though the response is a file-like object, it doesn't support seeking, which ZipFile needs. That's why I buffered it with cStringIO.
The central directory of a zip file is stored at the end of the archive, so unless the server supports partial (range) requests, the entire file must be downloaded before extraction, whether into memory or onto disk.
It's not that hard to create your own file-like object to wrap the url so you don't have to download the whole thing: stackoverflow.com/questions/7829311/…
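For what it's worth, here is a rough sketch of what such a wrapper could look like in Python 3. It is not the code from the linked question. The names RangeReader, http_range_reader and local_fetch are made up for illustration, and the HTTP helper assumes the server answers HEAD requests with a Content-Length header and honours Range requests. The demonstration at the bottom serves byte ranges from an in-memory archive, so it runs without a network.

```python
import io
import urllib.request
import zipfile


class RangeReader(io.RawIOBase):
    """Seekable, read-only file object that fetches bytes on demand.

    fetch(start, stop) must return the bytes in [start, stop); size is the
    total length of the remote file. Both are hypothetical names.
    """

    def __init__(self, fetch, size):
        self._fetch = fetch
        self._size = size
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_CUR:
            offset += self._pos
        elif whence == io.SEEK_END:
            offset += self._size
        if offset < 0:
            raise OSError("negative seek position")
        self._pos = offset
        return self._pos

    def readinto(self, buffer):
        stop = min(self._pos + len(buffer), self._size)
        if stop <= self._pos:
            return 0  # at or past the end of the file
        data = self._fetch(self._pos, stop)
        buffer[:len(data)] = data
        self._pos += len(data)
        return len(data)


def http_range_reader(url):
    """Build a RangeReader for url (assumes HEAD and Range support)."""
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head) as resp:
        size = int(resp.headers["Content-Length"])

    def fetch(start, stop):
        # HTTP ranges are inclusive on both ends, hence stop - 1
        req = urllib.request.Request(
            url, headers={"Range": "bytes=%d-%d" % (start, stop - 1)})
        with urllib.request.urlopen(req) as resp:
            if resp.status != 206:
                raise OSError("server ignored the Range header")
            return resp.read()

    return RangeReader(fetch, size)


# Local demonstration: serve ranges from an archive held in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.ranks", "1\n2\n")
    zf.writestr("big.bin", bytes(100000))
blob = buf.getvalue()
fetched = []  # sizes of the ranges that were actually requested


def local_fetch(start, stop):
    fetched.append(stop - start)
    return blob[start:stop]


with zipfile.ZipFile(RangeReader(local_fetch, len(blob))) as archive:
    ranks = archive.read("a.ranks")
```

With a real server you would pass http_range_reader(url) to ZipFile instead. Every read becomes a separate HTTP request, so this only pays off when the archive is much larger than the members you want.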
6

Thanks, Marcel, for your question and answer (I had the same problem in a different context and ran into the same difficulty with file-like objects not really being file-like)! Just as an update: for Python 3, your code needs to be modified slightly:

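import io, zipfile
import urllib.request, urllib.error

# url is assumed to be defined elsewhere and to point at the remote zip archive
try:
    remotezip = urllib.request.urlopen(url)
    # buffer the download so ZipFile gets a seekable file object
    zipinmemory = io.BytesIO(remotezip.read())
    archive = zipfile.ZipFile(zipinmemory)  # renamed from zip to avoid shadowing the builtin
    for fn in archive.namelist():
        if fn.endswith(".ranks"):
            # read() returns bytes in Python 3, so decode before splitting on "\n"
            # (UTF-8 is an assumption about the file contents)
            ranks_data = archive.read(fn).decode("utf-8")
            for line in ranks_data.split("\n"):
                pass  # do something with each line
except urllib.error.HTTPError:
    pass  # handle exception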
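
If you need this in more than one place, the loop can be wrapped in a small generator. This is only a sketch: the name iter_member_lines and the default UTF-8 encoding are my assumptions, and the demonstration builds an archive in memory instead of downloading one.

```python
import io
import zipfile


def iter_member_lines(zip_bytes, suffix, encoding="utf-8"):
    """Yield (member_name, line) for each member whose name ends with suffix."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.endswith(suffix):
                continue
            # ZipFile.open returns a binary stream; TextIOWrapper decodes it
            # lazily, so the member is never held as one big list of lines.
            with archive.open(name) as member:
                for line in io.TextIOWrapper(member, encoding=encoding):
                    yield name, line.rstrip("\n")


# Demonstration with an archive built in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.ranks", "1\n2\n")
    zf.writestr("notes.txt", "ignored\n")
lines = list(iter_member_lines(buf.getvalue(), ".ranks"))
```

For a real download, pass urllib.request.urlopen(url).read() as zip_bytes.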


5

This will do the job without downloading the entire zip file!

http://pypi.python.org/pypi/pyremotezip

1 Comment

Nice! Too bad this is py2 only.
1

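Bear in mind that merely decompressing a ZIP file from an untrusted source can be a security risk: for example, a very small archive can expand to an enormous amount of data (a "zip bomb") and exhaust memory when everything is handled in memory.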
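
One cheap precaution is to look at the uncompressed sizes the archive declares before reading anything. The sketch below assumes Python 3. The function name check_archive_size and the 10 MB limit are hypothetical, and since the sizes come from the archive's own headers, treat this as a first line of defence rather than a guarantee.

```python
import io
import zipfile

MAX_TOTAL_BYTES = 10 * 1024 * 1024  # hypothetical limit; tune it for your data


def check_archive_size(archive, limit=MAX_TOTAL_BYTES):
    """Raise ValueError if the declared uncompressed size exceeds limit."""
    total = sum(info.file_size for info in archive.infolist())
    if total > limit:
        raise ValueError("archive would expand to %d bytes" % total)
    return total


# Demonstration: a megabyte of zeros compresses to a tiny archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("zeros.bin", bytes(1_000_000))
with zipfile.ZipFile(buf) as archive:
    declared = check_archive_size(archive)
    try:
        check_archive_size(archive, limit=1000)
        rejected = False
    except ValueError:
        rejected = True
```

Call it right after constructing the ZipFile and before any read().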

