How do I read selected files from a remote Zip archive over HTTP using Python?

Question

I need to read selected files, matching on the file name, from a remote zip archive using Python. I don't want to save the full zip to a temporary file (it's not that large, so I can handle everything in memory).

I've already written the code and it works, and I'm answering this myself so I can search for it later. But since evidence suggests that I'm one of the dumber participants on Stack Overflow, I'm sure there's room for improvement.

Marcel Levy · Accepted Answer · 2009-01-10 00:03:25Z

10

Here's how I did it (grabbing all files ending in ".ranks"):

import urllib2, cStringIO, zipfile

# Python 2 code; `url` is assumed to hold the address of the archive.
try:
    remotezip = urllib2.urlopen(url)
    # Buffer the whole download in memory so ZipFile gets a seekable file object.
    zipinmemory = cStringIO.StringIO(remotezip.read())
    archive = zipfile.ZipFile(zipinmemory)
    for fn in archive.namelist():
        if fn.endswith(".ranks"):
            ranks_data = archive.read(fn)
            for line in ranks_data.split("\n"):
                pass  # do something with each line
except urllib2.HTTPError:
    pass  # handle exception

edited Jan 10, 2009 at 0:03

answered Sep 18, 2008 at 17:03

Marcel Levy


5 Comments

Jim Over a year ago

You want to replace the first line with: import urllib2, zipfile.

jfs Over a year ago

Why don't you use ZipFile(urllib2.urlopen(url))?

Marcel Levy Over a year ago

I tried that, but I couldn't get it to work: even though it was a file-like object, it didn't support a particular function that ZipFile needed. That's why I buffered it with cStringIO.

Ignacio Vazquez-Abrams Over a year ago

The directory for a zip file is stored at the end, therefore the entire file must be downloaded before extraction, whether into memory, or on disk.

retracile Over a year ago

It's not that hard to create your own file-like object to wrap the url so you don't have to download the whole thing: stackoverflow.com/questions/7829311/…
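To make the last two comments concrete, here is a minimal Python 3 sketch of such a wrapper. It is not the code from the linked answer. `RangeReader`, `http_range_fetcher` and `remote_size` are hypothetical names, and the sketch assumes the server honours HTTP `Range` requests and reports a `Content-Length`. Because the zip directory sits at the end of the archive, `ZipFile` only needs to seek there and then to the members it actually reads.

```python
import io
import urllib.request
import zipfile


class RangeReader(io.RawIOBase):
    """Read-only, seekable file object backed by a byte-range fetcher.

    fetch_range(start, end) is a caller-supplied (hypothetical) callable that
    returns the bytes in [start, end); size is the total length in bytes.
    """

    def __init__(self, fetch_range, size):
        self._fetch = fetch_range
        self._size = size
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            pos = self._size + offset
        else:
            raise ValueError("invalid whence")
        self._pos = max(0, pos)  # clamp instead of failing on negative offsets
        return self._pos

    def readinto(self, buf):
        # Fetch only the bytes requested, never past the end of the file.
        end = min(self._pos + len(buf), self._size)
        if end <= self._pos:
            return 0
        data = self._fetch(self._pos, end)
        buf[:len(data)] = data
        self._pos += len(data)
        return len(data)


def http_range_fetcher(url):
    """Build a fetch_range callable that issues HTTP Range requests."""
    def fetch(start, end):
        headers = {"Range": "bytes=%d-%d" % (start, end - 1)}
        request = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(request) as response:
            if response.status != 206:  # server ignored the Range header
                raise OSError("server does not support range requests")
            return response.read()
    return fetch


def remote_size(url):
    """Ask the server for the archive size with a HEAD request."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        return int(response.headers["Content-Length"])
```

With a cooperating server, `zipfile.ZipFile(RangeReader(http_range_fetcher(url), remote_size(url)))` should then download only the directory records and the members you read. Each read becomes its own HTTP request, so wrapping the reader in `io.BufferedReader` is probably worthwhile.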

Tim Pietzcker · Accepted Answer · 2009-06-04 20:13:44Z

6

Thanks Marcel for your question and answer (I had the same problem in a different context and encountered the same difficulty with file-like objects not really being file-like)! Just as an update: For Python 3.0, your code needs to be modified slightly:

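import io, zipfile
import urllib.request, urllib.error

try:
    remotezip = urllib.request.urlopen(url)
    zipinmemory = io.BytesIO(remotezip.read())
    archive = zipfile.ZipFile(zipinmemory)
    for fn in archive.namelist():
        if fn.endswith(".ranks"):
            # read() returns bytes in Python 3, so decode before splitting on "\n"
            # (UTF-8 is an assumption about the file's encoding).
            ranks_data = archive.read(fn).decode("utf-8")
            for line in ranks_data.split("\n"):
                pass  # do something with each line
except urllib.error.HTTPError:
    pass  # handle exception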

answered Jun 4, 2009 at 20:13

Tim Pietzcker

Filipe Varela · Accepted Answer · 2013-01-22 14:43:27Z

5

This will do the job without downloading the entire zip file!

http://pypi.python.org/pypi/pyremotezip

answered Jan 22, 2013 at 14:43

Filipe Varela

1 Comment

mdaoust Over a year ago

Nice! Too bad this is py2 only.

Jim · Accepted Answer · 2008-09-18 17:07:38Z

1

Bear in mind that merely decompressing a ZIP file may result in a security vulnerability.
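The answer doesn't say which risk it has in mind. One commonly cited risk when reading archives into memory is a "zip bomb", a small archive that expands to an enormous size. A rough Python 3 sketch of one possible guard follows. `read_members_safely` and the 10 MiB default are hypothetical choices for illustration, not part of `zipfile`.

```python
import io
import zipfile

MAX_MEMBER_BYTES = 10 * 1024 * 1024  # hypothetical limit; tune for your data


def read_members_safely(zip_bytes, suffix, max_member_bytes=MAX_MEMBER_BYTES):
    """Return {name: bytes} for members whose name ends with suffix.

    Raises ValueError if a matching member is larger than max_member_bytes.
    """
    results = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if not info.filename.endswith(suffix):
                continue
            # First check the uncompressed size declared in the directory.
            if info.file_size > max_member_bytes:
                raise ValueError("%s is too large" % info.filename)
            # Then cap the read itself, in case the declared size is wrong.
            with archive.open(info) as member:
                data = member.read(max_member_bytes + 1)
            if len(data) > max_member_bytes:
                raise ValueError("%s is too large" % info.filename)
            results[info.filename] = data
    return results
```

This only limits the size of each member. Capping the total across all members, or the size of the download itself, would be a natural extension.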

answered Sep 18, 2008 at 17:07

Jim
