36

The goal is to download a file from the internet, and create from it a file object, or a file like object without ever having it touch the hard drive. This is just for my knowledge, wanting to know if its possible or practical, particularly because I would like to see if I can circumvent having to code a file deletion line.

This is how I would normally download something from the web, and map it to memory:

import requests
import mmap

u = requests.get("http://www.pythonchallenge.com/pc/def/channel.zip")

with open("channel.zip", "wb") as f: # I want to eliminate this, as this writes to disk
    f.write(u.content)

with open("channel.zip", "r+b") as f: # and his as well, because it reads from disk
    mm = mmap.mmap(f.fileno(), 0)
    mm.seek(0)
    print mm.readline()
    mm.close() # question: if I do not include this, does this become a memory leak?
1

3 Answers 3

50

r.raw (HTTPResponse) is already a file-like object (just pass stream=True):

#!/usr/bin/env python
import sys
import requests # $ pip install requests
from PIL import Image # $ pip install pillow

url = sys.argv[1]
r = requests.get(url, stream=True)
r.raw.decode_content = True # Content-Encoding
im = Image.open(r.raw) #NOTE: it requires pillow 2.8+
print(im.format, im.mode, im.size)

In general if you have a bytestring; you could wrap it as f = io.BytesIO(r.content), to get a file-like object without touching the disk:

#!/usr/bin/env python
import io
import zipfile
from contextlib import closing
import requests # $ pip install requests

r = requests.get("http://www.pythonchallenge.com/pc/def/channel.zip")
with closing(r), zipfile.ZipFile(io.BytesIO(r.content)) as archive:
    print({member.filename: archive.read(member) for member in archive.infolist()})

You can't pass r.raw to ZipFile() directly because the former is a non-seekable file.

I would like to see if I can circumvent having to code a file deletion line

tempfile can delete files automatically f = tempfile.SpooledTemporaryFile(); f.write(u.content). Until .fileno() method is called (if some api requires a real file) or maxsize is reached; the data is kept in memory. Even if the data is written on disk; the file is deleted as soon as it closed.

Sign up to request clarification or add additional context in comments.

Comments

13

This is what I ended up doing.

import zipfile 
import requests
import StringIO

u = requests.get("http://www.pythonchallenge.com/pc/def/channel.zip")
f = StringIO.StringIO() 
f.write(u.content)

def extract_zip(input_zip):
    input_zip = zipfile.ZipFile(input_zip)
    return {i: input_zip.read(i) for i in input_zip.namelist()}
extracted = extract_zip(f)

Comments

11

Your answer is u.content. The content is in the memory. Unless you write it to a file, it won’t be stored on disk.

1 Comment

How do you take u.content, and create a file like object (what mmap does), without writing it to disk?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.