I'm trying to decompress a gzip file in Python. The file is downloaded from the internet, saved locally, and then decompressed. For some reason, the output file is 0 bytes. When I extract the file manually with an application, I get a .list file, which works fine as a .txt file once it is renamed. Can someone tell me why the decompressed output file contains no data? I'm still learning Python.

def downloadExtractMovies():
    moviePath = os.path.join(currentDir,moviesList)
    response_movies = open(moviePath, 'w')
    f = urlopen(reqMovies)
    local_file = open(moviesList, "w")
    local_file.write(f.read())
    response_movies.close()
    decompressedFile = gzip.GzipFile(fileobj=local_file, mode='rb')

    with open(outFilePath_movies, 'w') as outfile:
        outfile.write(decompressedFile.read())

    local_file.close()

Thanks

Edit: I partly fixed the problem by wrapping the downloaded data in a StringIO object. When the extracted output is around 160MB, it runs perfectly, but with a larger file, around 220MB, it raises a MemoryError.

Here is the code:

def downloadExtractMovies():
    moviePath = os.path.join(currentDir,moviesList)

    response_movies = open(moviePath, 'w')
    f = urlopen(reqMovies)
    url_f = StringIO.StringIO(f.read())

    with open(moviesList, 'wb') as local_file:
        local_file.write(f.read())

    response_movies.close()

    decompressedFile = gzip.GzipFile(fileobj=url_f, mode='rb')

    with open(outFilePath_movies, 'w') as outfile:
        outfile.write(decompressedFile.read())

Here's the traceback:

  File "D:\Portable Python 2.7.6.1\App\lib\gzip.py", line 254, in read
    self._read(readsize)
  File "D:\Portable Python 2.7.6.1\App\lib\gzip.py", line 313, in _read
    self._add_read_data( uncompress )
  File "D:\Portable Python 2.7.6.1\App\lib\gzip.py", line 331, in _add_read_data
    self.extrabuf = self.extrabuf[offset:] + data
MemoryError
  • Can you fix your indentation first, please? Commented Dec 3, 2015 at 17:59
  • Put the local_file.close() line above the decompressedFile line. Commented Dec 3, 2015 at 18:00
  • Gives me a "ValueError: I/O operation on closed file" Commented Dec 4, 2015 at 16:55

1 Answer

The data is only guaranteed to be completely written to disk once the file is closed, so you have to close the file before opening it again for reading. It is best to use the with statement, which closes the file automatically:

with open(moviesList, "wb") as local_file:
    local_file.write(f.read())
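
A minimal sketch of that close-then-reopen pattern, assuming the compressed data arrives as a file-like object with a `read` method (such as the `urlopen` response). The function name `save_and_extract` and its parameter names are hypothetical, not part of the original code. Both copies go through `shutil.copyfileobj`, so neither the compressed nor the decompressed data is held in memory all at once.

```python
import gzip
import shutil


def save_and_extract(source, compressed_path, output_path):
    """Save a gzip stream to disk, then decompress it to output_path.

    `source` is any file-like object yielding gzip bytes (hypothetical
    stand-in for the urlopen response in the question).
    """
    # The with-block closes (and flushes) the file before it is reopened.
    with open(compressed_path, "wb") as local_file:
        shutil.copyfileobj(source, local_file)

    # Reopen the finished file for reading and decompress it in chunks.
    with gzip.open(compressed_path, "rb") as compressed:
        with open(output_path, "wb") as outfile:
            shutil.copyfileobj(compressed, outfile)
```

With the names from the question, a call might look like `save_and_extract(urlopen(reqMovies), moviesList, outFilePath_movies)`.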

Instead of reading and writing yourself, use shutil.copyfileobj, which is more memory-efficient. If you don't need the compressed data on disk, you can pass the urllib object directly:

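def downloadExtractMovies(reqMovies, outFilePath_movies):
    # Wrap the HTTP response directly; nothing compressed is written to disk.
    decompressedFile = gzip.GzipFile(fileobj=urlopen(reqMovies), mode='rb')
    # Binary mode, so the decompressed bytes are written unchanged.
    with open(outFilePath_movies, 'wb') as outfile:
        shutil.copyfileobj(decompressedFile, outfile)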
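
If the response object cannot be wrapped in `GzipFile` directly (the `tell` error reported in the comments suggests this is the case on Python 2.7), one alternative is to decompress the stream incrementally with `zlib`. This is a sketch under assumptions, not the answer's original method: the function name `gunzip_stream` and the `chunk_size` default are hypothetical, and it handles only a single-member gzip stream. It needs nothing from the source beyond `read`.

```python
import zlib


def gunzip_stream(source, outfile, chunk_size=64 * 1024):
    """Decompress gzip data from a read-only stream into outfile.

    Only one chunk of compressed input is read at a time, and `source`
    only needs a read() method (no seek or tell).
    """
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        outfile.write(decompressor.decompress(chunk))
    # Write out anything still buffered inside the decompressor.
    outfile.write(decompressor.flush())
```

With the names from the question, this might be used as `gunzip_stream(urlopen(reqMovies), outfile)` inside a `with open(outFilePath_movies, 'wb') as outfile:` block.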

2 Comments

Thanks, I tried your idea, but I received "AttributeError: addinfourl instance has no attribute 'tell'".
The first solution you suggest, using the with statement, gives me "ValueError: I/O operation on closed file".
