Downloading large file in python error: Compressed file ended before the end-of-stream marker was reached

Question

I am downloading a compressed file from the internet:

import lzma
import urllib.request

with lzma.open(urllib.request.urlopen(url)) as file:
    for line in file:
        ...  # process each decompressed line

After downloading and processing a large part of the file, I eventually get the error:

  File "/usr/lib/python3.4/lzma.py", line 225, in _fill_buffer
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached

I am thinking that it might be caused by an internet connection that drops or the server not responding for some time. If that is the case, is there any way to make it keep trying until the connection is reestablished, instead of throwing an exception? I don't think it is a problem with the file, as I have manually downloaded many files like it from the same website and decompressed them. I have also been able to download and decompress some smaller files with Python. The file I am trying to download has a compressed size of about 20 GB.
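The dropped-connection theory is at least consistent with how lzma behaves on incomplete input: feeding lzma.open a stream that simply stops part-way through raises exactly this EOFError. A minimal local reproduction, assuming a truncated in-memory buffer is a fair stand-in for a download that ended early (the sample data is made up):

```python
import io
import lzma

# Compress some sample text, then keep only the first half of the bytes
# to imitate a download that stopped part-way through
original = b"".join(b"record %d\n" % i for i in range(10000))
compressed = lzma.compress(original)
truncated = compressed[: len(compressed) // 2]

lines_read = 0
error = None
try:
    with lzma.open(io.BytesIO(truncated)) as fh:
        for line in fh:
            lines_read += 1
except EOFError as exc:
    error = str(exc)  # keep the message; `exc` is unbound after the block

print("EOFError after", lines_read, "lines:", error)
```

Some lines come out before the error, just as in the question, because the decompressor only notices the problem when it runs out of input.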

How long does it take to download before you get the error? Some firewalls/proxies seem to terminate connections after a fixed timeout (e.g. 10 minutes). If it always fails after the same time interval, that may be a clue. – DNA, Commented Apr 1, 2015 at 8:48
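To check whether the failure always happens after the same interval, one could time each attempt. A small sketch; decompress_and_time is a hypothetical helper, and the per-line work is left as a counter:

```python
import lzma
import time


def decompress_and_time(fileobj):
    """Iterate over the lines of an .xz stream and report how far it got.

    Returns (lines_read, seconds_elapsed, error_message_or_None).
    """
    start = time.monotonic()
    lines_read = 0
    error = None
    try:
        with lzma.open(fileobj) as fh:
            for _ in fh:
                lines_read += 1  # replace with the real per-line processing
    except EOFError as exc:
        error = str(exc)  # the stream ended before its end-of-stream marker
    return lines_read, time.monotonic() - start, error
```

Calling it as decompress_and_time(urllib.request.urlopen(url)) a few times and comparing the elapsed seconds would show whether a fixed timeout is involved.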
Possible duplicate of Python LZMA : Compressed data ended before the end-of-stream marker was reached – kenorb, Commented May 23, 2016 at 22:51
I'm having the same problem while trying to work with a very large file online using urllib.request.urlopen() and gzip. About 12 hours in, I get a similar traceback. – bmende, Commented Jun 29, 2016 at 20:21
You can't parse the file if you don't read the headers (packets). You need to check the packet index and size (so urllib won't solve your problem). How do you tell the file's EOF apart from the end of the response? My opinion: urllib treats the file's EOF as the end of the response. – dsgdfg, Commented Jul 1, 2016 at 9:05

Pynchia · Accepted Answer · 2015-04-01 10:38:41Z

3

From the urllib.urlopen docs:

One caveat: the read() method, if the size argument is omitted or negative, may not read until the end of the data stream; there is no good way to determine that the entire stream from a socket has been read in the general case.

Maybe lzma.open trips over the huge size, connection errors, or timeouts because of the above.
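If the connection really is dropping, the "keep trying" behaviour the question asks for can be sketched as a file-like wrapper that reconnects at the current byte offset and hands lzma.open an uninterrupted byte stream. This is a minimal sketch, not a library API: ResumingReader, make_range_opener and open_resumable_xz are hypothetical names, and it assumes the server reports Content-Length and honours HTTP Range requests (answering 206 Partial Content); without that, resuming is not possible.

```python
import http.client
import io
import lzma
import time
import urllib.request


def make_range_opener(url, timeout=60):
    """Return open_at(offset), which requests `url` from byte `offset` on.

    Assumes the server honours HTTP Range requests (206 Partial Content).
    """
    def open_at(offset):
        req = urllib.request.Request(url)
        if offset:
            req.add_header("Range", "bytes=%d-" % offset)
        resp = urllib.request.urlopen(req, timeout=timeout)
        if offset and resp.status != 206:
            resp.close()
            raise RuntimeError("server ignored the Range header; cannot resume")
        return resp
    return open_at


class ResumingReader(io.RawIOBase):
    """Read-only stream that reconnects at the current byte offset whenever
    the connection fails or ends before `total_size` bytes have arrived."""

    def __init__(self, open_at, total_size, max_retries=10, delay=5.0):
        self._resp = None                # current connection, opened lazily
        self._open_at = open_at          # callable: offset -> file-like response
        self._total = total_size         # expected size, e.g. Content-Length
        self._pos = 0                    # bytes delivered so far
        self._max_retries = max_retries  # reconnect attempts per read call
        self._delay = delay              # seconds to wait between attempts

    def readable(self):
        return True

    def readinto(self, buf):
        if self._pos >= self._total:
            return 0  # every expected byte has been delivered: real EOF
        for attempt in range(self._max_retries + 1):
            try:
                if self._resp is None:
                    self._resp = self._open_at(self._pos)
                n = self._resp.readinto(buf)
                if n:
                    self._pos += n
                    return n
                # n == 0 with bytes still missing: the connection ended early
            except (OSError, http.client.HTTPException):
                pass  # network error: handle it like an early end of stream
            self._drop_connection()
            if attempt < self._max_retries:
                time.sleep(self._delay)
        raise OSError("gave up after %d retries at byte %d of %d"
                      % (self._max_retries, self._pos, self._total))

    def _drop_connection(self):
        if self._resp is not None:
            try:
                self._resp.close()
            except Exception:
                pass
            self._resp = None

    def close(self):
        self._drop_connection()
        super().close()


def open_resumable_xz(url, timeout=60, **kwargs):
    """Hypothetical convenience wrapper: HEAD for the size, then stream."""
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head, timeout=timeout) as resp:
        size = int(resp.headers["Content-Length"])
    reader = ResumingReader(make_range_opener(url, timeout), size, **kwargs)
    return lzma.open(reader)
```

With this, the loop from the question becomes `with open_resumable_xz(url) as file: for line in file: ...`. The retry budget applies per read call, so raising max_retries or delay lets it wait out longer outages. Only genuine network errors and short reads trigger a reconnect; a server that ignores Range stops it with an error instead of silently restarting from byte 0.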

answered Apr 1, 2015 at 10:38

Pynchia


kenorb · Accepted Answer · 2015-09-08 21:40:35Z

2

It's probably a liblzma bug. As a workaround, try adding:

import lzma
lzma._BUFFER_SIZE = 1023  # private, undocumented module attribute

before calling lzma.open().

answered Sep 8, 2015 at 21:40

kenorb


Charles · Accepted Answer · 2016-07-06 16:26:49Z

2

Have you tried using the requests library? I believe it provides a higher-level abstraction over urllib (it is built on urllib3).

The following solution should work for you, although it uses the requests library instead of urllib (requests > urllib anyway!). Let me know if you prefer to continue using urllib.

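import os
import requests

def download(url, chunk_s=1024, fname=None):
    # Default to the last path component of the URL as the local file name
    if not fname:
        fname = url.split('/')[-1]
    # stream=True fetches the body lazily instead of loading it into memory
    with requests.get(url, stream=True) as req:
        req.raise_for_status()  # fail early on HTTP errors
        with open(fname, 'wb') as fh:
            for chunk in req.iter_content(chunk_size=chunk_s):
                if chunk:  # skip empty keep-alive chunks
                    fh.write(chunk)
    return os.path.join(os.getcwd(), fname)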
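Downloading first and decompressing afterwards also makes it easy to confirm the file arrived intact before spending hours processing it. A minimal sketch for the local .xz file produced by download(); is_complete_xz and iter_xz_lines are hypothetical helper names, and UTF-8 text is assumed:

```python
import lzma


def is_complete_xz(path, chunk_size=1 << 20):
    """Return True if the .xz file at `path` decompresses to its end marker."""
    try:
        with lzma.open(path, "rb") as fh:
            # Read and discard the decompressed data chunk by chunk so that
            # memory use stays bounded even for very large files
            while fh.read(chunk_size):
                pass
    except (EOFError, lzma.LZMAError):
        return False  # truncated (EOFError) or corrupted (LZMAError) data
    return True


def iter_xz_lines(path, encoding="utf-8"):
    """Yield decoded text lines from a local .xz file one at a time."""
    with lzma.open(path, "rt", encoding=encoding) as fh:
        yield from fh
```

A file that fails is_complete_xz can simply be downloaded again, rather than crashing part-way through processing.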

answered Jul 6, 2016 at 16:26

Charles


Community · Accepted Answer · 2020-06-20 09:12:55Z

0

Since you need to download a big file, it is better to open the output file in "write binary" ("wb") mode when writing the content to a file in Python.

You could also try using the Python requests module instead of the urllib module.

Here is some working code:

import requests

url = "http://www.google.com"
with open("myoutputfile.ext", "wb") as f:
    # Note: .content holds the whole response body in memory at once
    f.write(requests.get(url).content)
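For a 20 GB file, holding the whole body in memory via .content is unlikely to work. A minimal standard-library sketch of the same "wb" approach that streams in fixed-size chunks, then compares the file size with the Content-Length header, because (per the documentation caveat quoted in the first answer) a stream can end early without a clear error. download_to_file is a hypothetical name, and the size check is skipped when the server sends no Content-Length:

```python
import os
import shutil
import urllib.request


def download_to_file(url, dest, chunk_size=1 << 20, timeout=60):
    """Stream `url` into `dest` (opened in "wb" mode) without holding the
    whole body in memory, then compare the size with Content-Length."""
    with urllib.request.urlopen(url, timeout=timeout) as resp, \
            open(dest, "wb") as out:
        expected = resp.headers.get("Content-Length")
        # Copy chunk_size bytes at a time from the response to the file
        shutil.copyfileobj(resp, out, chunk_size)
    got = os.path.getsize(dest)
    if expected is not None and got != int(expected):
        # A short file usually means the connection ended early
        raise OSError("incomplete download: got %d of %s bytes" % (got, expected))
    return dest
```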

Could you test that piece of code and let me know if it doesn't solve your issue?

Best regards

edited Jun 20, 2020 at 9:12

Community Bot

answered Jul 6, 2016 at 11:14

A. STEFANI
