Python gzip: is there a way to decompress from a string?

Question

I've read this SO post around the problem to no avail.

I am trying to decompress a .gz file coming from a URL.

url_file_handle=StringIO( gz_data )
gzip_file_handle=gzip.open(url_file_handle,"r")
decompressed_data = gzip_file_handle.read()
gzip_file_handle.close()

... but I get TypeError: coercing to Unicode: need string or buffer, cStringIO.StringI found

What's going on?

Traceback (most recent call last):  
  File "/opt/google/google_appengine-1.2.5/google/appengine/tools/dev_appserver.py", line 2974, in _HandleRequest
    base_env_dict=env_dict)
  File "/opt/google/google_appengine-1.2.5/google/appengine/tools/dev_appserver.py", line 411, in Dispatch
    base_env_dict=base_env_dict)
  File "/opt/google/google_appengine-1.2.5/google/appengine/tools/dev_appserver.py", line 2243, in Dispatch
    self._module_dict)
  File "/opt/google/google_appengine-1.2.5/google/appengine/tools/dev_appserver.py", line 2161, in ExecuteCGI
    reset_modules = exec_script(handler_path, cgi_path, hook)
  File "/opt/google/google_appengine-1.2.5/google/appengine/tools/dev_appserver.py", line 2057, in ExecuteOrImportScript
    exec module_code in script_module.__dict__
  File "/home/jldupont/workspace/jldupont/trunk/site/app/server/tasks/debian/repo_fetcher.py", line 36, in <module>
    main()
  File "/home/jldupont/workspace/jldupont/trunk/site/app/server/tasks/debian/repo_fetcher.py", line 30, in main
    gziph=gzip.open(fh,'r')
  File "/usr/lib/python2.5/gzip.py", line 49, in open
    return GzipFile(filename, mode, compresslevel)
  File "/usr/lib/python2.5/gzip.py", line 95, in __init__
    fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
TypeError: coercing to Unicode: need string or buffer, cStringIO.StringI found

Please post more from the traceback (which line is failing).
– tonfa, Commented Oct 9, 2009 at 13:12
Would using StringIO instead of cStringIO solve the problem?
– recursive, Commented Oct 9, 2009 at 13:15
@recursive: I've already added the necessary checks for the availability of cStringIO and revert to StringIO if not.
– jldupont, Commented Oct 9, 2009 at 13:18

Abyx · Accepted Answer · 2015-11-02 11:15:39Z

57

If your data is already in a string, try zlib.decompress. Passing 16+zlib.MAX_WBITS as the second argument (wbits) tells zlib to expect a gzip header and trailer:

import zlib
decompressed_data = zlib.decompress(gz_data, 16+zlib.MAX_WBITS)

Read more: http://docs.python.org/library/zlib.html
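A minimal round-trip sketch of the wbits behaviour described above. It assumes Python 3 (where `gzip.compress` exists and the data is `bytes` rather than `str`), and the sample payload is made up for illustration:

```python
import gzip
import zlib

# Build a small gzip payload in memory to stand in for downloaded data.
original = b"hello gzip" * 3
gz_data = gzip.compress(original)

# wbits = 16 + zlib.MAX_WBITS: expect a gzip header and trailer.
decompressed_data = zlib.decompress(gz_data, 16 + zlib.MAX_WBITS)

# wbits = 32 + zlib.MAX_WBITS: auto-detect a gzip or zlib header.
auto_detected = zlib.decompress(gz_data, 32 + zlib.MAX_WBITS)

# With the default wbits, zlib expects a zlib header and rejects gzip data.
try:
    zlib.decompress(gz_data)
    default_failed = False
except zlib.error:
    default_failed = True
```

The last case is the likely source of the "incorrect header check" error that shows up when the wbits argument is left out.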

edited Nov 2, 2015 at 11:15

Abyx

answered Aug 19, 2013 at 17:21

Anon Mouse

2 Comments

HongboZhu Over a year ago

This was also what came to me in the beginning. But it quickly turned out that it does not work: "zlib.error: Error -3 while decompressing data: incorrect header check"

RickardSjogren Over a year ago

Add the wbits argument so the call looks like decompressed_data = zlib.decompress(gz_data, 16+zlib.MAX_WBITS); that works like a charm for me. Thanks to stackoverflow.com/a/2695575/3635816

Nakilon · Accepted Answer · 2013-06-07 11:47:41Z

39

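gzip.open is shorthand for opening a file by name, so it passes its first argument on as a filename. What you want is gzip.GzipFile, which accepts an existing file-like object through its fileobj argument.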

open(filename, mode='rb', compresslevel=9)
    #Shorthand for GzipFile(filename, mode, compresslevel).

vs

class GzipFile
   __init__(self, filename=None, mode=None, compresslevel=9, fileobj=None)
   #    At least one of fileobj and filename must be given a non-trivial value.

So this should work for you:

gzip_file_handle = gzip.GzipFile(fileobj=url_file_handle)
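For readers on Python 3, the same idea can be sketched with `io.BytesIO` in place of `StringIO`. This is an illustration under that assumption, not the code from the question; the payload is invented and compressed in memory to stand in for bytes fetched from a URL:

```python
import gzip
import io

# Stand-in for the compressed bytes downloaded from a URL.
original = b"Package: example\n"
gz_data = gzip.compress(original)

# Wrap the bytes in a file-like object and pass it via fileobj,
# so GzipFile reads from memory instead of opening a path.
url_file_handle = io.BytesIO(gz_data)
with gzip.GzipFile(fileobj=url_file_handle, mode="rb") as gzip_file_handle:
    decompressed_data = gzip_file_handle.read()
```

In Python 3.3 and later, `gzip.open` also accepts a file object as its first argument, so the original error is specific to older versions such as the Python 2.5 shown in the traceback.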

edited Jun 7, 2013 at 11:47

Nakilon

answered Oct 9, 2009 at 13:13

Jehiah

Montoya · Accepted Answer · 2020-11-03 08:37:25Z

10

You can use gzip.decompress from Python's standard gzip module (available in Python 3.2+).

Example on how to decompress bytes:

import gzip
gzip.decompress(gzip_data)

Documentation

https://docs.python.org/3.5/library/gzip.html#gzip.decompress
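A slightly fuller sketch of the call above, assuming Python 3.2 or later. The payload is made up, and the UTF-8 decoding step is only appropriate if the compressed content is known to be text in that encoding:

```python
import gzip

# Compress a sample payload in memory to stand in for real gzip data.
original_text = "line one\nline two\n"
gzip_data = gzip.compress(original_text.encode("utf-8"))

# gzip.decompress takes bytes and returns bytes.
raw = gzip.decompress(gzip_data)

# Decode only if the payload is text with a known encoding.
text = raw.decode("utf-8")
```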

answered Nov 3, 2020 at 8:37

Montoya

user1444978 · Accepted Answer · 2017-01-20 14:45:57Z

1

Consider using gzip.GzipFile if you don't like passing obscure arguments to zlib.decompress.

When you deal with a urllib2.urlopen response that can be either gzip-compressed or uncompressed:

import gzip
from StringIO import StringIO

# response = urllib2.urlopen(...

content_raw = response.read()
# getheader returns None when the header is missing, so default to ''
if 'gzip' in response.info().getheader('Content-Encoding', ''):
    content = gzip.GzipFile(fileobj=StringIO(content_raw)).read()
else:
    content = content_raw

When you deal with a file that can store either gzip-compressed or uncompressed data:

import gzip

# some_file = open(...

try:
    content = gzip.GzipFile(fileobj=some_file).read()
except IOError:
    some_file.seek(0)
    content = some_file.read()

The examples above are written for Python 2.7.

edited Jan 20, 2017 at 14:45

answered Jan 19, 2017 at 23:44

user1444978

1 Comment

Eli Korvigo Over a year ago

Using try/except with read on opened IO buffers is not the best way to infer the format, because each attempt might consume some bytes.
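One possible alternative to the try/except approach is to look at the first two bytes, which are `\x1f\x8b` for gzip data, and then seek back before reading. The sketch below assumes Python 3 and a seekable file object; `read_maybe_gzip` and `GZIP_MAGIC` are hypothetical names introduced for illustration:

```python
import gzip
import io

# The two-byte signature that starts every gzip member.
GZIP_MAGIC = b"\x1f\x8b"


def read_maybe_gzip(fileobj):
    """Return the decompressed content if fileobj holds gzip data,
    otherwise return its content unchanged. Requires a seekable object."""
    start = fileobj.tell()
    magic = fileobj.read(2)
    fileobj.seek(start)  # rewind so no bytes are lost to the check
    if magic == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=fileobj, mode="rb").read()
    return fileobj.read()


plain = b"plain text payload"
from_plain = read_maybe_gzip(io.BytesIO(plain))
from_gzip = read_maybe_gzip(io.BytesIO(gzip.compress(plain)))
```

This is a heuristic: uncompressed data could begin with the same two bytes by coincidence, and non-seekable streams would need the prefix buffered instead.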
