Using GZIP Module with Python

Question

I'm trying to use the Python GZIP module to simply uncompress several .gz files in a directory. Note that I do not want to read the files, only uncompress them. After searching this site for a while, I have this code segment, but it does not work:

import gzip
import glob
import os
import shutil
for file in glob.glob(PATH_TO_FILE + "/*.gz"):
    #print file
    if os.path.isdir(file) == False:
        shutil.copy(file, FILE_DIR)
        # uncompress the file
        inF = gzip.open(file, 'rb')
        s = inF.read()
        inF.close()

The .gz files are in the correct location, and I can print the full path + filename with the print command, but the GZIP module isn't getting executed properly. What am I missing?

Yes, the file is ok. I can uncompress the file using gunzip on the UNIX command line.
– user3111358, Commented Dec 17, 2013 at 13:41

loopbackbee · Accepted Answer · 2019-06-12 14:19:44Z

40

If you get no error, the gzip module probably is being executed properly, and the file is already getting decompressed.

The precise definition of "decompressed" varies with context:

I do not want to read the files, only uncompress them

The gzip module doesn't work as a desktop archiving program like 7-zip - you can't "uncompress" a file without "reading" it. Note that "reading" (in programming) usually just means "storing (temporarily) in the computer RAM", not "opening the file in the GUI".

What you probably mean by "uncompress" (as in a desktop archiving program) is more precisely described (in programming) as "read an in-memory stream/buffer from a compressed file, and write it to a new file (and possibly delete the compressed file afterwards)".

inF = gzip.open(file, 'rb')
s = inF.read()
inF.close()

With these lines, you're just reading the stream. If you expect a new "uncompressed" file to be created, you just need to write the buffer to a new file:

with open(out_filename, 'wb') as out_file:
    out_file.write(s)

If you're dealing with very large files (larger than the amount of your RAM), you'll need to adopt a different approach. But that is the topic for another question.
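
For files too large to hold in memory, one possible approach is to copy the decompressed stream to the output file in chunks rather than calling read() once. A minimal sketch, assuming Python 3; the function name decompress_gz and its parameters are made up for illustration and are not part of the gzip module:

```python
import gzip
import shutil


def decompress_gz(gz_path, out_path, chunk_size=1024 * 1024):
    """Stream gz_path into out_path without loading the whole file into RAM.

    decompress_gz, gz_path, out_path and chunk_size are illustrative names.
    """
    with gzip.open(gz_path, 'rb') as in_file, open(out_path, 'wb') as out_file:
        # copyfileobj reads and writes at most chunk_size bytes at a time
        shutil.copyfileobj(in_file, out_file, chunk_size)
```

Memory use then stays at roughly one chunk, regardless of how large the uncompressed file is.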

edited Jun 12, 2019 at 14:19

answered Dec 17, 2013 at 13:31

loopbackbee


5 Comments

user3111358 Over a year ago

No error occurs when I run the Python script, but the gzip file isn't uncompressed. I only want to uncompress the file so it can be used by another tool, not re-written to a file, or used elsewhere in my script.

loopbackbee Over a year ago

@user3111358 What do you mean, exactly, by "the gzip file isn't uncompressed"? What makes you say so? Have you checked the contents of s in your code?

user3111358 Over a year ago

What I mean is the gzip file isn't uncompressed, which is what I am trying to do. I ONLY want to uncompress, nothing else.

loopbackbee Over a year ago

@user3111358 What I'm trying to say is that "uncompress" means different things in different contexts. My bet is that if you ask a few people who've read your code here on SO, they'll tell you the file is being uncompressed. Thus, I must ask: how do you know the file is not being "uncompressed"? Is it because no new files appear in the same directory as the compressed file when you run the code?

sage88 Over a year ago

This is the correct answer. When you decompress a file, it is written to a new file, and the original compressed file is either deleted as a separate action or kept. Either way, a new uncompressed file is written.

Jan Spurny · Accepted Answer · 2013-12-17 13:49:30Z

6

You're decompressing the file into the s variable and then doing nothing with it. You should stop searching Stack Overflow and read at least the Python tutorial. Seriously.

Anyway, there are several things wrong with your code:

you need to STORE the unzipped data in s into some file.
there's no need to copy the actual *.gz files, because in your code you're unpacking the original gzip file and not the copy.
you're using file as a variable name, which shadows the built-in file type in Python 2. This is not an error, just a very bad practice.

This should probably do what you wanted:

import gzip
import glob
import os
import os.path

for gzip_path in glob.glob(PATH_TO_FILE + "/*.gz"):
    if os.path.isdir(gzip_path) == False:
        inF = gzip.open(gzip_path, 'rb')
        # uncompress the gzip_path INTO THE 's' variable
        s = inF.read()
        inF.close()

        # get gzip filename (without directories)
        gzip_fname = os.path.basename(gzip_path)
        # get original filename (remove 3 characters from the end: ".gz")
        fname = gzip_fname[:-3]
        uncompressed_path = os.path.join(FILE_DIR, fname)

        # store uncompressed file data from 's' variable
        # (binary mode, because 's' holds raw bytes)
        open(uncompressed_path, 'wb').write(s)

answered Dec 17, 2013 at 13:49

Jan Spurny


2 Comments

Ander Over a year ago

When you call open(uncompressed_path, 'w').write(s) without assigning the file handle to a variable, is there no need to close it?

Jan Spurny Over a year ago

@Ander - yes, because the (anonymous) file object is never assigned to a variable, so CPython destroys (and closes) it immediately after the statement executes. I find this much cleaner for a simple "write xy to file" or "read from file", that is, when there is exactly one read or write. But if you do more than one read/write, you should probably always use with open(...).
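
The difference between the two styles can be checked directly. A small sketch (the scratch file and variable names are only for illustration): a with block closes the file as soon as the block ends, whereas the one-liner relies on the interpreter reclaiming the anonymous file object, which CPython does promptly but which the language itself does not guarantee.

```python
import os
import tempfile

# Illustrative scratch file; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), 'example.bin')

with open(path, 'wb') as out_file:
    out_file.write(b'some uncompressed data')

# Leaving the with block closed the file; this also happens if the block raises.
print(out_file.closed)  # True
```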

Martin Thoma · Accepted Answer · 2015-10-12 17:58:23Z

6

You should use with to open files and, of course, store the result of reading the compressed file. See gzip documentation:

import gzip
import glob
import os
import os.path

for gzip_path in glob.glob("%s/*.gz" % PATH_TO_FILE):
    if not os.path.isdir(gzip_path):
        with gzip.open(gzip_path, 'rb') as in_file:
            s = in_file.read()

        # Now store the uncompressed data
        path_to_store = gzip_path[:-3]  # remove the '.gz' from the filename

        # store uncompressed file data from 's' variable
        # ('wb' because gzip.open(..., 'rb') returns bytes)
        with open(path_to_store, 'wb') as f:
            f.write(s)

Depending on what exactly you want to do, you might want to have a look at tarfile and its 'r:gz' option for opening files.
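
tarfile is only relevant when the .gz file is really a gzip-compressed tar archive (.tar.gz or .tgz) rather than a single compressed file. A minimal sketch, assuming that case; extract_tar_gz and its arguments are illustrative names:

```python
import tarfile


def extract_tar_gz(archive_path, dest_dir):
    """Extract every member of a gzip-compressed tar archive into dest_dir."""
    # 'r:gz' opens the archive for reading with gzip decompression
    with tarfile.open(archive_path, 'r:gz') as archive:
        # Only extract archives you trust: members can carry unexpected paths
        archive.extractall(dest_dir)
```

For a plain file.gz that is not a tar archive, tarfile will raise an error, and the gzip module is the right tool.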

answered Oct 12, 2015 at 17:58

Martin Thoma


2 Comments

gotson Over a year ago

It would be nicer to use os.path.splitext(gzip_fname)[0] to remove the .gz extension

hoaphumanoid Over a year ago

Your example is wrong: gzip_fname doesn't exist, so you have to change it to gzip_path. Furthermore, what ends up in gzip_path is the .gz file itself, so you should change os.path.isdir to os.path.isfile. I also think @gotson's solution is nicer :)
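
The two suggestions from these comments (os.path.splitext for the output name and os.path.isfile for the check) could be combined roughly as follows. This is a sketch, not the answer author's code; decompress_all and src_dir are illustrative names:

```python
import glob
import gzip
import os


def decompress_all(src_dir):
    """Decompress every regular *.gz file in src_dir next to the original."""
    written = []
    for gzip_path in glob.glob(os.path.join(src_dir, '*.gz')):
        if not os.path.isfile(gzip_path):
            continue  # skip directories whose names happen to end in .gz
        # splitext drops only the final extension: 'a.txt.gz' -> 'a.txt'
        path_to_store = os.path.splitext(gzip_path)[0]
        with gzip.open(gzip_path, 'rb') as in_file:
            data = in_file.read()
        with open(path_to_store, 'wb') as out_file:
            out_file.write(data)
        written.append(path_to_store)
    return written
```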

user3111358 · Accepted Answer · 2013-12-17 14:37:39Z

4

I was able to resolve this issue by using the subprocess module:

import glob
import os
import shutil
import subprocess

for file in glob.glob(PATH_TO_FILE + "/*.gz"):
    if os.path.isdir(file) == False:
        shutil.copy(file, FILE_DIR)
        # uncompress the file
        subprocess.call(["gunzip", FILE_DIR + "/" + os.path.basename(file)])

Since my goal was simply to uncompress the archive, the above code accomplishes this. The archived files are located in a central location, and are copied to a working area, uncompressed, and used in a test case. The GZIP module was too complicated for what I was trying to accomplish.

Thanks for everyone's help. It is much appreciated!

answered Dec 17, 2013 at 14:37

user3111358


1 Comment

ChrisGuest Over a year ago

Yes, if you don't need to programmatically manipulate the contents of the files and don't mind that it isn't portable across OSes, then this is a much more intuitive way to approach things.
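
If portability does matter, one option is to call gunzip only when it is available and otherwise fall back to the gzip module. A rough sketch under that assumption; gunzip_in_place is a hypothetical helper, it expects a path ending in .gz, and the fallback mimics gunzip's default behaviour of replacing file.gz with file:

```python
import gzip
import os
import shutil
import subprocess


def gunzip_in_place(gz_path):
    """Replace gz_path with its decompressed contents and return the new path."""
    out_path = gz_path[:-len('.gz')]
    if shutil.which('gunzip'):
        # check=True raises CalledProcessError if gunzip reports a failure
        subprocess.run(['gunzip', gz_path], check=True)
    else:
        # Pure-Python fallback for systems without a gunzip executable
        with gzip.open(gz_path, 'rb') as src, open(out_path, 'wb') as dst:
            shutil.copyfileobj(src, dst)
        os.remove(gz_path)
    return out_path
```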

Dalupus · Accepted Answer · 2016-04-03 22:59:24Z

0

I think there is a much simpler solution than the others presented, given that the OP only wanted to extract all the files in a directory:

import glob
from setuptools import archive_util

for fn in glob.glob('*.gz'):
  archive_util.unpack_archive(fn, '.')

answered Apr 3, 2016 at 22:59

Dalupus


1 Comment

punchcard Over a year ago

archive_util.unpack_archive does not seem to support .gz. The error message is "setuptools.archive_util.UnrecognizedFormat: Not a recognized archive type: K:\z_temp\file.gz". shutil.unpack_archive does not support plain .gz either. To see the file types shutil.unpack_archive supports: import shutil; print(shutil.get_unpack_formats())
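
shutil.unpack_archive can be taught additional formats through shutil.register_unpack_format. The following is only a sketch of that idea: the format name 'gz' and the helper _unpack_gz are invented here, and it handles a single gzip-compressed file, not a tar archive.

```python
import gzip
import os
import shutil


def _unpack_gz(filename, extract_dir):
    """Hypothetical unpacker: decompress one .gz file into extract_dir."""
    os.makedirs(extract_dir, exist_ok=True)
    out_name = os.path.basename(filename)[:-len('.gz')]
    with gzip.open(filename, 'rb') as src, \
            open(os.path.join(extract_dir, out_name), 'wb') as dst:
        shutil.copyfileobj(src, dst)


# Register once; registering the same extension twice raises shutil.RegistryError.
if 'gz' not in [name for name, _, _ in shutil.get_unpack_formats()]:
    shutil.register_unpack_format('gz', ['.gz'], _unpack_gz,
                                  description='single gzip-compressed file')
```

After registration, shutil.unpack_archive('file.gz', 'out_dir') should write out_dir/file.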
