24

I'm trying to use the Python GZIP module to simply uncompress several .gz files in a directory. Note that I do not want to read the files, only uncompress them. After searching this site for a while, I have this code segment, but it does not work:

import gzip
import glob
import os
import shutil
for file in glob.glob(PATH_TO_FILE + "/*.gz"):
    #print file
    if os.path.isdir(file) == False:
        shutil.copy(file, FILE_DIR)
        # uncompress the file
        inF = gzip.open(file, 'rb')
        s = inF.read()
        inF.close()

The .gz files are in the correct location, and I can print the full path and filename with the print command, but the gzip module doesn't seem to be doing anything. What am I missing?

2
  • Is the file ok? You don't show what is/isn't happening. Commented Dec 17, 2013 at 13:32
  • Yes, the file is ok. I can uncompress the file using gunzip on the UNIX command line. Commented Dec 17, 2013 at 13:41

5 Answers

40

If you get no error, the gzip module probably is being executed properly, and the file is already getting decompressed.

The precise definition of "decompressed" depends on context:

I do not want to read the files, only uncompress them

The gzip module doesn't work like a desktop archiving program such as 7-Zip: you can't "uncompress" a file without "reading" it. Note that "reading" (in programming) usually just means "storing (temporarily) in the computer's RAM", not "opening the file in the GUI".

What you probably mean by "uncompress" (as in a desktop archiving program) is more precisely described (in programming) as "read an in-memory stream/buffer from a compressed file, and write it to a new file (and possibly delete the compressed file afterwards)".

inF = gzip.open(file, 'rb')
s = inF.read()
inF.close()

With these lines, you're just reading the stream. If you expect a new "uncompressed" file to be created, you just need to write the buffer to a new file:

with open(out_filename, 'wb') as out_file:
    out_file.write(s)
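
Putting the two snippets together, a minimal sketch of the whole read-then-write step could look like the following. The function name gunzip_to_file and the remove_source flag are made up for illustration (they are not part of the gzip module), and the output name is assumed to be the input name minus its ".gz" suffix:

```python
import gzip
import os


def gunzip_to_file(gz_path, remove_source=False):
    """Decompress gz_path next to itself and return the new file's path."""
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a path ending in .gz: %r" % gz_path)
    out_path = gz_path[:-3]  # assumption: output name is input minus ".gz"

    # Read the whole decompressed payload into memory...
    with gzip.open(gz_path, "rb") as in_file:
        data = in_file.read()

    # ...and write it back out as raw bytes ('wb', not 'w').
    with open(out_path, "wb") as out_file:
        out_file.write(data)

    if remove_source:
        os.remove(gz_path)  # mimic gunzip, which deletes the .gz afterwards
    return out_path
```

Called as gunzip_to_file(path) it leaves the compressed file alone; passing remove_source=True behaves more like the gunzip command.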

If you're dealing with very large files (larger than the amount of your RAM), you'll need to adopt a different approach. But that is the topic for another question.
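
For what it's worth, one common way to handle that case is to copy the decompressed stream in chunks instead of calling read() once. This is only a sketch: gunzip_streaming and the default chunk_size are illustrative choices, not anything prescribed by the gzip module.

```python
import gzip
import shutil


def gunzip_streaming(gz_path, out_path, chunk_size=1024 * 1024):
    """Decompress gz_path to out_path without loading it all into RAM."""
    with gzip.open(gz_path, "rb") as in_file, open(out_path, "wb") as out_file:
        # copyfileobj reads and writes at most chunk_size bytes at a time
        shutil.copyfileobj(in_file, out_file, chunk_size)
```

Memory use then stays around chunk_size regardless of how large the uncompressed file is.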


5 Comments

No error occurs when I run the Python script, but the gzip file isn't uncompressed. I only want to uncompress the file so it can be used by another tool, not re-written to a file, or used elsewhere in my script.
@user3111358 What do you mean, exactly, by "the gzip file isn't uncompressed"? What makes you say so? Have you checked the contents of s in your code?
What I mean is the gzip file isn't uncompressed, which is what I am trying to do. I ONLY want to uncompress, nothing else.
@user3111358 What I'm trying to say is that "uncompress" means different things in different contexts. My bet is that if you ask a few people who've read your code here on SO, they'll tell you the file is being uncompressed. Thus, I must ask: how do you know the file is not being "uncompressed"? Is it because there are no new files being put on the same directory as the compressed file when you run the code?
This is the correct answer. When you decompress a file, it is written to a new file, and the original compressed file is either deleted as a separate action or kept. Either way, a new uncompressed file is written.
6

You're decompressing the file into the variable s and then doing nothing with it. You should stop searching Stack Overflow and at least read the Python tutorial. Seriously.

Anyway, there are several things wrong with your code:

  1. you need to STORE the unzipped data in s into some file.

  2. there's no need to copy the actual *.gz files, because in your code you're unpacking the original gzip file and not the copy.

  3. you're using file as a variable name. It is not a reserved word, but it shadows a built-in in Python 2, so while this is not an error it is bad practice.

This should probably do what you wanted:

import gzip
import glob
import os
import os.path

for gzip_path in glob.glob(PATH_TO_FILE + "/*.gz"):
    if not os.path.isdir(gzip_path):
        # uncompress the gzip_path INTO THE 's' variable
        with gzip.open(gzip_path, 'rb') as inF:
            s = inF.read()

        # get gzip filename (without directories)
        gzip_fname = os.path.basename(gzip_path)
        # get original filename (remove 3 characters from the end: ".gz")
        fname = gzip_fname[:-3]
        uncompressed_path = os.path.join(FILE_DIR, fname)

        # store uncompressed file data from 's' variable
        # (binary mode, because 's' holds raw bytes)
        open(uncompressed_path, 'wb').write(s)

2 Comments

When you call open(uncompressed_path, 'w').write(s) without assigning the file handler to a variable there is no need to close the file handler?
@Ander - yes, because the (anonymous) file object is never assigned to a variable, so (in CPython, at least) it is destroyed, and therefore closed, immediately after the statement executes. I find it much cleaner for a simple "write xy to file" or "read from file", that is, when there is exactly one read or write. But if you do more than one read/write, you should probably always use with open(...):
6

You should use with to open files and, of course, store the result of reading the compressed file. See the gzip documentation:

import gzip
import glob
import os
import os.path

for gzip_path in glob.glob("%s/*.gz" % PATH_TO_FILE):
    if not os.path.isdir(gzip_path):
        with gzip.open(gzip_path, 'rb') as in_file:
            s = in_file.read()

        # Now store the uncompressed data
        path_to_store = gzip_path[:-3]  # remove the '.gz' from the path

        # store uncompressed file data from 's' variable (bytes, so binary mode)
        with open(path_to_store, 'wb') as f:
            f.write(s)

Depending on what exactly you want to do, you might want to have a look at tarfile and its 'r:gz' option for opening files.
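
The distinction matters because a .tar.gz is a tar archive wrapped in gzip and can hold many files, whereas a plain .gz holds a single compressed stream. Here is a small sketch of reading one with tarfile. The archive is built on the fly only so the example is self-contained, and the names demo.tar.gz and hello.txt are made up:

```python
import io
import os
import tarfile
import tempfile

# Build a tiny demo.tar.gz so there is something to open (illustrative only).
work_dir = tempfile.mkdtemp()
archive_path = os.path.join(work_dir, "demo.tar.gz")
payload = b"hello from inside the archive\n"
with tarfile.open(archive_path, "w:gz") as tar:
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# 'r:gz' opens a gzip-compressed tar archive for reading.
with tarfile.open(archive_path, "r:gz") as tar:
    member_names = tar.getnames()                    # list the archive members
    extracted = tar.extractfile("hello.txt").read()  # read one member's bytes
```

To write members to disk rather than read them into memory, tar.extractall(some_dir) is the usual call, though it is best reserved for archives you trust.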

2 Comments

It would be nicer to use os.path.splitext(gzip_fname)[0] to remove the .gz extension
Your example is wrong: gzip_fname doesn't exist, so you have to change it to gzip_path. Furthermore, what you get in gzip_path is not a path, it's the .gz file, so you should change os.path.isdir to os.path.isfile. I also think @gotson's solution is nicer :)
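
To illustrate the os.path.splitext suggestion from the comments: slicing with [:-3] silently chops the last three characters of any name, while splitext lets you check that the suffix really is ".gz" first. A rough sketch, where strip_gz_suffix is a hypothetical helper name:

```python
import os.path


def strip_gz_suffix(path):
    """Return path without a trailing '.gz'; other paths come back unchanged."""
    root, ext = os.path.splitext(path)
    return root if ext.lower() == ".gz" else path
```

Note that splitext only removes the last extension, so "archive.tar.gz" becomes "archive.tar", which is usually what you want after gunzipping.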
4

I was able to resolve this issue by using the subprocess module:

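import glob
import os
import shutil
import subprocess

for file in glob.glob(PATH_TO_FILE + "/*.gz"):
    if not os.path.isdir(file):
        shutil.copy(file, FILE_DIR)
        # uncompress the copy in the working area (gunzip replaces it in place)
        subprocess.call(["gunzip", os.path.join(FILE_DIR, os.path.basename(file))])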

Since my goal was simply to uncompress the archive, the above code accomplishes this. The archived files are located in a central location, and are copied to a working area, uncompressed, and used in a test case. The gzip module was too complicated for what I was trying to accomplish.

Thanks for everyone's help. It is much appreciated!

1 Comment

Yes, if you don't need to programmatically manipulate the contents of the file and don't mind that it isn't portable between OSes, then this is a much more intuitive way to approach things.
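
If portability does matter, one possible compromise is to use gunzip when it is on the PATH and fall back to the gzip module otherwise. This is just a sketch assuming Python 3.5+ (for shutil.which and subprocess.run) and a path that ends in ".gz"; gunzip_in_place is a made-up name:

```python
import gzip
import os
import shutil
import subprocess


def gunzip_in_place(gz_path):
    """Replace gz_path with its decompressed contents, like the gunzip CLI."""
    out_path = gz_path[:-3]  # assumption: gz_path ends in ".gz"
    if shutil.which("gunzip"):
        # External tool: raises CalledProcessError if gunzip reports a failure.
        subprocess.run(["gunzip", gz_path], check=True)
    else:
        # Pure-Python fallback for systems without gunzip (e.g. Windows).
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(gz_path)
    return out_path
```

Either branch leaves the uncompressed file in place of the .gz, so the calling code does not need to know which one ran.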
0

I think there is a much simpler solution than the others presented, given that the OP only wanted to extract all the files in a directory:

import glob
from setuptools import archive_util

for fn in glob.glob('*.gz'):
  archive_util.unpack_archive(fn, '.')

1 Comment

archive_util.unpack_archive does not seem to support .gz. The error message is "setuptools.archive_util.UnrecognizedFormat: Not a recognized archive type: K:\z_temp\file.gz". shutil.unpack_archive does not support plain .gz either. To see the file types supported by shutil.unpack_archive: import shutil; print(shutil.get_unpack_formats())
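
If you do want shutil.unpack_archive to handle plain .gz files, shutil lets you register your own unpacker. The sketch below is one way that might be done. The helper _unpack_gz and the format name "gz" are my own choices, not something shutil ships with:

```python
import gzip
import os
import shutil


def _unpack_gz(filename, extract_dir):
    """Decompress a single .gz file into extract_dir (hypothetical helper)."""
    out_name = os.path.basename(filename)[:-3]  # drop the ".gz" suffix
    with gzip.open(filename, "rb") as src:
        with open(os.path.join(extract_dir, out_name), "wb") as dst:
            shutil.copyfileobj(src, dst)


# Register once; registering the same format name twice raises an error.
if "gz" not in [name for name, _, _ in shutil.get_unpack_formats()]:
    shutil.register_unpack_format(
        "gz", [".gz"], _unpack_gz, description="single gzip'ed file"
    )
```

After that, shutil.unpack_archive("file.gz", target_dir) should dispatch to the helper, provided target_dir already exists.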
