17

I have multiple zip files that have the same structure -- they contain XML files at the root level. All files in each zip file are unique (no duplicates across the zip files). I need to combine all of the XML files from all of the zip files into a single zip file (with the same structure as the original zip files). Suggestions for how to best go about doing this? Thanks.

  • Unpack them all and make a new one? Commented May 13, 2012 at 0:29
  • That would be the most obvious approach. You could also pick one as the final zipfile, extract files from the others and add them into the final one, but not sure it'll be any faster. Commented May 13, 2012 at 0:31
  • Thank you @sarnold. I too was thinking of this approach, but wasn't sure if there was a more elegant way to do it. Commented May 13, 2012 at 1:14
  • @jgritty, your idea is an interesting one. I think I may do a test to see if there is any performance improvement. Commented May 13, 2012 at 1:15

2 Answers

14

This is the shortest version I could come up with:

>>> import zipfile as z
>>> z1 = z.ZipFile('z1.zip', 'a')
>>> z2 = z.ZipFile('z2.zip', 'r')
>>> z1.namelist()
['a.xml', 'b.xml']
>>> z2.namelist()
['c.xml', 'd.xml']
>>> [z1.writestr(t[0], t[1].read()) for t in ((n, z2.open(n)) for n in z2.namelist())]
[None, None]
>>> z1.namelist()
['a.xml', 'b.xml', 'c.xml', 'd.xml']
>>> z1.close()

I have not tested the alternative, but to me this is the best (and probably the most obvious) solution: assuming both zip files contain the same amount of data, this method only has to decompress and re-compress half of it (one file's worth).
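One caveat on the re-compression point: as far as I know, `zipfile.ZipFile` defaults to `ZIP_STORED` (no compression), so members appended with `writestr` are only compressed if the archive object was opened with a `compression` argument. A minimal sketch that shows the difference, using throwaway in-memory archives and a made-up payload and member name (`appended_size` is a hypothetical helper, not part of the answer):

```python
import io
import zipfile

# Made-up, highly repetitive XML payload so the effect of compression is visible.
payload = b"<root>" + b"<item>x</item>" * 1000 + b"</root>"

def appended_size(compression):
    """Write one member to a fresh in-memory archive and return its size inside the archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=compression) as zf:
        zf.writestr("c.xml", payload)
        return zf.getinfo("c.xml").compress_size

stored = appended_size(zipfile.ZIP_STORED)      # same as passing no compression argument
deflated = appended_size(zipfile.ZIP_DEFLATED)  # requires the zlib module
print(stored, deflated)
```

If the merged archive should stay compressed, opening the target with `compression=z.ZIP_DEFLATED` looks like the safer choice.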

PS: The list comprehension is there just to keep the instructions on one line in the console (which speeds up debugging). Good Pythonic code would use a proper for loop, given that the resulting list serves no purpose...
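For reference, here is roughly what the for-loop form could look like. This is only a sketch: `append_members` is a name made up for illustration, and the `ZIP_DEFLATED` argument is my assumption that the appended members should be compressed.

```python
import zipfile

def append_members(target_path, source_path):
    """Append every member of the archive at source_path to the archive at target_path."""
    # Assumption: appended members should be compressed, hence ZIP_DEFLATED.
    with zipfile.ZipFile(target_path, "a", compression=zipfile.ZIP_DEFLATED) as target:
        with zipfile.ZipFile(source_path, "r") as source:
            for name in source.namelist():
                # read() returns the decompressed bytes of one member
                target.writestr(name, source.read(name))
```

With the files from the session above, `append_members('z1.zip', 'z2.zip')` should leave `z1.zip` containing all four XML files.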

HTH!


2 Comments

Thanks, although I will have a varying number of zip files, so I need a more generic approach.
@DaveCrumbacher: unless I misunderstood you, all you have to do to merge more than two files with this approach is add a loop: for zfile in (z2, z3, z4, ...).... Or am I missing something?
12

Here's what I came up with, thanks to @mac. Note that, as currently implemented, the first zip file is modified in place so that it ends up containing all the files from the other zip files.

import zipfile as z

zips = ['z1.zip', 'z2.zip', 'z3.zip']

# Open the first zip file in append mode, then read all
# subsequent zip files and append their members to the first one.
with z.ZipFile(zips[0], 'a') as z1:
    for fname in zips[1:]:
        # A with block closes each source archive once it has been copied.
        with z.ZipFile(fname, 'r') as zf:
            for n in zf.namelist():
                z1.writestr(n, zf.read(n))
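If modifying the first archive in place is not wanted, a variant that writes everything into a brand-new zip file might look like the sketch below. The function name `merge_zips`, the duplicate check, and the choice of `ZIP_DEFLATED` are my own assumptions, not something taken from the answers above.

```python
import zipfile

def merge_zips(source_paths, dest_path):
    """Copy every member of each source archive into a new archive at dest_path."""
    seen = set()
    # Mode 'w' creates (or overwrites) dest_path; the sources are only read.
    with zipfile.ZipFile(dest_path, "w", compression=zipfile.ZIP_DEFLATED) as dest:
        for path in source_paths:
            with zipfile.ZipFile(path, "r") as src:
                for name in src.namelist():
                    # The question says names are unique; fail loudly if they are not.
                    if name in seen:
                        raise ValueError(f"duplicate member {name!r} in {path}")
                    seen.add(name)
                    dest.writestr(name, src.read(name))
    return sorted(seen)
```

Something like `merge_zips(['z1.zip', 'z2.zip', 'z3.zip'], 'merged.zip')` would then leave the three inputs untouched. Note that copying by name like this resets each member's timestamp to the time of the merge.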

2 Comments

zipfile.ZipFile() is a context manager as well, so you could replace your z1.close() with a with z.ZipFile(zips[0], 'a') as z1: and indent the subsequent code. The same with the reading objects.
Thanks @glglgl. I have updated my answer to reflect this approach.
