I have multiple zip files with the same structure: each contains XML files at the root level. Every file name is unique across all of the zip files (no duplicates). I need to combine all of the XML files from all of the zip files into a single zip file with the same structure as the originals. Any suggestions on how best to go about this? Thanks.
-
Unpack them all and make a new one? – sarnold, May 13, 2012 at 0:29
-
That would be the most obvious approach. You could also pick one as the final zip file, extract the files from the others and add them to the final one, but I'm not sure it would be any faster. – jgritty, May 13, 2012 at 0:31
-
Thank you @sarnold. I was thinking of this approach too, but wasn't sure if there was a more elegant way to do it. – Dave Crumbacher, May 13, 2012 at 1:14
-
@jgritty, your idea is an interesting one. I may run a test to see whether there is any performance improvement. – Dave Crumbacher, May 13, 2012 at 1:15
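The "unpack them all and make a new one" suggestion could look roughly like the sketch below. It is a minimal illustration, not code from this thread: merge_by_extracting is a hypothetical helper name, and it assumes, as the question states, that every archive holds only root-level files with no duplicate names (a duplicate would silently overwrite the earlier copy during extraction).

```python
import os
import tempfile
import zipfile


def merge_by_extracting(zip_paths, output_path):
    """Unpack every archive into one scratch directory, then zip the
    extracted root-level files into a brand-new archive."""
    with tempfile.TemporaryDirectory() as workdir:
        for path in zip_paths:
            with zipfile.ZipFile(path) as zin:
                # A file with the same name as an earlier one would be
                # overwritten here; the question guarantees no duplicates.
                zin.extractall(workdir)
        with zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as zout:
            # Root-level files only, so each archive name is the base name.
            for name in sorted(os.listdir(workdir)):
                zout.write(os.path.join(workdir, name), arcname=name)
```

For example, merge_by_extracting(["z1.zip", "z2.zip", "z3.zip"], "merged.zip") leaves the inputs untouched; the price is writing every file to disk once before re-zipping it. The scratch directory is deleted when the with block exits.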
2 Answers
This is the shortest version I could come up with:
>>> import zipfile as z
>>> z1 = z.ZipFile('z1.zip', 'a')
>>> z2 = z.ZipFile('z2.zip', 'r')
>>> z1.namelist()
['a.xml', 'b.xml']
>>> z2.namelist()
['c.xml', 'd.xml']
>>> [z1.writestr(t[0], t[1].read()) for t in ((n, z2.open(n)) for n in z2.namelist())]
[None, None]
>>> z1.namelist()
['a.xml', 'b.xml', 'c.xml', 'd.xml']
>>> z1.close()
Without having tested the alternative, this seems to me the best (and probably the most obvious) solution because, assuming both zip files contain the same amount of data, it requires decompressing and re-compressing only half of it (one file).
PS: The list comprehension is there only to keep the instructions on one line in the console (which speeds up debugging). Good Pythonic code would use a proper for loop, since the resulting list serves no purpose.
HTH!
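Following the PS, the same append-in-place approach written as a plain for loop might look like this. It is a minimal sketch; append_members is a hypothetical helper name. One detail worth knowing: ZipFile defaults to ZIP_STORED (no compression) even in 'a' mode, and writestr() with a bare name uses that default, so the sketch passes compression=zipfile.ZIP_DEFLATED explicitly to keep the copied members compressed.

```python
import zipfile


def append_members(dest_path, src_path):
    """Append every member of src_path to the archive at dest_path.

    dest_path is opened in 'a' mode, so it is modified in place.
    """
    with zipfile.ZipFile(dest_path, "a", compression=zipfile.ZIP_DEFLATED) as dest, \
            zipfile.ZipFile(src_path, "r") as src:
        for name in src.namelist():
            # read() decompresses the member; writestr() re-compresses it
            # with dest's compression setting (ZIP_DEFLATED here).
            dest.writestr(name, src.read(name))
```

Calling append_members("z1.zip", "z2.zip") reproduces the console session above, and both archives are closed when the with block exits.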
2 Comments
for zfile in (z2, z3, z4, ...) ... or am I missing something?
Here's what I came up with, thanks to @mac. Note that, as currently implemented, the first zip file is modified in place to contain all the files from the other zip files.
import zipfile as z

zips = ['z1.zip', 'z2.zip', 'z3.zip']

# Open the first zip file in append mode, then read every
# subsequent zip file and append its members to the first one.
with z.ZipFile(zips[0], 'a') as z1:
    for fname in zips[1:]:
        # Close each source archive as soon as it has been copied.
        with z.ZipFile(fname, 'r') as zf:
            for n in zf.namelist():
                z1.writestr(n, zf.read(n))
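Because the code above rewrites the first archive in place, a variant that writes to a separate output file may be safer. This is a minimal sketch under the question's assumptions; merge_into_new is a hypothetical helper name. Passing a bare name to writestr() stamps each member with the current time, so the sketch builds a fresh ZipInfo per member to carry over the original timestamp, compression method and attributes. It also raises ValueError on a duplicate name, where zipfile on its own would only emit a UserWarning and write a second entry.

```python
import zipfile


def merge_into_new(zip_paths, output_path):
    """Copy every member of every archive in zip_paths into a new
    archive at output_path, leaving the inputs untouched."""
    seen = set()
    with zipfile.ZipFile(output_path, "w") as zout:
        for path in zip_paths:
            with zipfile.ZipFile(path) as zin:
                for info in zin.infolist():
                    if info.filename in seen:
                        # Fail loudly instead of writing a duplicate entry.
                        raise ValueError(
                            f"duplicate member {info.filename!r} in {path}")
                    seen.add(info.filename)
                    # Fresh ZipInfo: same name, timestamp, compression
                    # method and attributes as the source member.
                    new_info = zipfile.ZipInfo(info.filename,
                                               date_time=info.date_time)
                    new_info.compress_type = info.compress_type
                    new_info.external_attr = info.external_attr
                    zout.writestr(new_info, zin.read(info))
```

For example, merge_into_new(['z1.zip', 'z2.zip', 'z3.zip'], 'merged.zip') leaves the three inputs unchanged. If a duplicate is found, the partially written output file should be discarded.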
2 Comments
zipfile.ZipFile() is a context manager as well, so you could replace your z1.close() with a with z.ZipFile(zips[0], 'a') as z1: and indent the subsequent code. The same applies to the reading objects.