I have an XML file that contains a set of textual element tags (each giving the decimal offset and data length of the corresponding binary element), followed by the binary data of all the elements at the end. An example is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<Package>
<element>
<offset>0</offset>
<length>2961181</length>
<checksum>4238515972</checksum>
<format>gzip</format>
</element>
<element>
<offset>2961181</offset>
<length>5442</length>
<checksum>4238515972</checksum>
<format>bin</format>
</element>
</Package>
BINARY_DATA
Please note that the offset is decimal and counts from the first byte after the header. How can I parse this file in Python, grab each element based on its offset, decompress it (if its format is gzip), and store it as a file?
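To make the intended steps concrete, here is a minimal sketch of the parse-and-extract logic. It assumes Python 3, that the whole file fits in memory, and that exactly one line break separates `</Package>` from the binary data; the function names `split_package` and `extract_elements` are hypothetical, and the checksum field is ignored because its algorithm is not stated.

```python
import gzip
import xml.etree.ElementTree as ET

END_TAG = b"</Package>"

def split_package(blob):
    """Split the raw file bytes into (XML header bytes, binary payload bytes)."""
    end = blob.index(END_TAG) + len(END_TAG)
    header, payload = blob[:end], blob[end:]
    # Assumption: a single LF or CRLF follows the closing tag.
    if payload.startswith(b"\r\n"):
        payload = payload[2:]
    elif payload.startswith(b"\n"):
        payload = payload[1:]
    return header, payload

def extract_elements(blob):
    """Return the decoded bytes of every <element>, in document order."""
    header, payload = split_package(blob)
    root = ET.fromstring(header)
    results = []
    for el in root.findall("element"):
        offset = int(el.findtext("offset"))
        length = int(el.findtext("length"))
        data = payload[offset:offset + length]
        if el.findtext("format") == "gzip":
            data = gzip.decompress(data)
        results.append(data)
    return results
```

The file would be read with `open("file.xml", "rb").read()` so that the payload bytes are not altered, and each returned item could then be written out with `open(name, "wb")`.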
Based on the replies from OmnipotentEntity and Jakob_B, I wrote the following short script, just to see if it works for the first element:
import zlib
f = open("file.xml", "r")
text = f.read()
position = text.find("</Package>\n")
headerSize=position+ len("</Package>\n") + 1
offset=0
f.seek(headerSize + offset)
length = 2961181
bin_data = f.read(length)
zipped=1
if (zipped):
    ungziped_str = zlib.decompressobj().decompress('x\x9c' + bin_data)
    print(ungziped_str)
f.close()
However, I got the following error:
Traceback (most recent call last):
  File "file_parse.py", line 11, in ?
    ungziped_str = zlib.decompressobj().decompress('x\x9c' + bin_data)
zlib.error: Error -3 while decompressing: invalid block type
What is the problem? Is the input file incorrect, or is the code incorrect?
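One way to narrow this down is to check which framing the bytes at the computed offset actually use. The error is at least consistent with the data starting with the gzip magic byte 0x1f: once a zlib header (`'x\x9c'`) is prepended, that byte is read as the start of a deflate block, and its low bits decode to the reserved block type 3. The sketch below, assuming Python 3 and a hypothetical helper name `inflate_any`, tries gzip, zlib, and raw deflate framing in turn via the `wbits` argument of `zlib.decompressobj`. If all three fail, the offset (for example the extra `+ 1`, or reading in text mode rather than `"rb"`) is the more likely suspect.

```python
import zlib

def inflate_any(data):
    """Try gzip, zlib, then raw deflate framing; return (framing name, bytes)."""
    candidates = (
        ("gzip", 16 + zlib.MAX_WBITS),      # expects a gzip header and trailer
        ("zlib", zlib.MAX_WBITS),           # expects a zlib header such as x\x9c
        ("raw deflate", -zlib.MAX_WBITS),   # no header at all
    )
    for name, wbits in candidates:
        try:
            return name, zlib.decompressobj(wbits).decompress(data)
        except zlib.error:
            continue
    raise ValueError("not gzip, zlib, or raw deflate; check the offset")
```

If this reports gzip for the bytes read in binary mode, then no header needs to be prepended and the payload can be passed to the decompressor as is.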