I'm attempting to read and parse a binary file with Python.

The issue is that the data in the file can be in little-endian or big-endian format, as well as 32- or 64-bit values. In the file header there are a few bytes that specify the data format and size. Let's assume that I've read these in and I know the format and size, and I try to construct a format string as follows:

    if bitOrder == 1:        # little-endian format
        strData = '<'
    elif bitOrder == 2:      # big-endian format
        strData = '>'

    if dataSize == 1:        # 32-bit data
        strLen = 'L'
    elif dataSize == 2:      # 64-bit data
        strLen = 'q'

    strFormat = strData + strLen
    struct.unpack(strFormat, buf)

When I do this I get the error: "struct.error: unpack requires a string argument of length 2", yet if I write struct.unpack('<L', buf) I get the expected result.

In an interactive shell, type(strFormat) returns <type 'str'> and len(strFormat) returns 2.

So, being relatively new to Python, I have the following questions:

  1. Is str not the same as a string? If not, how do I convert between the two?

  2. How would I correctly construct the format string for use in an unpack function?

------ edit ------ to address comments:

  1. At this time I'm using Python 2.7 due to constraints of other projects.

  2. I'm trying to avoid posting my code (it's several hundred lines long), but here is an interactive Python session (run from inside emacs, if that matters) that shows the behaviour I'm experiencing:

    Python 2.7.5 (default, Jun 17 2014, 18:11:42) 
    [GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> >>> >>> >>> 
    >>> import array
    >>> import struct
    >>> header = array.array('B',[0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00, 0x00, 0x00, 0x00,0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0x00,0x3e, 0x00, 0x01, 0x00, 0x00, 0x00, 0x40, 0x04, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x70, 0x11, 0x00, 0x00, 0x00,0x00, 0x00, 0x00, 0x00,0x00, 0x00, 0x00, 0x40, 0x00, 0x38, 0x00, 0x09, 0x00, 0x40, 0x00, 0x1e, 0x00, 0x1b, 0x00])
    >>> entry = header[24:32]
    >>> phoff = header[32:40]
    >>> shoff = header[40:48]
    >>> strData = '<'
    >>> strLen = 'H'
    >>> strFormat = strData + strLen
    >>> print strFormat
    <H
    >>> type(strFormat)
    <type 'str'>
    >>> len(strFormat)
    2
    >>> temp = struct.unpack(strFormat, entry)
    Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
    struct.error: unpack requires a string argument of length 2
    >>> 
    
  3. Fixed typos in the original code.

  • Could you post your actual code? What you have here obviously isn't it, as you're missing the colons in the second if block. (Also, is this Python 2 or Python 3?) Commented Jan 30, 2015 at 16:56
  • Also, a suggestion: Try debugging your code by printing bitOrder, dataSize, and strFormat just before the call to struct.unpack. It's possible that one of those is taking an unexpected value. Commented Jan 30, 2015 at 16:58
  • What's len(buf)? That's what unpack is complaining about -- and not with the code you're showing; L requires a length 4, q requires a length 8 -- neither requires a length 2. struct.calcsize(strFormat) will tell you how many bytes strFormat requires on your platform, and len(buf) must == exactly that number of bytes. Commented Jan 30, 2015 at 16:58
  • As for your Qs 1 and 2, an str is indeed a string, and concatenating parts is one valid way to build it. Your error(s) must be in other parts of the code you're not showing (besides the syntax error in lack of colons which @jwodder notices, and guarantees 100% that you're not showing us the actual code you're running -- thus making helping you very, very hard!-). Commented Jan 30, 2015 at 17:00
  • Why are you setting strLen = 'H' in the second example? H is for 2-byte integers, and is not one of the strLen values you use in your first code snippet. Commented Jan 30, 2015 at 18:01

1 Answer

Going by the interactive session, your problem would appear to be this:

    temp = struct.unpack(strFormat, entry)

Earlier, you said:

    entry = header[24:32]

entry is 8 bytes long, but strFormat says it should be 2 bytes long. That's what struct is complaining about.
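
The size mismatch can be checked directly with struct.calcsize. The following is a minimal sketch, not the asker's actual code: it writes the eight bytes of header[24:32] from the session as a byte-string literal, and the names size_h, size_q, mismatch_raised and entry_point are made up for illustration.

```python
import struct

# The eight bytes of header[24:32] from the interactive session,
# written out as a byte string (an assumption made for this sketch).
entry = b'\x40\x04\x40\x00\x00\x00\x00\x00'

# calcsize reports how many bytes a format string consumes.
size_h = struct.calcsize('<H')   # 2-byte unsigned short
size_q = struct.calcsize('<Q')   # 8-byte unsigned long long

# '<H' fails because len(entry) is 8, not 2.
try:
    struct.unpack('<H', entry)
    mismatch_raised = False
except struct.error:
    mismatch_raised = True

# '<Q' consumes exactly len(entry) bytes, so the unpack succeeds.
(entry_point,) = struct.unpack('<Q', entry)
```

Comparing struct.calcsize(strFormat) with len(buf) just before the call to unpack is a quick way to catch this kind of mismatch.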

entry should also be a bytes object (str under 2.x), not an array.array.
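
Putting both points together, here is one possible sketch of building the format string from the header and unpacking a byte string instead of an array slice. It makes several assumptions: that byte 5 of the header holds the byte-order flag and byte 4 holds the word-size flag (as in an ELF identification block, which the sample data resembles), that unsigned 'Q' is wanted for 64-bit values rather than the 'q' in the question, and that the helper names ORDER, WIDTH and to_bytes are hypothetical.

```python
import array
import struct

# First 32 bytes of the header from the interactive session.
header = array.array('B', [
    0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x02, 0x00, 0x3e, 0x00, 0x01, 0x00, 0x00, 0x00,
    0x40, 0x04, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00,
])

# Hypothetical lookup tables mirroring the if/elif chains in the question.
ORDER = {1: '<', 2: '>'}   # 1 = little-endian, 2 = big-endian
WIDTH = {1: 'L', 2: 'Q'}   # 1 = 32-bit, 2 = 64-bit (unsigned, an assumption)


def to_bytes(arr):
    """Return the raw bytes of an array.array on Python 2.7 and 3.x."""
    # tobytes() exists on Python 3.2+; Python 2.7 spells it tostring().
    if hasattr(arr, 'tobytes'):
        return arr.tobytes()
    return arr.tostring()


# Assumed layout: header[5] is the byte-order flag, header[4] the size flag.
str_format = ORDER[header[5]] + WIDTH[header[4]]

# Slice exactly as many bytes as the format consumes, then convert.
size = struct.calcsize(str_format)
field = to_bytes(header[24:24 + size])
(entry_point,) = struct.unpack(str_format, field)
```

Deriving the slice length from struct.calcsize keeps the buffer and the format in step, so switching between 32- and 64-bit files does not require changing the slice bounds by hand.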
