Using structs with attribute packed when parsing binary data

Question

I have seen various code where one reads data into a char or void and then casts it to a struct. An example is parsing file formats where data sits at fixed offsets.

Example:

struct some_format {
    char magic[4];
    uint32_t len;
    uint16_t foo;
};

struct some_format *sf = (struct some_format*) buf;

To be sure this is always valid, one needs to pack the struct (so the compiler inserts no padding between members) by using __attribute__((packed)).

struct test {
    uint8_t a;
    uint8_t b;
    uint32_t c;
    uint8_t d[128];
} __attribute__((packed));
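To make concrete what packing changes, here is a small sketch that pins down the layout of struct test at compile time. It assumes GCC or Clang (for __attribute__((packed))) and C11 (for _Static_assert); struct test_unpacked is a hypothetical twin added only for comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* Packed version from the question: no padding between members. */
struct test {
    uint8_t a;
    uint8_t b;
    uint32_t c;
    uint8_t d[128];
} __attribute__((packed));

/* Hypothetical unpacked twin, for comparison only. Where uint32_t is
 * 4-byte aligned (common ABIs), the compiler typically inserts 2 padding
 * bytes before c, giving offsetof(c) == 4 and sizeof == 136. */
struct test_unpacked {
    uint8_t a;
    uint8_t b;
    uint32_t c;
    uint8_t d[128];
};

/* With packing, the layout is exactly 1 + 1 + 4 + 128 bytes. */
_Static_assert(sizeof(struct test) == 134, "packed struct has no padding");
_Static_assert(offsetof(struct test, c) == 2, "c follows a and b directly");
```

The unpacked sizes are ABI-dependent, which is exactly why only the packed layout is asserted.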

When reading big and complex file formats, this surely makes things much simpler, typically when reading media formats with structs that have 30+ members.

It is also easy to read in a huge buffer and cast parts of it to the proper type, for example:

struct mother {
    uint8_t a;
    uint8_t b;
    uint32_t offset_child;
};

struct child {
     ...
};

struct mother *m = (struct mother*) buf;
struct child *c = (struct child*) ((uint8_t*)buf + m->offset_child);
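Following an offset read from the file is only safe if the offset is checked against the buffer length first. The sketch below does that and copies each record out with memcpy instead of dereferencing a possibly misaligned pointer into buf. The members of struct child and the function name read_mother_child are hypothetical (the question elides them), and multi-byte fields are left in the host's byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout from the question, packed so it matches the on-disk bytes. */
struct mother {
    uint8_t a;
    uint8_t b;
    uint32_t offset_child;
} __attribute__((packed));

/* Hypothetical child layout; the real members are elided in the question. */
struct child {
    uint16_t kind;
    uint32_t size;
} __attribute__((packed));

/* Copies the mother header and the child it points to out of buf.
 * Returns 0 on success, -1 if either record would run past the buffer.
 * Multi-byte fields are NOT byte-swapped: they keep the host's order. */
int read_mother_child(const uint8_t *buf, size_t len,
                      struct mother *m, struct child *c)
{
    if (len < sizeof *m)
        return -1;
    memcpy(m, buf, sizeof *m);

    /* Reject offsets that would place the child outside buf. */
    if (m->offset_child > len || len - m->offset_child < sizeof *c)
        return -1;
    memcpy(c, buf + m->offset_child, sizeof *c);
    return 0;
}
```

Writing the check as len - offset < size (after confirming offset <= len) avoids the unsigned overflow that offset + size > len could hit with a hostile offset.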

Or:

read_big_buf(buf, 4096);

a = (struct a*) buf;
b = (struct b*) (buf + sizeof(struct a));
c = (struct c*) (buf + SOME_DEF);
...

It would also be easy to quickly write such structures back to a file.

My question is how good or bad this way of coding is. I am looking at various data structures and would like to use the best way to handle this.

Is this how it is done? (As in: is this common practice?)
Is __attribute__((packed)) always safe?
~~Is it better to use sscanf?~~ (What was I thinking? Thanks, @Amardeep.)
Is it better to write functions that initialize the structure with casts and bit shifting?
etc.

As of now I use this mainly in data information tools, such as listing all structures of a certain type, with their values, in a file format like a media stream: information-dumping tools.

Beware of example code that reads anything into a void... Or even a char if it reads more than one byte. Maybe into a char * or a void *, though, as long as the buffer it points to is big enough... – twalberg, Commented May 6, 2013 at 19:25

Amardeep AC9MF · Accepted Answer · 2013-05-06 19:06:51Z

4

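It is how it is sometimes done. Packed is safe as long as you use it correctly (for example, by accessing members through the struct rather than through separately stored pointers to packed members, which may be misaligned). Using sscanf() would imply you are reading text data, which is a different use case from reading the binary image of a structure.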

If your code does not require portability across compilers and/or platforms (CPU architectures), and your compiler has support for packed structures, then this is a perfectly legitimate way of accessing serialized data.

However, problems may arise if you try to generate data on one platform and use it on another due to:

Host byte order (little-endian vs. big-endian)
Different sizes for primitive language types (for example, long can be 32 or 64 bits)
Code changes on one side but not the other.
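The byte-order problem in particular goes away if multi-byte fields are assembled from individual bytes instead of being read through a cast. A minimal sketch; load_u32le and load_u32be are hypothetical helper names, and the same source gives the same result on any host regardless of its endianness or alignment rules.

```c
#include <stdint.h>

/* Decode a 32-bit unsigned value stored little-endian at p,
 * independent of the host's byte order and alignment. */
uint32_t load_u32le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | (uint32_t)p[1] << 8
         | (uint32_t)p[2] << 16
         | (uint32_t)p[3] << 24;
}

/* Decode the same width stored big-endian at p. */
uint32_t load_u32be(const uint8_t *p)
{
    return (uint32_t)p[0] << 24
         | (uint32_t)p[1] << 16
         | (uint32_t)p[2] << 8
         | (uint32_t)p[3];
}
```

Casting each byte to uint32_t before shifting matters: shifting a promoted int left by 24 could overflow for bytes of 0x80 and above.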

There are libraries that simplify serialization/deserialization and handle most of these issues. The overhead of such operations is more easily justified on systems whose data must cross process and host boundaries. However, if your structures are very complex, using a ser/des library may be justified simply for ease of maintenance.

answered May 6, 2013 at 19:06

Amardeep AC9MF



3 Comments

Zimzalabim Over a year ago

Thanks. But (I should have mentioned this as well) I also use inttypes.h; I thought that was pretty common in compilers. Byte order is a bigger issue, though. I post-process after the cast if that is an issue, but I find it quick and easy to get a starting point this way. Some of the specifications also use both LSB-first and MSB-first values in multiple records of the same file/data, like Ecma-119. (And yes, I do not know what I was thinking when I mentioned sscanf :P)
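For the Ecma-119 case, a 32-bit "both-byte order" field is recorded as 8 bytes: a little-endian copy followed by a big-endian copy of the same value. A sketch of a reader that decodes both halves and rejects the field if they disagree; the function name read_u32_both and the choice to validate (rather than just decode the half matching the host) are assumptions.

```c
#include <stdint.h>

/* Reads a 32-bit both-byte-order field (Ecma-119 / ISO 9660 style):
 * bytes 0-3 hold the value little-endian, bytes 4-7 hold it big-endian.
 * Returns 0 and stores the value if both halves agree, -1 otherwise. */
int read_u32_both(const uint8_t p[8], uint32_t *out)
{
    uint32_t le = (uint32_t)p[0] | (uint32_t)p[1] << 8
                | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
    uint32_t be = (uint32_t)p[4] << 24 | (uint32_t)p[5] << 16
                | (uint32_t)p[6] << 8 | (uint32_t)p[7];
    if (le != be)
        return -1; /* corrupt or non-conforming field */
    *out = le;
    return 0;
}
```

Checking both halves costs a few shifts but catches damaged images that a cast-and-use approach would silently accept.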

supercat Over a year ago

I wish there were a way of specifying a cross between struct members and a bitfield, such that one could request that a struct be stored as, e.g., four unsigned short values, and that member fnorble should have bits 0-7 stored in bits 8-15 of the second value and bits 8-15 stored in bits 0-7 of the third. On an x86, if ESI holds the address of the structure, a compiler could read fnorble via mov ax,[esi+3]; on a Cortex M0, if the address is in r0, via ldrb r1,[r0+3]/ldrb r2,[r0+4]/add r1,r1,r2 asl #8.

supercat Over a year ago

Such a construct would have meaning even on a machine where sizeof(short)==1; in that case, fnorble would be read as ((data[1] >> 8) | (data[2] << 8)) & 0xFFFF [if one used unsigned char as the underlying type, then on a machine with 16-bit char values, the upper 8 bits of each char value would be ignored, which would be desirable in some cases and undesirable in others].
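Without such a language feature, the layout has to be spelled out by hand. A sketch of accessors matching the expression above, assuming 16-bit unsigned storage values; get_fnorble and set_fnorble are hypothetical names.

```c
#include <stdint.h>

/* Record stored as four 16-bit values. fnorble's bits 0-7 live in
 * bits 8-15 of data[1]; its bits 8-15 live in bits 0-7 of data[2]. */
uint16_t get_fnorble(const uint16_t data[4])
{
    return (uint16_t)(((data[1] >> 8) | (data[2] << 8)) & 0xFFFF);
}

/* Writes fnorble while leaving the other bits of data[1] and data[2] intact. */
void set_fnorble(uint16_t data[4], uint16_t v)
{
    data[1] = (uint16_t)((data[1] & 0x00FF) | ((v & 0x00FF) << 8));
    data[2] = (uint16_t)((data[2] & 0xFF00) | (v >> 8));
}
```

For example, with data = {0x1111, 0x22AA, 0xBB33, 0x4444}, get_fnorble returns 0x3322.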

score 1 · Accepted Answer · 2013-05-06 19:01:42Z


Is this how it is done?

~~I don't understand this question.~~ Edit: you'd like to know if this is a common idiom. In codebases where a dependency on GNU extensions is acceptable, yes, this is used quite frequently, since it's convenient.

Is __attribute__((packed)) always safe?

For this use case, pretty much yes, except when it's unavailable.

Is it better to use sscanf?

No. Don't use scanf().

Is it better to write functions that initialize the structure with casts and bit shifting?

It's more portable. __attribute__((packed)) is a GNU extension, and not all compilers support it (although I wonder who cares about compilers other than GCC and Clang; theoretically, though, this is still an issue).
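As an illustration of the function-based alternative, here is a sketch that fills the question's struct test from raw bytes with shifts, so the in-memory struct can keep its natural (unpacked) layout. The assumption that c is stored little-endian in the file is hypothetical, as are the names parse_test and TEST_WIRE_SIZE.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Native, unpacked in-memory representation of the question's struct test. */
struct test {
    uint8_t a;
    uint8_t b;
    uint32_t c;
    uint8_t d[128];
};

/* Size of the record in the file: 1 + 1 + 4 + 128 bytes, no padding. */
#define TEST_WIRE_SIZE 134

/* Fills *t from the first TEST_WIRE_SIZE bytes of buf.
 * Assumes (hypothetically) that c is stored little-endian in the file.
 * Returns 0 on success, -1 if buf is too short. */
int parse_test(struct test *t, const uint8_t *buf, size_t len)
{
    if (len < TEST_WIRE_SIZE)
        return -1;
    t->a = buf[0];
    t->b = buf[1];
    t->c = (uint32_t)buf[2]
         | (uint32_t)buf[3] << 8
         | (uint32_t)buf[4] << 16
         | (uint32_t)buf[5] << 24;
    memcpy(t->d, buf + 6, sizeof t->d);
    return 0;
}
```

This needs no compiler extension, at the cost of writing one such function per record type.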

edited May 6, 2013 at 19:01

answered May 6, 2013 at 18:55

user529758

5 Comments

Zimzalabim Over a year ago

Thanks. By "Is this how it is done?" I mean: is this a common practice? Is it frowned upon for some reason, etc.? I have seen it in some code, but not that frequently.

user529758 Over a year ago

@Zimzalabim Oh, I see. So you're interested in whether it's a common idiom. I'd say it's quite common in code that doesn't have to be very portable, because it's easy and convenient. The frowning-upon part is related to my note about (un)portability.

pg1989 Over a year ago

Some developers use ICC as their primary compiler; it's popular in academic PL circles.

user529758 Over a year ago

@pg1989 There are also embedded systems where TCC was used because of its tiny size, IIRC; I'm not sure if it supports GNU extensions, though.

Zimzalabim Over a year ago

But if inttypes.h is supported, one can define macros for, e.g., MSVC using #pragma pack(push, 1), etc., I guess. If inttypes.h is not available, one is out of luck, I guess.
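On the #pragma pack route: GCC and Clang also accept #pragma pack(push, n) / #pragma pack(pop) for Microsoft compatibility, so a single definition can often serve all three compilers without a macro. A sketch using the question's struct some_format, assuming a C11 compiler for _Static_assert; the assertion catches a compiler that silently ignores the pragma.

```c
#include <stdint.h>

/* Byte-pack everything between push and pop; restore the previous
 * packing afterwards so unrelated structs keep their normal layout. */
#pragma pack(push, 1)
struct some_format {
    char magic[4];
    uint32_t len;
    uint16_t foo;
};
#pragma pack(pop)

/* Unpacked, trailing padding would typically make this 12 bytes. */
_Static_assert(sizeof(struct some_format) == 10,
               "struct some_format must be packed to 10 bytes");
```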

supercat · Accepted Answer · 2013-05-06 21:50:05Z

One of my gripes about the C language standards to date is that they impose enough rules about how compilers have to lay out structures and bit fields to preclude what might otherwise be useful optimizations [e.g. on a system with power-of-two integer sizes, a compiler would be forbidden from packing eight three-bit fields into three bytes], but do not provide any means by which a programmer can specify an explicit struct layout.

I used to use byte pointers frequently to read data out of structures, but I don't favor such techniques as much as I used to. When speed isn't critical, I nowadays prefer to use a family of functions which either write multi-byte types to, or read them from, multiple consecutive memory locations using whatever endianness is needed [e.g. void storeI32LE(uint8_t **dest, int32_t dat) or int32_t readI32LE(uint8_t const **src);]. Such code will not be as efficient as what a compiler might generate in cases where the processor has the correct endianness and either the structure members are aligned or the processor supports unaligned accesses, but code using such methods may easily be ported to any processor regardless of its native alignment and endianness.
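The two signatures above can be filled in roughly as follows. This is a sketch: only the signatures come from the answer; the bodies, including the pointer-advancing behavior suggested by the double-pointer parameters, are assumptions.

```c
#include <stdint.h>

/* Writes dat as 4 little-endian bytes at *dest and advances *dest past them. */
void storeI32LE(uint8_t **dest, int32_t dat)
{
    uint32_t u = (uint32_t)dat; /* well-defined: wraps modulo 2^32 */
    uint8_t *p = *dest;
    p[0] = (uint8_t)u;
    p[1] = (uint8_t)(u >> 8);
    p[2] = (uint8_t)(u >> 16);
    p[3] = (uint8_t)(u >> 24);
    *dest = p + 4;
}

/* Reads 4 little-endian bytes at *src as an int32_t and advances *src. */
int32_t readI32LE(uint8_t const **src)
{
    const uint8_t *p = *src;
    uint32_t u = (uint32_t)p[0]
               | (uint32_t)p[1] << 8
               | (uint32_t)p[2] << 16
               | (uint32_t)p[3] << 24;
    *src = p + 4;
    /* Map to signed without relying on the implementation-defined
     * conversion of out-of-range unsigned values to int32_t. */
    if (u <= INT32_MAX)
        return (int32_t)u;
    return (int32_t)(u - 0x80000000u) + INT32_MIN;
}
```

Because each call advances the cursor, a record can be serialized or parsed as a straight sequence of calls, one per field, with no offsets to keep in sync.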
