3

I have seen code in various places that reads data into a char * or void * buffer and then casts it to a struct pointer. An example is parsing file formats where data sits at fixed offsets.

Example:

struct some_format {
    char magic[4];
    uint32_t len;
    uint16_t foo;
};

struct some_format *sf = (struct some_format*) buf;

To be sure the layout always matches the file, one needs to pack the struct (remove the padding the compiler would otherwise insert) by using __attribute__((packed)).

struct test {
    uint8_t a;
    uint8_t b;
    uint32_t c;
    uint8_t d[128];
} __attribute__((packed));

When reading big and complex file formats this surely makes things much simpler, typically when reading media formats whose structs have 30+ members.

It is also easy to read in a huge buffer and cast to the proper type, for example:

struct mother {
    uint8_t a;
    uint8_t b;
    uint32_t offset_child;
};

struct child {
     ...
};

m = (struct mother*) buf;
c = (struct child*) ((uint8_t*)buf + m->offset_child);

Or:

read_big_buf(buf, 4096);

a = (struct a*) buf;
b = (struct b*) (buf + sizeof(struct a));
c = (struct c*) (buf + SOME_DEF);
...

It would also be easy to quickly write such structures to file.


My question is how good or bad this way of coding is. I am looking at various data structures and would like to use the best way to handle them.

  • Is this how it is done? (As in: is this common practice.)
  • Is __attribute__((packed)) always safe?
  • Is it better to use sscanf? (What was I thinking? Thanks @Amardeep.)
  • Is it better to write functions that initialize the structure with casts and bit shifting?
  • etc.

As of now I use this mainly in information dumping tools, such as ones that list all structures of a certain type, with their values, in a file format like a media stream.

1
  • 1
    Beware of example code that reads anything into a void... Or even a char if it reads more than one byte. Maybe into a char * or a void *, though, as long as the buffer it points to is big enough... Commented May 6, 2013 at 19:25

3 Answers

4

It is how it is sometimes done. Packed is safe as long as you use it correctly. Using sscanf() would imply you are reading text data, which is a different use case than a binary image from a structure.

If your code does not require portability across compilers and/or platforms (CPU architectures), and your compiler has support for packed structures, then this is a perfectly legitimate way of accessing serialized data.
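A minimal sketch of that approach, assuming GCC or Clang. The struct is some_format from the question with the packed attribute added, and load_some_format is a hypothetical helper name. It copies with memcpy rather than casting the buffer pointer, which sidesteps alignment concerns, and it checks the layout at compile time:

```c
#include <assert.h>   /* assert, static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>
#include <string.h>

/* Layout from the question; packed is a GNU extension (GCC/Clang). */
struct some_format {
    char     magic[4];
    uint32_t len;
    uint16_t foo;
} __attribute__((packed));

/* Fail the build if the compiler did not produce the on-disk layout. */
static_assert(sizeof(struct some_format) == 10, "unexpected padding");
static_assert(offsetof(struct some_format, len) == 4, "len misplaced");
static_assert(offsetof(struct some_format, foo) == 8, "foo misplaced");

/* Hypothetical helper: copy one header out of a raw buffer.
 * Returns 0 on success, -1 if the buffer is too short.
 * Multi-byte fields keep whatever byte order they had in the buffer. */
static int load_some_format(struct some_format *out,
                            const void *buf, size_t buf_len)
{
    assert(out != NULL && buf != NULL);
    if (buf_len < sizeof *out)
        return -1;
    memcpy(out, buf, sizeof *out);
    return 0;
}
```

The length check matters for any real file: a cast alone never verifies that the buffer actually holds a full header.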

However, problems may arise if you try to generate data on one platform and use it on another due to:

  1. Host Byte Order (Little Endian/Big Endian)
  2. Different sizes for language primitive types (long can be 32 or 64 bits for example)
  3. Code changes on one side but not the other.
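To illustrate point 1, here is a small sketch that detects the host byte order at run time and swaps a 32-bit field when the file stores it big-endian. All helper names are made up for this example, and it assumes the host is either plain little-endian or plain big-endian:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);   /* lowest-addressed byte of probe */
    return first == 1;
}

/* Reverse the four bytes of a 32-bit value. */
static uint32_t swap_u32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Load a 32-bit field that the file stores big-endian and
 * return it in host order. */
static uint32_t load_be32_via_swap(const void *p)
{
    uint32_t raw;
    assert(p != NULL);
    memcpy(&raw, p, sizeof raw);  /* same bytes a struct cast would see */
    return host_is_little_endian() ? swap_u32(raw) : raw;
}
```

The same bytes read through a struct cast give different numbers on different hosts, which is why a post-processing step like this is needed after the cast.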

There are libraries that simplify serialization/deserialization and handle most of these issues. The overhead of such operations is more easily justified on systems that must span processes and hosts. However, if your structures are very complex, using a ser/des library may be justified simply for ease of maintenance.


3 Comments

Thanks. But (I should have mentioned this as well) I also use inttypes.h, which I thought was pretty common in compilers. Byte order is a bigger issue, though. I post-process after the cast if that is an issue, but find it quick and easy to get a starting point this way. Some of the specifications also have both LSB and MSB in multiple records, in the same file/data, like ECMA-119. (And yes, I do not know what I was thinking when I mentioned sscanf :P)
I wish there were a way of specifying a cross between struct members and a bitfield, such that one could request that a struct should be stored as e.g. four unsigned short values, and that member fnorble should have bits 0-7 stored in bits 8-15 of the second value, and bits 8-15 stored in bits 16-23 of the third. On an x86, if ESI holds the address of the structure, a compiler could read fnorble via mov ax,[esi+3]; on a Cortex M0, if address is in r0, ldrb r1,[r0+3]/ldrb r2,[r0+4]/add r1,r1,r2 asl #8.
Such a construct would have meaning even on a machine where sizeof(short)==1; in that case, fnorble would be read as ((data[1] >> 8) | (data[2] << 8)) & 0xFFFF [if one used unsigned char as the underlying type, then on a machine with 16-bit char values, the upper 8 bits of each char value would be ignored, which would be desirable in some cases and undesirable in others].
1

Is this how it is done?

I don't understand this question. Edit: you'd like to know if this is a common idiom. In codebases where dependency on GNU extensions is acceptable, yes, this is used quite frequently, since it's convenient.

is __attribute__((packed)) always safe?

For this use case, pretty much yes, except when it's unavailable.

Is it better to use sscanf.

No. Don't use scanf().

Is it better to make functions where one initiates structure with casts and bit shifting.

It's more portable. __attribute__((packed)) is a GNU extension, and not all compilers support it (although I'm wondering who cares about compilers other than GCC and Clang, but theoretically, this still is an issue).
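A rough sketch of the portable variant, assuming (hypothetically) that the file stores its integers little-endian. The struct is some_format from the question, left unpacked; parse_some_format and SOME_FORMAT_SIZE are illustrative names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* In-memory form; the compiler may pad it however it likes. */
struct some_format {
    char     magic[4];
    uint32_t len;
    uint16_t foo;
};

enum { SOME_FORMAT_SIZE = 10 };  /* 4 + 4 + 2 bytes in the file */

/* Decode one header from buf, assuming little-endian fields.
 * Returns 0 on success, -1 if fewer than 10 bytes are available. */
static int parse_some_format(struct some_format *out,
                             const uint8_t *buf, size_t buf_len)
{
    assert(out != NULL && buf != NULL);
    if (buf_len < SOME_FORMAT_SIZE)
        return -1;
    memcpy(out->magic, buf, 4);
    out->len = (uint32_t)buf[4]
             | ((uint32_t)buf[5] << 8)
             | ((uint32_t)buf[6] << 16)
             | ((uint32_t)buf[7] << 24);
    out->foo = (uint16_t)(buf[8] | (buf[9] << 8));
    return 0;
}
```

This needs no compiler extension and gives the same result regardless of host byte order or alignment rules, at the cost of one line per field.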

5 Comments

Thanks. By "Is this how it is done?" I mean: is this a common practice, is it frowned upon for some reason, etc. I have seen it in some code, but not that frequently.
@Zimzalabim Oh, I see. So you're interested in if it's a common idiom. I'd say it's quite common in code that doesn't have to be very portable, because it's easy and convenient. The frowning-upon part is related to my note about (un)portability.
Some developers use ICC as their primary compiler; it's popular in academic PL circles.
@pg1989 There are also embedded systems where TCC was used because of its tiny size, IIRC, not sure if it supports GNU extensions, though.
But if inttypes.h is supported, one can define macros for e.g. MSVC by #pragma pack(push,1) etc I guess. If int-types is not available one is out of luck I guess.
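A sketch of the #pragma pack route mentioned in the last comment, using the test struct from the question. The pragma form is accepted by MSVC and, for compatibility, by GCC and Clang; the static_assert lines assume a C11 compiler:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Pack to 1-byte alignment for the declarations that follow,
 * then restore the previous setting. */
#pragma pack(push, 1)
struct test {
    uint8_t  a;
    uint8_t  b;
    uint32_t c;
    uint8_t  d[128];
};
#pragma pack(pop)

/* 1 + 1 + 4 + 128 bytes, with no padding before c. */
static_assert(sizeof(struct test) == 134, "struct test is not packed");
static_assert(offsetof(struct test, c) == 2, "c misplaced");
```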
1

One of my gripes about C language standards to date is that they impose enough rules about how compilers have to lay out structures and bit fields to preclude what might otherwise be useful optimizations [e.g. on a system with power-of-two integer sizes, a compiler would be forbidden from packing eight three-bit fields into three bytes], but do not provide any means by which a programmer can specify an explicit struct layout. I used to frequently use byte pointers to read out data from structures, but I don't favor such techniques as much as I used to. When speed isn't critical, I nowadays prefer to use a family of functions which read or write multi-byte types from or to consecutive memory locations using whatever endianness is needed [e.g. void storeI32LE(uint8_t **dest, int32_t dat) or int32_t readI32LE(uint8_t const **src);]. Such code will not be as efficient as what a compiler might be able to generate in cases where processors have the correct endianness and either the structure members are aligned or processors support unaligned accesses, but code using such methods may easily be ported to any processor regardless of its native alignment and endianness.

