C Reading binary data into struct

Question

I have a struct

typedef struct {
    uint8_t type;  // 1B -> 1B
    uint16_t hash; // 2B -> 3B
    uint16_t id;   // 2B -> 5B
    uint32_t ip;   // 4B -> 9B
    uint16_t port; // 2B -> 11B
} Data;

and some binary data (which is a stored instance of Data on disk)

const unsigned char blob[11] = { 0x00, 0x00, 0x7b, 0x00, 0xea, 0x00, 0x00, 0x00, 0x59, 0x01, 0x00 };

I want to "read" the blob into my struct: the first byte 0x00 corresponds to type, the second and third bytes 0x00, 0x7b correspond to hash, and so on.

I can't just do Data *data = (Data *)blob, since the actual size of Data will probably be bigger than 11 bytes (for faster RAM access or something; not relevant here). The point is that sizeof(Data) == 16 and the representation in RAM may differ from the compact one on disk.

So how can I "import" my blob into a Data struct without having to use memcpy for every attribute? In other words, what's the nicest/simplest solution for this in C?

I'd use union for the overlapping fields (other than type).
– Eugene Sh., Commented Jan 4, 2024 at 18:50
Another alternative is to read each field one by one, and just assign it to the structure member. Or maybe even read directly into the individual structure members.
– Some programmer dude, Commented Jan 4, 2024 at 19:08
The portable way is to deserialize each member individually.
– 500 - Internal Server Error, Commented Jan 4, 2024 at 19:08
Oh and with all of this, no matter how you solve it, you need to be able to handle byte ordering for the multi-byte values.
– Some programmer dude, Commented Jan 4, 2024 at 19:09

John Bollinger · Accepted Answer · 2024-01-04 19:23:32Z

The point is sizeof(Data) == 16 and the representation in RAM may be different than the compact one on disk.

Since you cannot rely on the data layout in the file to match that of the in-memory structure, standard C does not provide an alternative to working member by member.
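To see the mismatch concretely, here is a minimal sketch that prints where the compiler places each member. The names DATA_DISK_SIZE and print_data_layout are made up for illustration, and the offsets printed depend on the ABI; on many common platforms the result is sizeof(Data) == 16 with padding after type and after id.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint8_t type;
    uint16_t hash;
    uint16_t id;
    uint32_t ip;
    uint16_t port;
} Data;

/* Size of the compact on-disk record: 1 + 2 + 2 + 4 + 2 bytes.
   Hypothetical name, not part of the original question. */
enum { DATA_DISK_SIZE = 11 };

/* The in-memory struct can never be smaller than the sum of its members. */
static_assert(sizeof(Data) >= DATA_DISK_SIZE, "struct smaller than its fields");

/* Print the in-memory layout chosen by the compiler (ABI-dependent). */
void print_data_layout(void) {
    printf("sizeof(Data) = %zu, on-disk size = %d\n", sizeof(Data), DATA_DISK_SIZE);
    printf("type at offset %zu\n", offsetof(Data, type));
    printf("hash at offset %zu\n", offsetof(Data, hash));
    printf("id   at offset %zu\n", offsetof(Data, id));
    printf("ip   at offset %zu\n", offsetof(Data, ip));
    printf("port at offset %zu\n", offsetof(Data, port));
}
```

Any offset that differs from the running total of the preceding field sizes marks a spot where the compiler inserted padding, which is why a plain cast of the 11-byte blob cannot work.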

But reading the data from disk is potentially a different question from reading it from an array. I suppose you imagine reading one or more whole raw records into memory and then copying from there in some way, but if you can rely on the sizes and endianness of the individual fields matching between structure and disk then you could consider this:

Data item;

if (fread(&item.type, sizeof(item.type), 1, input_file) == 0) handle_error();
if (fread(&item.hash, sizeof(item.hash), 1, input_file) == 0) handle_error();
if (fread(&item.id,   sizeof(item.id),   1, input_file) == 0) handle_error();
if (fread(&item.ip,   sizeof(item.ip),   1, input_file) == 0) handle_error();
if (fread(&item.port, sizeof(item.port), 1, input_file) == 0) handle_error();

That lets the stream handle the buffering (which it will, unless you disable that), relieves you of counting bytes, and is pretty clear. Five calls to fread() might be a bit more expensive than five to memcpy(), but you're unlikely to notice the difference next to the cost of opening the file and transferring data from it.

If you do need to populate the structure from an in-memory array containing raw bytes from the file, however, then per-member memcpy() is the most portable way, and quite possibly more efficient than you think.
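A per-member memcpy() from an in-memory array might look like the following sketch. The helper name data_from_bytes and the constant DATA_DISK_SIZE are hypothetical, and the code assumes the bytes in the array use the host's own byte order and field sizes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t type;
    uint16_t hash;
    uint16_t id;
    uint32_t ip;
    uint16_t port;
} Data;

enum { DATA_DISK_SIZE = 11 };  /* 1 + 2 + 2 + 4 + 2 bytes */

/* Hypothetical helper: copy one packed record into *out, member by member.
   Assumes the bytes at src are in the host's own byte order. */
void data_from_bytes(Data *out, const unsigned char *src) {
    size_t off = 0;

    memcpy(&out->type, src + off, sizeof out->type); off += sizeof out->type;
    memcpy(&out->hash, src + off, sizeof out->hash); off += sizeof out->hash;
    memcpy(&out->id,   src + off, sizeof out->id);   off += sizeof out->id;
    memcpy(&out->ip,   src + off, sizeof out->ip);   off += sizeof out->ip;
    memcpy(&out->port, src + off, sizeof out->port); off += sizeof out->port;

    assert(off == DATA_DISK_SIZE);  /* sanity check: all 11 bytes consumed */
}
```

Advancing the offset by sizeof each member avoids hard-coding byte positions, so the copy order is the only thing that has to match the file format.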

usef · Accepted Answer · 2024-01-04 19:24:16Z

1

Assuming the bytes on the left of the blob array carry more weight than those on the right (big-endian order), a simple solution is to assemble each field with bitwise operators:

void importFromBlob(const unsigned char *blob, Data *data) {
    // Import type
    data->type = blob[0];

    // Import hash
    data->hash = (uint16_t)((blob[1] << 8) | blob[2]);  // assembled as 0x007b in your example

    // Import id
    data->id = (uint16_t)((blob[3] << 8) | blob[4]);

    // Import ip; cast each byte first so the shift is done in uint32_t, not int
    data->ip = ((uint32_t)blob[5] << 24) | ((uint32_t)blob[6] << 16) | ((uint32_t)blob[7] << 8) | (uint32_t)blob[8];

    // Import port
    data->port = (uint16_t)((blob[9] << 8) | blob[10]);
}

2 Comments

John Bollinger Over a year ago

That might work, but it assumes that the data are stored in big-endian order in the source. It could be adjusted to assume little-endian order instead, but if one wants to assume the machine byte order, whatever that may be, then you need memcpy() or an equivalent.
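The little-endian adjustment mentioned in this comment could be sketched as follows. The names load_le16, load_le32 and import_from_blob_le are hypothetical; the code assumes the least significant byte of every multi-byte field comes first in the blob, and it does not depend on the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t type;
    uint16_t hash;
    uint16_t id;
    uint32_t ip;
    uint16_t port;
} Data;

/* Assemble a 16-bit value whose least significant byte is stored first. */
static uint16_t load_le16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Assemble a 32-bit value whose least significant byte is stored first.
   Each byte is widened to uint32_t before shifting to avoid shifting
   into the sign bit of int. */
static uint32_t load_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the 11-byte record, treating every field as little-endian. */
void import_from_blob_le(const unsigned char *blob, Data *data) {
    assert(blob != NULL && data != NULL);
    data->type = blob[0];
    data->hash = load_le16(blob + 1);
    data->id   = load_le16(blob + 3);
    data->ip   = load_le32(blob + 5);
    data->port = load_le16(blob + 9);
}
```

Applied to the blob from the question, this reading yields a hash of 0x7b00 rather than 0x007b, which shows why the file's byte order has to be known before choosing either variant.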

usef Over a year ago

Got your point, you're right, and thanks for the clarification @john. I will be sure to consider these cases moving forward.

chqrlie · Accepted Answer · 2024-01-05 21:37:39Z

0

The simplest way to avoid multiple reads or byte copies for this particular structure is to pad the structure explicitly with 3 initial bytes and 2 trailing bytes:

typedef struct {
    uint8_t pad0[3];  // 3 bytes at offset 0, unused
    uint8_t type;     // 1 byte  at offset 3
    uint16_t hash;    // 2 bytes at offset 4
    uint16_t id;      // 2 bytes at offset 6
    uint32_t ip;      // 4 bytes at offset 8
    uint16_t port;    // 2 bytes at offset 12
    uint8_t pad1[2];  // 2 bytes at offset 14, unused. total: 16 bytes
} Data;

You would read the data with a single fread:

    Data mydata;
    if (fread(&mydata.type, 1, 11, fp) == 11) {
        // mydata was read successfully
        // fields can be used directly assuming correct endianness.
    } else {
        // read error
    }

Copying from the memory blob is also a single call to memcpy:

    const unsigned char blob[11] = {
        0x00, 0x00, 0x7b, 0x00, 0xea, 0x00, 0x00, 0x00, 0x59, 0x01, 0x00
    };
    memcpy(&mydata.type, blob, 11);
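Because the trick depends on the compiler laying out the padded structure exactly as the comments claim, it may be worth checking those offsets at compile time. The following is a sketch under that assumption; the names PaddedData and padded_from_bytes are made up here, and the static_assert lines will fail to compile on a platform with a different layout.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t pad0[3];  /* unused, places type at offset 3 */
    uint8_t type;
    uint16_t hash;
    uint16_t id;
    uint32_t ip;
    uint16_t port;
    uint8_t pad1[2];  /* unused trailing bytes */
} PaddedData;

/* Fail the build if the compiler inserts padding between the real fields. */
static_assert(offsetof(PaddedData, type) == 3,  "unexpected offset of type");
static_assert(offsetof(PaddedData, hash) == 4,  "unexpected offset of hash");
static_assert(offsetof(PaddedData, id)   == 6,  "unexpected offset of id");
static_assert(offsetof(PaddedData, ip)   == 8,  "unexpected offset of ip");
static_assert(offsetof(PaddedData, port) == 12, "unexpected offset of port");
static_assert(sizeof(PaddedData) == 16, "unexpected struct size");

/* Copy the 11 packed bytes into the contiguous field region in one call.
   Assumes src holds the fields in the host's byte order. */
void padded_from_bytes(PaddedData *out, const unsigned char *src) {
    memcpy((unsigned char *)out + offsetof(PaddedData, type), src, 11);
}
```

Copying through a pointer to the whole structure plus offsetof, rather than through the address of the single type member, keeps the 11-byte copy within the bounds of the object the pointer was derived from.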

Reading or writing binary data requires opening the file in binary mode ("rb", "wb", ...).

Writing the data in a single fwrite is done with

    if (fwrite(&mydata.type, 1, 11, fp) == 11) ...

This trick works for the structure in the question, but might not work in many other cases:

if the endianness in the file differs from the CPU's,
if the sequence of members larger than one byte is not favorable, so that padding alone cannot line the compact layout up with the structure's natural alignment.

So in the general case, you may have to use memcpy to copy chunks of the compact byte-oriented representation to the structure used in memory, adjusting for potential endianness differences. memcpy with small fixed sizes is usually expanded inline efficiently, generating very few instructions, as can be verified using the Godbolt Compiler Explorer.

edited Jan 5, 2024 at 21:37

answered Jan 4, 2024 at 22:04

1 Comment

John Bollinger Over a year ago

That's an insightful observation about the OP's particular struct, but it's not generalizable. And it's helpful to them only if they are in fact free to modify the struct definition, which might or might not be the case. But if they can modify the structure then it's nice.
