
I have a struct

#include <stdint.h>

typedef struct {
    uint8_t type;  // 1B -> 1B
    uint16_t hash; // 2B -> 3B
    uint16_t id;   // 2B -> 5B
    uint32_t ip;   // 4B -> 9B
    uint16_t port; // 2B -> 11B
} Data;

and some binary data (which is a stored instance of Data on disk)

const unsigned char blob[11] = { 0x00, 0x00, 0x7b, 0x00, 0xea, 0x00, 0x00, 0x00, 0x59, 0x01, 0x00 };

I want to "read" the blob into my struct: the first byte, 0x00, corresponds to type; the second and third bytes, 0x00 and 0x7b, correspond to hash; and so on.

I can't just do Data *data = (Data *)blob, since the compiler will probably make Data bigger than 11 bytes by inserting padding (for faster memory access or something; not relevant here). The point is that sizeof(Data) == 16 and the in-memory representation may differ from the compact one on disk.
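To make the size mismatch concrete, the sketch below asks the compiler where it placed each member of Data. The offsets in the comment are what a typical ABI produces (for example x86-64 or AArch64, where uint16_t is 2-byte aligned and uint32_t is 4-byte aligned). They are implementation-defined, so treat them as an assumption and check them on your own target. The helper name print_data_layout is hypothetical.

```c
#include <stddef.h>  // offsetof
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint8_t type;
    uint16_t hash;
    uint16_t id;
    uint32_t ip;
    uint16_t port;
} Data;

// Hypothetical helper: prints where the compiler put each member.
// Typical result: type@0, hash@2 (1 padding byte before it), id@4,
// ip@8 (2 padding bytes before it), port@12, sizeof == 16 (2 trailing bytes).
void print_data_layout(void) {
    printf("type @ %zu\n", offsetof(Data, type));
    printf("hash @ %zu\n", offsetof(Data, hash));
    printf("id   @ %zu\n", offsetof(Data, id));
    printf("ip   @ %zu\n", offsetof(Data, ip));
    printf("port @ %zu\n", offsetof(Data, port));
    printf("sizeof(Data) = %zu\n", sizeof(Data));
}
```

In the 11-byte disk record, the same fields sit at offsets 0, 1, 3, 5 and 9. That is why neither a cast nor a single memcpy of the whole struct lines up.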

So how can I "import" my blob into a Data struct without having to use memcpy for every attribute? In other words, what's the nicest/simplest solution for this in C?

  • I'd use a union for the overlapping fields (other than type). Commented Jan 4, 2024 at 18:50
  • #pragma pack? Commented Jan 4, 2024 at 18:56
  • Another alternative is to read each field one by one and just assign it to the structure member, or even read directly into the individual structure members. Commented Jan 4, 2024 at 19:08
  • The portable way is to deserialize each member individually. Commented Jan 4, 2024 at 19:08
  • Oh, and with all of this, no matter how you solve it, you need to be able to handle byte ordering for the multi-byte values. Commented Jan 4, 2024 at 19:09
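Here is a minimal sketch of the #pragma pack suggestion. This is a compiler extension, not standard C; GCC, Clang and MSVC accept #pragma pack(push, 1) and #pragma pack(pop). Packing makes the in-memory layout match the 11-byte disk layout, but it does nothing about byte order, as the last comment points out. Reading unaligned members of a packed struct can also be slower on some CPUs. The names PackedData and packed_from_blob are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Compiler extension: lay out members with no padding.
#pragma pack(push, 1)
typedef struct {
    uint8_t type;   // offset 0
    uint16_t hash;  // offset 1
    uint16_t id;    // offset 3
    uint32_t ip;    // offset 5
    uint16_t port;  // offset 9
} PackedData;
#pragma pack(pop)

// Fail the build if the pragma was ignored and padding crept in.
_Static_assert(sizeof(PackedData) == 11, "PackedData must match the 11-byte disk layout");

// Copy one raw record into the packed struct in a single call.
// Multi-byte members keep the file's byte order; nothing is converted here.
void packed_from_blob(PackedData *out, const unsigned char blob[11]) {
    memcpy(out, blob, sizeof *out);
}
```

Access the members directly, for example p.hash. Taking a pointer such as &p.hash yields a possibly misaligned uint16_t *, which GCC and Clang warn about.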

3 Answers

The point is sizeof(Data) == 16 and the representation in RAM may be different than the compact one on disk.

Since you cannot rely on the data layout in the file to match that of the in-memory structure, standard C does not provide an alternative to working member by member.

But reading the data from disk is potentially a different question from reading it from an array. I suppose you imagine reading one or more whole raw records into memory and then copying from there in some way, but if you can rely on the sizes and endianness of the individual fields matching between structure and disk then you could consider this:

Data item;

if (fread(&item.type, sizeof(item.type), 1, input_file) == 0) handle_error();
if (fread(&item.hash, sizeof(item.hash), 1, input_file) == 0) handle_error();
if (fread(&item.id,   sizeof(item.id),   1, input_file) == 0) handle_error();
if (fread(&item.ip,   sizeof(item.ip),   1, input_file) == 0) handle_error();
if (fread(&item.port, sizeof(item.port), 1, input_file) == 0) handle_error();

That lets the stream handle the buffering (which it will, unless you disable that), relieves you of counting bytes, and is pretty clear. Five calls to fread() might be a bit more expensive than five to memcpy(), but you're unlikely to notice the difference next to the cost of opening the file and transferring data from it.
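Here is a self-contained version of the member-by-member fread() approach. Like the snippet above, it assumes that the file stores each field with the same size and byte order as the host. It also assumes the stream was opened in binary mode. The function name read_data, and the choice to return a bool instead of calling handle_error(), are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint8_t type;
    uint16_t hash;
    uint16_t id;
    uint32_t ip;
    uint16_t port;
} Data;

// Read one 11-byte record, one member at a time and in file order, so the
// struct's padding bytes never correspond to anything in the stream.
// Returns false on a short read or I/O error (&& stops at the first failure).
bool read_data(FILE *in, Data *item) {
    return fread(&item->type, sizeof item->type, 1, in) == 1
        && fread(&item->hash, sizeof item->hash, 1, in) == 1
        && fread(&item->id,   sizeof item->id,   1, in) == 1
        && fread(&item->ip,   sizeof item->ip,   1, in) == 1
        && fread(&item->port, sizeof item->port, 1, in) == 1;
}
```

A loop such as while (read_data(fp, &item)) { ... } then processes records until end of file.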

If you do need to populate the structure from an in-memory array containing raw bytes from the file, however, then per-member memcpy() is the most portable way. And quite possibly more efficient than you think.
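Here is a minimal sketch of per-member memcpy() from an in-memory copy of one record. The offsets come from the 11-byte layout in the question. The name data_from_record and the OFF_* constants are hypothetical. The bytes are copied unchanged, so this also assumes the record is in the host's byte order.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t type;
    uint16_t hash;
    uint16_t id;
    uint32_t ip;
    uint16_t port;
} Data;

// Byte offsets of each field in the compact 11-byte record.
enum { OFF_TYPE = 0, OFF_HASH = 1, OFF_ID = 3, OFF_IP = 5, OFF_PORT = 9, RECORD_SIZE = 11 };

// Copy each field out of the raw record. memcpy() has no alignment
// requirement on the source, and compilers usually inline these small,
// fixed-size copies.
void data_from_record(Data *out, const unsigned char rec[RECORD_SIZE]) {
    memcpy(&out->type, rec + OFF_TYPE, sizeof out->type);
    memcpy(&out->hash, rec + OFF_HASH, sizeof out->hash);
    memcpy(&out->id,   rec + OFF_ID,   sizeof out->id);
    memcpy(&out->ip,   rec + OFF_IP,   sizeof out->ip);
    memcpy(&out->port, rec + OFF_PORT, sizeof out->port);
}
```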

Assuming the multi-byte fields are stored big-endian in the blob (the leftmost byte of each field is the most significant), a simple solution is to assemble each field with bitwise operators in the following way:

void importFromBlob(const unsigned char *blob, Data *data) {
    // Import type
    data->type = blob[0];

    // Import hash
    data->hash = (uint16_t)((blob[1] << 8) | blob[2]);  // assembled as 0x007b in your example

    // Import id
    data->id = (uint16_t)((blob[3] << 8) | blob[4]);

    // Import ip (widen each byte to uint32_t before shifting, so << 24 cannot overflow an int)
    data->ip = ((uint32_t)blob[5] << 24) | ((uint32_t)blob[6] << 16) |
               ((uint32_t)blob[7] << 8)  |  (uint32_t)blob[8];

    // Import port
    data->port = (uint16_t)((blob[9] << 8) | blob[10]);
}

2 Comments

That might work, but it assumes that the data are stored in big-endian order in the source. It could be adjusted to assume little-endian order instead, but if one wants to assume the machine byte order, whatever that may be, then you need memcpy() or an equivalent.
Got your point, you're right, and thanks for the clarification @john. I will be sure to consider these cases moving forward.
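As the comment notes, the same shift-based approach can be adjusted for little-endian data. Here is a sketch that assumes every multi-byte field in the record is little-endian. The helper names load_le16, load_le32 and import_from_blob_le are hypothetical. Because each value is built arithmetically rather than copied, the result is the same on any host.

```c
#include <stdint.h>

typedef struct {
    uint8_t type;
    uint16_t hash;
    uint16_t id;
    uint32_t ip;
    uint16_t port;
} Data;

// Little-endian: the byte at the lowest address is the least significant.
// Each byte is widened to an unsigned type before shifting.
static uint16_t load_le16(const unsigned char *p) {
    return (uint16_t)(p[0] | ((unsigned)p[1] << 8));
}

static uint32_t load_le32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

void import_from_blob_le(const unsigned char *blob, Data *data) {
    data->type = blob[0];
    data->hash = load_le16(blob + 1);
    data->id   = load_le16(blob + 3);
    data->ip   = load_le32(blob + 5);
    data->port = load_le16(blob + 9);
}
```

For the question's blob, this gives hash == 0x7b00, id == 0xea00, ip == 0x59000000 and port == 1. Reading the same bytes as big-endian gives 0x007b, 0x00ea, 0x00000059 and 0x0100, so you must know which byte order the file actually uses.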
The simplest way to avoid multiple reads or byte copies for this particular structure is to pad the structure explicitly with 3 initial bytes and 2 trailing bytes:

typedef struct {
    uint8_t pad0[3];  // 3 bytes at offset 0, unused
    uint8_t type;     // 1 byte  at offset 3
    uint16_t hash;    // 2 bytes at offset 4
    uint16_t id;      // 2 bytes at offset 6
    uint32_t ip;      // 4 bytes at offset 8
    uint16_t port;    // 2 bytes at offset 12
    uint8_t pad1[2];  // 2 bytes at offset 14, unused. total: 16 bytes
} Data;
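Because the trick silently breaks if a compiler lays the struct out differently, you can make the compiler check the offsets that the comments promise. Here is a sketch using C11 _Static_assert and offsetof. Only the checks are new; the struct is the one above.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t pad0[3];
    uint8_t type;
    uint16_t hash;
    uint16_t id;
    uint32_t ip;
    uint16_t port;
    uint8_t pad1[2];
} Data;

// The single-read trick needs the 11 bytes from type through port to be
// contiguous and in file order. Fail the build if they are not.
_Static_assert(offsetof(Data, hash) == offsetof(Data, type) + 1, "hash must directly follow type");
_Static_assert(offsetof(Data, id)   == offsetof(Data, type) + 3, "id must be at type + 3");
_Static_assert(offsetof(Data, ip)   == offsetof(Data, type) + 5, "ip must be at type + 5");
_Static_assert(offsetof(Data, port) == offsetof(Data, type) + 9, "port must be at type + 9");
```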

You would read the data with a single fread:

    Data mydata;
    if (fread(&mydata.type, 1, 11, fp) == 11) {
        // mydata was read successfully
        // fields can be used directly assuming correct endianness.
    } else {
        // read error
    }

Copying from the memory blob is also a single call to memcpy:

    const unsigned char blob[11] = {
        0x00, 0x00, 0x7b, 0x00, 0xea, 0x00, 0x00, 0x00, 0x59, 0x01, 0x00
    };
    memcpy(&mydata.type, blob, 11);

Reading and writing the binary data requires opening the file in binary mode ("rb" for reading, "wb" for writing).

Writing the data in a single fwrite is done with:

    if (fwrite(&mydata.type, 1, 11, fp) == 11) ...
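Here is a small round-trip sketch of the single fwrite/fread, using tmpfile() as a scratch binary stream. The names record_bytes, write_record, read_record and RECORD_SIZE are hypothetical. The byte pointer is computed as the start of the whole struct plus offsetof(Data, type) rather than &mydata.type, which avoids relying on access past the end of the type member. Host byte order is still assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint8_t pad0[3];  // unused
    uint8_t type;     // offset 3
    uint16_t hash;    // offset 4
    uint16_t id;      // offset 6
    uint32_t ip;      // offset 8
    uint16_t port;    // offset 12
    uint8_t pad1[2];  // unused, total 16 bytes
} Data;

enum { RECORD_SIZE = 11 };

// Address of the first on-disk byte (the type member), derived from the
// start of the whole object.
static unsigned char *record_bytes(Data *d) {
    return (unsigned char *)d + offsetof(Data, type);
}

// One call per record in each direction; both return 1 on success, 0 otherwise.
int write_record(FILE *fp, Data *d) {
    return fwrite(record_bytes(d), 1, RECORD_SIZE, fp) == RECORD_SIZE;
}

int read_record(FILE *fp, Data *d) {
    return fread(record_bytes(d), 1, RECORD_SIZE, fp) == RECORD_SIZE;
}
```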

This trick works for the structure in the question, but might not work in many other cases:

  • if the endianness in the file differs from the CPU's,
  • if the order of the multi-byte items is unfavorable, so that no amount of leading padding places all of them at naturally aligned offsets.

So in the general case, you may have to use memcpy to copy chunks of the compact byte-oriented representation to the structure used in memory, adjusting for potential endianness differences. memcpy with small fixed sizes is usually expanded inline into very few instructions, as can be verified using the Godbolt Compiler Explorer.
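Here is one possible sketch of that general approach: copy each field with a fixed-size memcpy, then convert it from the file's byte order. It assumes the file is little-endian and detects the host's byte order at run time. The helper names host_is_little_endian, bswap16, bswap32, from_le16, from_le32 and data_from_le_record are hypothetical.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t type;
    uint16_t hash;
    uint16_t id;
    uint32_t ip;
    uint16_t port;
} Data;

// True when the host stores the least significant byte first.
static int host_is_little_endian(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

static uint16_t bswap16(uint16_t v) { return (uint16_t)((v >> 8) | (v << 8)); }

static uint32_t bswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Copy the field's bytes, then swap only if the host order differs from the file's.
static uint16_t from_le16(const unsigned char *p) {
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return host_is_little_endian() ? v : bswap16(v);
}

static uint32_t from_le32(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return host_is_little_endian() ? v : bswap32(v);
}

void data_from_le_record(Data *out, const unsigned char rec[11]) {
    out->type = rec[0];
    out->hash = from_le16(rec + 1);
    out->id   = from_le16(rec + 3);
    out->ip   = from_le32(rec + 5);
    out->port = from_le16(rec + 9);
}
```

With optimization enabled, compilers typically fold the endianness probe into a constant, so on a little-endian host each from_le* call can reduce to a plain load.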

1 Comment

That's an insightful observation about the OP's particular struct, but it's not generalizable. And it's helpful to them only if they are in fact free to modify the struct definition, which might or might not be the case. But if they can modify the structure then it's nice.
