What are methods for aligning mixed string and integer data within a single byte array?

Question

Let's say I'm building a byte array to send data over TCP/IP. This byte array contains a string (null terminated char array) along with an integer appended to the end.

So let's do this.

char buffer[24]; // buffer that will be sent over TCP/IP

char hello[7] = "hello";
int x = 12; // int is 4 bytes

So now let's say I perform a memcpy.

memcpy(buffer, hello, 7); // 7 force null character to be copied
memcpy(buffer+7, &x, 4);

By doing this, I believe I'm writing an integer to a non-word-aligned address. I assume this would be a performance hit when packaging the data?

Now let's imagine I send this data out and then receive it on another computer. When I go ahead and unpackage the data, I will need to perform proper casting. However, I'm still attempting to read an integer that isn't word aligned. This will again be a performance hit. I can imagine if I had an array of integers that were all misaligned, this performance hit would add up.

So my question: Is it common practice to word align all integers/floats when sending data over TCP/IP to avoid performance hits? In the case I illustrated above, would it be best to extend the length of the string to size 8 such that the next byte available is word aligned? Does memcpy offer any further methods for automatically compensating for word alignment?

Casting a pointer that doesn't match alignment requirements is not just a performance hit; it has undefined behaviour, even on platforms where you'd think the processor supports unaligned data access. memcpy is the proper thing to do; compilers optimize memcpy away when it is safe to do so.
– Antti Haapala, Commented Feb 23, 2017 at 23:02
Your question is not clear. You have to comply with the protocol used. But your approach is implementation-specific and a bad approach. Properly serialise the integer. Never use casts if you don't really understand what they imply.
– too honest for this site, Commented Feb 23, 2017 at 23:03
@Olaf I'm somewhat confused. Both the sender and receiver would know the proper offsets and data types. The receiver would receive a byte array that would then have to be decoded per the agreed-upon message format.
– Izzo, Commented Feb 23, 2017 at 23:09
@AnttiHaapala So let's say I'm generating a byte array that represents a null-terminated string and a 4-byte integer. The string can vary in length and will always be null terminated, but its length in bytes isn't always divisible by 4. When I go and package the string and integer into a byte array, how can I ensure the integer is word aligned even though the string length varies?
– Izzo, Commented Feb 23, 2017 at 23:18
It is common practice to not send binary data across a socket that depends on processor implementation details. Alignment is just one; you haven't yet thought about endianness. The very nice thing about sockets is that sending the data takes much, much longer than generating the data. So you have lots and lots of options to make that data not depend on processor details. That is the basic reason why you are looking at this comment from non-binary data: it was HTML.
– Hans Passant, Commented Feb 23, 2017 at 23:21

Malcolm McLean · Accepted Answer · 2017-02-23 23:44:41Z

No, you're not really going to get a performance improvement. Comms programs are used to receiving arbitrary binary streams in which the endianness of integers may be reversed or the integers may not be aligned.

Just say what the bits are and what they mean.

Brendan · Accepted Answer · 2017-02-24 03:49:16Z

I assume this would be a performance hit when packaging the data?

That depends on which CPU and other factors (like whether or not the 4 bytes cross a cache line boundary); and also depends on how memcpy() was implemented.

However, I'm still attempting to read an integer that isn't word aligned.

No. Semantically, memcpy() copies bytes, and you're copying four individual bytes (a single byte can't be misaligned).

In practice memcpy() might be optimised to work more efficiently (and might begin with a big slow mess that decides if it can/can't work more efficiently that ends up making it significantly slower than just doing the "naive" thing for small memory copies); but not being able to control lower level details like this is the price you pay for the convenience of not having to deal with lower level details.
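As a minimal sketch of the point about memcpy() copying bytes: the helpers below move a 32-bit value in and out of a byte buffer at any offset without ever forming a misaligned int pointer. The names read_i32_host and write_i32_host are hypothetical, and the value stays in the host's byte order, so this only addresses alignment.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy a 32-bit value out of a byte buffer at any
   offset. memcpy has no alignment requirement, so this is well defined
   even when offset is not a multiple of 4. Byte order is the host's. */
static int32_t read_i32_host(const unsigned char *buf, size_t offset)
{
    int32_t value;
    memcpy(&value, buf + offset, sizeof value);
    return value;
}

/* Hypothetical helper: store a 32-bit value at any offset. */
static void write_i32_host(unsigned char *buf, size_t offset, int32_t value)
{
    memcpy(buf + offset, &value, sizeof value);
}
```

With the question's layout, write_i32_host(buffer, 7, x) on the sender and read_i32_host(buffer, 7) on the receiver would replace the pointer cast, provided both ends share the same byte order and integer size.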

Is it common practice to word align all integers/floats when sending data over TCP/IP to avoid performance hits?

It would be "more common" practice to put the integer at the start of the packet so that it's always at the same place regardless of the string's length (and ends up always aligned too).
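A rough sketch of that layout, assuming a hypothetical message format of a 4-byte integer followed by the null-terminated string. The function name pack_int_then_string is made up for illustration, and the integer is written in host byte order here.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: [4-byte integer][null-terminated string].
   Returns the number of bytes written, or 0 if the buffer is too small. */
static size_t pack_int_then_string(unsigned char *buf, size_t cap,
                                   int32_t x, const char *s)
{
    size_t slen = strlen(s) + 1;          /* include the terminator */

    if (cap < sizeof x || slen > cap - sizeof x)
        return 0;

    memcpy(buf, &x, sizeof x);            /* integer always at offset 0 */
    memcpy(buf + sizeof x, s, slen);      /* string follows, any length */
    return sizeof x + slen;
}
```

Because the integer sits at offset 0, the receiver can find it without scanning for the end of the string, whatever the string's length.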

Also note that this doesn't solve "endian" (byte order) issues. To solve endian issues you need to specify "big-endian" or "little-endian" in the specification that defines the networking protocol. If that is big-endian, you can use something like htonl() (which costs a byte swap on little-endian hosts, so a minor performance hit on almost every computer that matters); if it's little-endian, you're going to have to write your own conversion, which will hopefully be free (optimised to nothing) when the host CPU is little-endian anyway. One approach to endian issues is to break the value into bytes (like buffer[7] = x; buffer[8] = x >> 8; buffer[9] = x >> 16; buffer[10] = x >> 24;), which also solves the alignment problem but is best done on unsigned integers (right-shifting a negative signed integer gives an implementation-defined result).
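The byte-by-byte approach could be sketched as follows, assuming the protocol specifies little-endian 32-bit integers. The names put_u32_le and get_u32_le are hypothetical; working on uint32_t keeps every shift well defined.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value as four bytes,
   least significant byte first, at any address. */
static void put_u32_le(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Hypothetical helper: rebuild the value from four little-endian bytes.
   The result is the same on big-endian and little-endian hosts. */
static uint32_t get_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

A signed value can be sent as put_u32_le(buffer + 7, (uint32_t)x), since conversion to an unsigned type is well defined. Converting back to a signed type on the receiving side is implementation-defined for values above INT32_MAX, although it behaves as expected on common two's-complement platforms.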
