
In this project I am supposed to receive a packet, and cast a part of it to an unsigned integer and get both Big-Endian and Little-Endian results. Originally, I wanted to just cast a pointer inside the byte array (packet) to an unsigned integer type that would automatically put the value received in Big-Endian form, like (uint32_be_t*)packet; similar to the way that it's automatically put into Little-Endian form when doing (uint32_t*)packet.

Since I couldn't find a type that automatically did this, I decided to create my own structure called "u32" which has the methods "get," which reads the value as Big-Endian, and "get_le," which reads the value as Little-Endian. However, I noticed that when I do this I get a negative number for the Little-Endian result.

struct u32 {
    u8 data[4] = {};

    uint32_t get() {
        return ((uint32_t)data[3] << 0)
            | ((uint32_t)data[2] << 8)
            | ((uint32_t)data[1] << 16)
            | ((uint32_t)data[0] << 24);
    }
    
    uint32_t get_le() {
        return ((uint32_t)data[3] << 24)
            | ((uint32_t)data[2] << 16)
            | ((uint32_t)data[1] << 8)
            | ((uint32_t)data[0] << 0);
    }
};

In order to simulate a packet, I just created a character array and then cast a u32* to it like so:

int main() {
    char ary[] = { 0x00, 0x00, 0x00, (char)0xF4 };
    u32* v = (u32*)ary;    
    printf("%d %d\n", v->get(), v->get_le());
    return 0;
}

But then I get the results: 244 -201326592

Why is this happening? The return type of "get_le" is uint32_t, and the first function, "get," which is supposed to return the Big-Endian unsigned integer, works correctly.

As a side note, this was just a test that popped into my head, so I went to the library to try it between classes. Unfortunately, that means I had to use an online compiler (OnlineGDB), but I figure it would work the same in Visual Studio. Also, if you have any suggestions for how I could improve my code, they would be greatly appreciated. I am using Visual Studio 2019 and am allowed to use cstdlib.

  • char has implementation-defined signedness. Explicitly use unsigned char if you don't want the values potentially interpreted as signed. Commented Sep 3, 2020 at 16:23
  • You want %u for unsigned ints; %d is for signed ints. Commented Sep 3, 2020 at 16:24
  • u8 -> why not uint8_t? Commented Sep 3, 2020 at 16:31
  • What else do you use? I hate using cout because I have to put a << between every single value, and I have to follow a bunch of rules if I want to add padding. (After testing %x from SuperStormer's comment:) printf lets me write "%08x" to show the whole hex value at the same width as the others, and "%.2f" for 2 decimal places... there is a lot to love with printf. Commented Sep 3, 2020 at 16:32
  • This is why nobody likes printf anymore. (A slightly hyperbolic statement; not everyone hates printf().) Commented Sep 3, 2020 at 16:34

2 Answers


Well, I daresay you want to use %u not %d in that printf() format-string!

%d assumes that the value is signed, so if the most-significant bit is 1 you get a minus sign.


2 Comments

I was not aware of this. I assumed "%d" meant "digit," as in it would output any value in its intended form. I should probably take another look at printf..
@BrandonWoolworth the fatal flaw of printf is that it doesn't know the type of its arguments. You need to be very careful to use the proper matching format codes, including any modification flags.

There is a more elegant way to accomplish the same task. Just use uint32_t instead. You can use std::memcpy to convert between char arrays and uint32_t without invoking undefined behavior; this is also what std::bit_cast does. Reinterpreting a char* as an int* and dereferencing it is undefined behavior. It isn't the cause of your problem here, since MSVC tolerates it, but it isn't portable.

std::memcpy conversions or pointer casts will take place with native byte order, which is either little or big endian. You can convert between byte orders using a builtin function. For MSVC, this would be:

_byteswap_ulong(x); // unsigned long is uint32_t on Windows

See the documentation of _byteswap_ulong. It compiles to a single x86 bswap instruction, which a hand-written series of shifts is not guaranteed to become, so it can be noticeably faster. GCC and Clang have __builtin_bswap32 if you want portable code.
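A portable wrapper could dispatch on the compiler, falling back to shifts elsewhere (a sketch; only the vendor-documented intrinsics named above are assumed):

```cpp
#include <cstdint>
#ifdef _MSC_VER
#include <cstdlib>  // _byteswap_ulong
#endif

inline uint32_t bswap32(uint32_t x) {
#ifdef _MSC_VER
    return _byteswap_ulong(x);
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(x);
#else
    // Fallback: modern compilers typically recognize this pattern as a byte swap
    return (x >> 24) | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u) | (x << 24);
#endif
}
```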

You can detect native endianness using std::endian or if you don't have C++20, __BYTE_ORDER__ macros. Converting to little-endian or big-endian would then just be doing nothing or performing a byte swap depending on your platform endianness.

#include <bit>      // std::endian (C++20)
#include <cstring>  // std::memcpy
#include <cstdint>
#include <cstdio>   // printf
#include <cstdlib>  // _byteswap_ulong (MSVC)

uint32_t bswap(uint32_t x) {
    return _byteswap_ulong(x);
}

uint32_t to_be(uint32_t x) {
    return std::endian::native == std::endian::big ? x : bswap(x);
}

uint32_t to_le(uint32_t x) {
    return std::endian::native == std::endian::little ? x : bswap(x);
}

int main() {
    char ary[4] = { 0, 0, 0, (char) 0xF4 };
    uint32_t v;
    std::memcpy(&v, &ary, 4);
    
    printf("%u %u\n", to_be(v), to_le(v));
    return 0;
}

1 Comment

Man, C++20 looks amazing. I keep seeing things that I'd love to use (namely concepts), but unfortunately not enough of them are implemented in MSVC, and I don't want to switch to C++20 only to learn that I can't use something. That said, I appreciate the answer. I opted not to go this route because I know I am receiving packets, so the bytes come in a particular order (and I'm not sure I could detect endianness), and if I could use those macros I would just use the winsock functions (htons, etc.).
