
Suppose I have an array in C with 5 elements which hold integers in the range [0..255]. Would it generally be better, performance-wise, to use unsigned char, unsigned int or int? A char is only one byte, but an int is easier for the processor to handle, as far as I know. Or does it mostly depend on how the elements are accessed?

EDIT: Measuring is quite difficult, because the code belongs to a library, and the array is accessed externally.

Also, I don't encounter this problem only in this particular case, so I'm asking for a more general answer.
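
For concreteness, the alternatives I'm weighing look roughly like this (the array names are just illustrative):

    /* All three hold the same five values in the range 0..255: */
    unsigned char values_u8[5];   /* 5 bytes */
    unsigned int  values_u32[5];  /* typically 20 bytes */
    int           values_i32[5];  /* typically 20 bytes */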

  • For 5 elements, go with int. For large sizes, profile. The cost of bitmasking can be more than offset if the whole thing can be made to fit in the cache. Commented Aug 9, 2014 at 15:28
  • That is something which you will have to test for. Commented Aug 9, 2014 at 15:28
  • @sp2danny int because it's "handier" (for the processor)? Commented Aug 9, 2014 at 15:35
  • @Kapichu yes, requires no bitmasking Commented Aug 9, 2014 at 21:36
  • @sp2danny what does that exactly mean? Commented Aug 9, 2014 at 22:27

3 Answers


While the answer really depends on the CPU and how it handles loading and storing small integers, you can assume that the byte array will be faster on most modern systems:

A char takes only 1/4 of the space that an int takes (on most systems), which means that working on a char array needs only a quarter of the memory bandwidth. And most code is memory bound on modern hardware.
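
As a rough illustration of the footprint difference (assuming a 4-byte int, which is typical but not guaranteed by the standard):

    #include <stdio.h>

    int main(void) {
        unsigned char a[1000];
        int           b[1000];
        /* On a typical system with 4-byte int this prints 1000 and 4000:
           the char array pushes a quarter of the data through the cache. */
        printf("%zu %zu\n", sizeof a, sizeof b);
        return 0;
    }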


3 Comments

You mean when taking the whole array? Because I've learned that a single int is faster than a single char.
Who told you that? On AMD-64, loading a byte into a register is only one operation, independent of whether it is signed or unsigned, and a sign/zero extension is not the kind of operation that makes a CPU sweat. You can check your architecture by compiling variations of int foo(unsigned char* bar) { return *bar; } with the -Os -S flags and looking at the resulting .s file. I think I remember that the PowerPC required a second instruction for loading a signed char, which adds one cycle to the operation. But a main memory load can take ca. 25 cycles. That's the important figure. (A compilable version of that check is sketched after these comments.)
As I said, the "bitmasking" operation is nothing that can make a CPU sweat. Think about the hardware required: you only need to tie those extra bits to ground. That's a few wires, no logic required. A stark contrast to the complexity of adding two 64-bit numbers, where the carry out of the first bit may or may not influence the result of the 64th bit. And many modern CPUs can do the latter in a single cycle. You can really expect zero overhead when loading a byte into an integer register if there is an instruction for it. For storing, there is never overhead.
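
A self-contained version of the check described in the comment above; the function names here are made up, and the exact instructions you see will depend on your target:

    /* load.c -- compile with: cc -Os -S load.c   and inspect load.s */
    int load_unsigned(unsigned char *bar) { return *bar; } /* usually one zero-extending load */
    int load_signed(signed char *bar)     { return *bar; } /* may need an extra sign-extension */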

This one is quite impossible to answer, because it depends on the code, compiler, and processor.

However, one suggestion is to use uint8_t instead of unsigned char. (The code will be the same, but the explicit version conveys the meaning much better.)
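
For example (uint8_t comes from <stdint.h>):

    #include <stdint.h>

    uint8_t values[5];   /* explicitly an array of 8-bit unsigned integers */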

Then there is one more thing to consider. You might be best off by packing four 8-bit integers into one 32-bit integer. Most arithmetic and bitwise logical operations work fine as long as there are no overflows (division being the notable exception).
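
A minimal sketch of that idea, assuming uint32_t from <stdint.h>; note that every per-lane result has to stay below 256 for this to be valid:

    #include <stdint.h>
    #include <stdio.h>

    /* Pack four 8-bit values into one 32-bit word, first value in the low byte. */
    static uint32_t pack4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
        return (uint32_t)a | ((uint32_t)b << 8) | ((uint32_t)c << 16) | ((uint32_t)d << 24);
    }

    static uint8_t lane(uint32_t w, int i) {
        return (uint8_t)(w >> (8 * i));
    }

    int main(void) {
        uint32_t w = pack4(3, 7, 11, 13);
        /* One multiplication scales all four values at once; this only works
           because the largest result, 13 * 5 = 65, still fits in 8 bits. */
        w *= 5;
        printf("%u %u %u %u\n",
               (unsigned)lane(w, 0), (unsigned)lane(w, 1),
               (unsigned)lane(w, 2), (unsigned)lane(w, 3));   /* 15 35 55 65 */
        return 0;
    }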

5 Comments

What's the benefit from packing the bytes? In memory, they are the same as if I treated them as separate bytes.
If you, for example, need to multiply all values by 5, you only carry out one multiplication. This is especially useful with bitmasks, but works with a lot of operations as long as there are no overflows. If you are afraid of overflows, you may pack four 8-bit ints into a 64-bit int. This gives you ample space for overflows. (The usefulness of this packing depends highly on the task at hand.)
But packing is slower, isn't it?
Oh, and since the values have nothing to do with each other, packing wouldn't even make sense (they'll never be multiplied together)
As I said, this really depends on the application. There is no single correct answer, as it all depends on where you get the data from, and how you use it. For example, indexing 32-bit values is usually fastest, but if that makes the data overflow from the CPU cache, packed storage may be faster.

The golden rule: Measure it. If you can't be bothered measuring it, then it isn't worth optimising. So measure it one way, change it, measure it the other way, take whatever is faster. Be aware that when you switch to a different compiler, or a different processor, like a processor that is introduced in 2015 or 2016, the other code might now be faster.
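
A very rough way to do such a measurement, as a sketch only; do_work is a stand-in for the real access pattern you want to compare:

    #include <stdio.h>
    #include <time.h>

    /* Stand-in for the code under test: swap the body for the real access
       pattern (e.g. the unsigned char version vs. the int version). */
    static unsigned char buf[5] = {1, 2, 3, 4, 5};
    static volatile unsigned int sink;

    static void do_work(void) {
        unsigned int sum = 0;
        for (int i = 0; i < 5; ++i)
            sum += buf[i];
        sink = sum;   /* the volatile store keeps the work from being optimised away */
    }

    int main(void) {
        clock_t start = clock();
        for (long i = 0; i < 100000000L; ++i)
            do_work();
        printf("%f s\n", (double)(clock() - start) / CLOCKS_PER_SEC);
        return 0;
    }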

Alternatively, don't measure it, but write the most readable and maintainable code.

And consider using a single 64 bit integer and shift operations :-)
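
A sketch of what that could look like for five 0..255 values; the helper names are assumptions, not something from the answer:

    #include <stdint.h>

    /* Five 8-bit values stored in the low 40 bits of one 64-bit word. */
    static uint64_t set_elem(uint64_t w, int i, uint8_t v) {
        w &= ~((uint64_t)0xFF << (8 * i));     /* clear the old byte */
        return w | ((uint64_t)v << (8 * i));   /* insert the new one */
    }

    static uint8_t get_elem(uint64_t w, int i) {
        return (uint8_t)(w >> (8 * i));
    }

Whether this beats a plain unsigned char[5] again depends on how the elements are accessed.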

1 Comment

See my edit: measuring is difficult, since the way of access depends on the user of the lib.
