
I found this Q&A on the Web:

Q: Which is better a char, short or int type for optimization?

A: Where possible, it is best to avoid using char and short as local variables. For the types char and short the compiler needs to reduce the size of the local variable to 8 or 16 bits after each assignment. This is called sign-extending for signed variables and zero-extending for unsigned variables. It is implemented by shifting the register left by 24 or 16 bits, followed by a signed or unsigned shift right by the same amount, taking two instructions (zero-extension of an unsigned char takes one instruction). These shifts can be avoided by using int and unsigned int for local variables. This is particularly important for calculations which first load data into local variables and then process the data inside the local variables. Even if data is input and output as 8- or 16-bit quantities, it is worth considering processing them as 32-bit quantities.

Is this correct? I thought it was better to avoid char and short because of the arithmetic conversions (most likely they will be converted to ints or longs, and this will cause the compiler to generate extra instructions).

Q: How to reduce function call overhead in ARM based systems?

A: Avoid functions with a parameter that is passed partially in a register and partially on the stack (split-argument). This is not handled efficiently by the current compilers: all register arguments are pushed on the stack.

· Avoid functions with a variable number of parameters (varargs functions). ...

Concerning 'varargs' -- is this because the arguments will be passed on the stack? What is a function with arguments passed partially in registers and partially via the stack? Could you provide an example?

Can we say that the way function arguments are passed (in registers or on the stack) strongly depends on the architecture?

Thanks!

  • If you do such long citations in your question, could you please also refer to the original, such that we know where this comes from? Commented Jan 11, 2011 at 8:02
  • Downvoted until links are added. Commented Jan 11, 2011 at 12:26
  • It appears to come from: technology-shettyprasad.blogspot.com/2010/07/… Commented Jan 11, 2011 at 18:10
  • @Mark please add links to your question. Commented Jan 11, 2011 at 21:03
  • In addition, asking multiple different things in the same question is really confusing, as the different things require different answers. And please use a meaningful question title, rather than something horribly vague like "embedded programming" (as this was initially titled). Commented Jan 12, 2011 at 0:06

6 Answers


Simply put: that advice on optimization is misleading. Not necessarily wrong, but incomplete.

It appears your source was CodeProject. He states he's mostly talking about optimization for ARM.

First, how char and short are handled is highly processor-dependent. Depending on the architecture, conversions may be zero- or minimal-cost: it depends on when and how they occur (at load time, on the type of operation) and on which instructions can run in parallel. On a machine like the TI C64x DSP, which can issue 8 operations per cycle, a conversion may in effect be free. Typically the most efficient choice is the "native" integer size, but it also depends on where the data comes from -- it may be cheaper to load, modify and store back char/short data directly than to load it, convert to int, modify it, and store it back as char/short. Or it may not; it depends on the architecture and the operations being performed. The compiler often has a better view of this trade-off than you do.

Second, in many, many architectures char and short are as fast as int, especially if the calculation avoids implicit conversions to int. Note: this is easy to trigger in C -- something like "x = y + 1" forces conversion up to int (assuming x and y are char or short) -- but the good news is that almost all compilers are smart enough to optimize the conversion away for you. In many other cases of a char/short local, the compiler will optimize away any conversions, depending on how the variable is used later. This is helped by the fact that on typical processors, the overflow/wrap-around of a char/short gives the same result as calculating in int and converting on store (or as simply addressing that register as char/short in a later operation, getting the conversion for 'free').

In their example:

int wordinc (int a)
{ 
   return a + 1;
}
short shortinc (short a)
{ 
    return a + 1;
}
char charinc (char a)
{ 
    return a + 1;
}

In many architectures/compilers, these will run equally fast in practice.

Third, in some architectures char/short are faster than int. Embedded architectures with a natural word size of 8 or 16 bits (admittedly not the sort of development you're thinking of nowadays) are an example.

Fourth, though generally not a big issue in modern RAM-heavy, huge-cache processor environments, keeping local stack storage small (assuming the compiler doesn't hoist the variable into a register) may improve the efficiency of cache accesses, especially in level-1 caches.

On the other hand, if the compiler isn't smart enough to hide it from you, local char/short values passed as arguments to other functions (especially functions that are not file-local 'static' ones) may entail up-conversions to int. Again, as above, the compiler may well be smart enough to hide the conversion.

I do agree with this statement at the start of the site you quote:

Although a number of guidelines are available for C code optimization, there is no substitute for having a thorough knowledge of the compiler and machine for which you are programming.



  1. Yes, according to the standard almost all computations and comparisons are done with integral types that have at least the width of int. So using smaller types "only" saves space and may on the other hand have an overhead.
  2. Varargs have to use the stack, since the corresponding macros that process these arguments usually just use a pointer to keep track of the actual position of the argument that is processed.

5 Comments

varargs also force a conditional loop to spin until the last arg is detected, which has implications for branch-prediction failures.
Re 2: That is not an actual requirement, although it is the easiest implementation technique. Varargs could be partially register, partially stack based.
@Bart: exactly, this is why I had the "usually" in there.
I'm not really familiar with the standards. Are you referring to the C99 standard that requires computations and comparisons to be done with the width of an int?
@semaj: I think for C89 this is the same. For most operations, any operand is converted to an integer of conversion rank at least int (that is, int or unsigned accordingly) before the operation is performed. In some cases a clever compiler may deduce that when storing the result back to a small type, it would be the same as if it had computed with the small type in the first place. If all operands are unsigned types this might be possible. For signed types, something as simple as a + b + c must be performed at the higher width, because an intermediate result could otherwise be undefined.

Concerning 'varargs' -- is this because the arguments will be passed on the stack? What is a function with arguments passed partially in registers and partially via the stack? Could you provide an example?

If you have a function like:

int my_func(int v1, int v2)

the compiler can use the processor's internal registers to pass the arguments v1 and v2 during the function call.

If you have:

int my_func(int v1, int v2, ...., int v10)

the parameters no longer all fit in the processor's internal registers (not enough space), so the compiler uses both the internal registers and the stack.

Can we say, that the way function arguments are passed (either by registers or stack) strongly depends on architecture?

Yes, and it also strongly depends on the compiler.



I would think the reduction in size when assigning to 8 or 16 bits would only take place when assigning from a larger value. For example, if a function returns char, why would the caller need to modify the value at all when assigning it to a char? There may be an exception if some operations could only be done with larger variables, but depending on the compiler and processor, I'm not sure this would come up that often.

8 Comments

On the ARM the registers are 32 bits. When loading from memory, an ldrb pads the upper bits of the register with zeros, but if you want a signed char you need an extra couple of instructions to sign extend. And as you perform further operations, signed or unsigned, you need to keep clipping the value to 8 bits. Whereas if you were not interested in rolling over at 255 or +127/-128, you could have used an int and cut down on the number of instructions considerably.
For returning a char, for example, you would need to clip or sign extend, adding at least a couple of instructions.
char myfun ( int a ) { return(a); } becomes and r0,r0,#0xFF, bx lr, where using an int would have been just bx lr.
signed char myfun ( int a ) { return(a); } becomes mov r0,r0,asl #24; mov r0,r0,asr #24; bx lr instead of just bx lr if it had returned an int instead.
@dwelch: What are you basing these assertions on? If a function returns an 8-bit char, then I don't see why there would be any need to clip or sign extend the value when the caller copies to another 8-bit char.

On some processors, unsigned char is the fastest type. On some, it will be consistently slower than int. On the ARM, an unsigned char which is stored in memory should run the same speed as an int stored in memory, but an unsigned char stored in a register will frequently have to be 'normalized' to the value 0-255 at the cost of an instruction; an unsigned short would have to be 'normalized' to 0-65535 at the cost of two instructions. I would expect that a good compiler could eliminate a lot of unnecessary normalizations either by working with 65536 times the value of interest, or by observing that upper bits aren't going to matter; I don't know to what extent actual compilers do either of those things.

BTW, it's worth noting that while the C standard requires that adding 1 to a 16-bit unsigned integer that holds 65,535 must yield zero (not 65,536), there is no similar requirement for signed integers. A compiler would be free to regard a signed short or signed char as an int when it's held in a register, and as its proper-sized type when stored in memory. Thus, using signed types would avoid the need for extra value-truncation instructions.



It is target and/or compiler dependent. It may also depend on what you want to optimise, memory usage, code space, or execution time.

Regarding ARM function calls, the ARM ABI defines a standard with which most ARM compilers comply. The advice about variadic functions is rather useless, though, since you would not generally implement or call one unless you actually needed it.

Generally, let the compiler worry about efficient code generation; it is your expert system for the target. Get on with productive work, and worry about optimisation only when you know it is needed (i.e. when the code is shown to be otherwise too slow or too large).

