
I noticed that ARM toolchains can generate different function prologues. In fact, I saw two object files (vmlinux) with completely different function prologues:

The first case looks like:

push {some registers maybe, fp, lr} (lr omitted in leaf functions)

The second case looks like:

push {some registers maybe, fp, sp, lr, pc} (I may have the order wrong)

So, as far as I can see, the second one additionally pushes pc and sp. I also saw some comments in the crash utility (kdump project) stating that a kernel stack frame should have the format {..., fp, sp, lr, pc}, which confuses me more, because I see that in some cases this is not true.

1.) Am I right that some extra gcc flags are needed to additionally push pc and sp in the function prologue? If yes, what are they?

2.) What is this used for? As I understand it, I can unwind the stack with FP and LR only, so why do I need these additional values?

3.) If this has nothing to do with compilation flags, how can I force generation of this extended function prologue, and again, what is its purpose?

Thank you.


1 Answer


1.) Am I right that some extra gcc flags are needed to additionally push pc and sp in the function prologue? If yes, what are they?

There are many gcc options that affect stack frames (-march, -mtune, etc. may affect the instructions used, for instance). In your case, it was -mapcs-frame. Also, -fomit-frame-pointer will remove frames from leaf functions. Several static functions may be merged together into a single generated function, further reducing the number of frames. The APCS frame can make the code slightly slower but is needed for simple stack traces.
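To see the effect of the flag yourself, a small test file like the one below can be compiled to assembly with and without -mapcs-frame and the prologues compared. This is only a sketch: the file name frames.c, the function names and the cross-compiler name arm-linux-gnueabi-gcc are assumptions for illustration, and newer GCC releases mark -mapcs-frame as deprecated.

```c
/*
 * frames.c: hypothetical test file for comparing ARM function prologues.
 *
 * Assumed commands (the toolchain prefix is an assumption, use your own):
 *   arm-linux-gnueabi-gcc -marm -O2 -mapcs-frame -S frames.c -o apcs.s
 *   arm-linux-gnueabi-gcc -marm -O2              -S frames.c -o plain.s
 *
 * With -mapcs-frame, the prologue of caller() in apcs.s should store
 * {fp, ip, lr, pc}; adding -fomit-frame-pointer should drop the frame
 * from leaf().
 */

/* noinline keeps leaf() as a separate function so it has its own prologue. */
__attribute__((noinline)) int leaf(int x)
{
    return x * 3 + 1;
}

/* Non-leaf function: it calls leaf(), so lr must be preserved. */
int caller(int x)
{
    return leaf(x) + leaf(x + 1);
}
```

Diffing apcs.s against plain.s shows which prologue style each set of flags produces for your particular compiler version.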

2.) What is this used for? As I understand it, I can unwind the stack with FP and LR only, so why do I need these additional values?

All registers that are not parameters (r0-r3) need to be saved, as they must be restored when returning to the caller. The compiler allocates additional locals on the stack, so sp will almost always change when fp changes. For why the pc is stored, see below.

3.) If this has nothing to do with compilation flags, how can I force generation of this extended function prologue, and again, what is its purpose?

It is controlled by compiler flags, as you guessed.

; Prologue - setup
mov     ip, sp                 ; get a copy of sp.
stmdb   sp!, {fp, ip, lr, pc}  ; Save the frame on the stack. See Addendum
sub     fp, ip, #4             ; Set the new frame pointer.
    ...
; Epilogue - return
ldm     sp, {fp, sp, lr}       ; restore stack, frame pointer and old link.
    ...                        ; maybe more stuff here.
bx      lr                     ; return.

A typical save is stmdb sp!, {fp, ip, lr, pc} (a push onto the full-descending stack) and a typical restore is ldm sp, {fp, sp, lr}. This is correct if you examine the ABI/APCS documents. Note that the restore has no '!' to write back the stack pointer; sp is loaded explicitly from the stored ip value.

Also, the saved pc is not used in the epilogue; it is just discarded data on the stack. So why store it? Exception handlers (interrupts, signals or C++ exceptions) and other stack trace mechanisms want to know which function saved a frame. An ARM function always has exactly one prologue (one point of entry), but it can have multiple exits. In some cases, a return like return function(); may actually turn into a b function in the "maybe more stuff here" part of the epilogue. This is known as a tail call. Also, when a leaf function is called in the middle of a routine and an exception occurs, the handler will see a PC within the range of the leaf, but the leaf may have no call frame. Because the pc is saved, the call frame can be examined when an exception occurs in the leaf to find out which function really saved the frame. Tables of pc versus destructor, etc. may be stored to allow objects to be freed or to figure out how to call a signal handler. The extra pc is just plain nice when tracing a stack, and the operation is almost free due to pipelining.
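As a rough illustration of why this layout makes tracing simple, here is a minimal sketch of a frame walker in C. It runs over a simulated stack image rather than a real ARM stack, and it assumes the layout produced by the prologue above: the store of {fp, ip, lr, pc} places the words in ascending register order, and fp then points at the saved pc word. The names stack_image, read_word and apcs_backtrace are hypothetical, not part of any real unwinder.

```c
#include <stddef.h>
#include <stdint.h>

/* A simulated, read-only copy of 32-bit stack memory (hypothetical type). */
typedef struct {
    const uint32_t *words; /* stack contents, lowest address first */
    uint32_t base;         /* simulated address of words[0] */
    size_t count;          /* number of valid words */
} stack_image;

/* Read one aligned word at a simulated address; returns 0 if out of range. */
static int read_word(const stack_image *img, uint32_t addr, uint32_t *out)
{
    if (addr < img->base || (addr - img->base) % 4 != 0)
        return 0;
    size_t idx = (addr - img->base) / 4;
    if (idx >= img->count)
        return 0;
    *out = img->words[idx];
    return 1;
}

/*
 * Walk the frame chain starting at fp. Assumed frame layout:
 *   [fp - 12] caller's fp   [fp - 8] caller's sp
 *   [fp -  4] saved lr      [fp    ] saved pc
 * Stores each frame's lr and pc and returns the number of frames visited.
 */
size_t apcs_backtrace(const stack_image *img, uint32_t fp,
                      uint32_t *lrs, uint32_t *pcs, size_t max)
{
    size_t n = 0;
    while (fp != 0 && n < max) {
        uint32_t next_fp, lr, pc;
        if (!read_word(img, fp - 12, &next_fp) ||
            !read_word(img, fp - 4, &lr) ||
            !read_word(img, fp, &pc))
            break;                 /* frame record lies outside the image */
        lrs[n] = lr;               /* where this function returns to */
        pcs[n] = pc;               /* identifies the function that saved the frame */
        n++;
        if (next_fp != 0 && next_fp <= fp)
            break;                 /* older frames must be at higher addresses */
        fp = next_fp;
    }
    return n;
}
```

The walker needs no tables at all: each frame record links to the previous one, the saved lr gives the return address, and the saved pc says which function built the record.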

See also: ARM Link and frame register question for how the compiler uses these registers.


4 Comments

On some ARM CPUs, the stack must be aligned to 8 bytes, which is another reason to save the 'pc'. If your function calls other functions, it is not known what 'reservations' it needs on the stack (or even if it calls more functions). This is another reason why 'sp' must be saved.
In the AAPCS32 6.2.1.4 (github.com/ARM-software/abi-aa/blob/…), it is said that "the highest addressed word shall contain the value passed in LR on entry to the current function" in a stack frame. Doesn't this mean that pushing pc onto the stack violates the AAPCS, as the highest addressed word should be lr?
@palapapa This question refers to APCS, not AAPCS. The AAPCS changes the rules and became more mainstream around 2013-2017. Before this, ARM compilers used FP to save stack slots. It is now more common to allow it as a general purpose register. In this case, the stack layout is defined by data, and tracing and unwinding are more involved. The type of rule you cite is to allow FP as a general purpose register. Notice how, in the prologue, the IP is used to set the new FP. With AAPCS, there doesn't need to be a frame pointer. AAPCS is the default from GCC 5.0 onward.
The Structure of ARM extab Q/A gives the format of extab and exidx, which are used to unwind (exceptions, stack trace) for the AAPCS. With the APCS (the prior ARM ABI), it was much easier to implement tracing. These tables (extab, exidx) are rather large, which is fine for disk but more expensive for kernels and embedded applications. Unwinding is a rare event, and the extra register can be used by compilers to produce better code. It is easy to get the ABIs confused. Note the extra A in AAPCS.
