20

Found this on /prog/. I actually GDB'd it, and yes, it was truly a recursion. But how did it happen?

// This works on 32-bit x86 Linux with gcc as long as you don't enable optimization.

#include <stdio.h>
#include <stdlib.h>

static void factorial(int in, int *out)
{
  *(&in-1)-=5-5*(1/in);
  *out*=in--;
}

int main(int argc, char **argv) 
{
  int result=1;
  int number=0;

  if (argc!=2) 
    exit(1);

  number=atoi(argv[1]);
  if (number<1)
    exit(2);

  factorial(number, &result);
  printf("%d! = %d\n", number, result);
  return 0;
}


$ ./factorial 3
3! = 6

$ ./factorial 5
5! = 120

3 Answers 3

22

Sweet. ;)

This is extremely non-portable code that works only on x86. What it's doing is changing the return address on the stack so that if in>1, the function returns not to the instruction following the call instruction, but to the call instruction itself. A call instruction on x86 is five bytes (one opcode plus the 4-byte address of the call destination), so five needs to be subtracted from the return address.

This

*(&in-1)-=5-5*(1/in);

is just an obfuscated way of saying

if(in>1)
    *(&in-1)-=5;

And &in-1 is where the return address lives on the stack.

Sign up to request clarification or add additional context in comments.

8 Comments

@Martin B why it is only available on x86?
@Bakudan: You could probably do similar things on other architectures, but this particular "party trick" is specific to a) the x86 calling convention (i.e. where does the return address live) and b) the fact that an x86 call instruction is 5 bytes.
@Martin,...ca u please give me an explaination.i m not able to get the clear scenario...
This code does not even work reliably on x86. A good optimizing compiler will choose its own calling convention for static functions which are never called through function pointers.
Also in-- should be optimized just to in because the value of in is never used after that.
|
8

It's corrupting return addresses on the stack to cause the recursion to happen.

*(&in-1)-=5-5*(1/in);

&in-1 is probably the pushed return address. the rest is just unpleasant magic.

Comments

0

As said before *(&in-1) is the return address, 5 (I guess) is the size of the function (in words?) and it substract both to get the address where the function begins.

EDIT: Since I think the change to the return address is permanent I don't understand why 5 is subtracted while in > 1;

EDIT2: See the comments

The assembler return instruction only jumps to the address that is on the top of the stack.

Due to the calling convention the parameters are pushed to the stack and removed from it by the caller. Since there is only one real call there is no recursion, the parameters and the return address (pointer) are shared among the iterations.

It is more like a loop than a recursive function.

3 Comments

True, this is more like a loop than a real recursion, because no nested stack frames ever get built.
To answer your question about whether the change to the return address is permament... no, it's not, since the return address gets popped back off the stack when factorial returns. The next call instruction from main() then pushes a new, pristine return address onto the stack... so the return address needs to be modified again. All of this goes on until in==1, at which point we actually want to return from factorial() the normal way -- so we don't modify the return address.
So the call is executed again. I did't think that. That explains why 5, it just changes the return point one instruction back. Thank you.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.